Oracle Scratchpad

March 23, 2017

min/max Upgrade

Filed under: Bugs,CBO,Execution plans,Indexing,Oracle,Troubleshooting — Jonathan Lewis @ 8:53 am GMT Mar 23,2017

A question came up on the OTN database forum a little while ago about a very simple query that was taking different execution paths on two databases with the same table and index definitions and similar data. In one database the plan used the “index full scan (min/max)” operation while the other database used a brute force “index fast full scan” operation.

In most circumstances the starting point to address a question like this is to check whether some configuration details, or some statistics, or the values used in the query are sufficiently different to result in a significant change in costs; and the first simple procedure you can follow is to hint each database to use the plan from the opposite database to see if this produces any clues about the difference – it’s a good idea when doing this test to use one of the more verbose formatting options for the call to dbms_xplan.

In this case, though, the OP discovered a note on MoS reporting exactly the problem he was seeing:

Doc ID 2144428.1: Optimizer Picking Wrong ‘INDEX FAST FULL SCAN’ Plan vs Correct ‘INDEX FULL SCAN (MIN/MAX)’

which referred to

Bug 22662807: OPTIMIZER PICKING INDEX FFS CAN INSTEAD OF MIN/MAX

Conveniently the document suggested a few workarounds:

  • alter session set optimizer_features_enable = ‘11.2.0.3’;
  • alter session set “_fix_control” = ‘13430622:off’;
  • delete object stats [Ed: so that dynamic sampling takes place … maybe a /*+ dynamic_sampling(alias level) */ hint would suffice].

Of the three options my preference would (at least in the short term) be the _fix_control one. Specifically, from the v$system_fix_control view, we can see that it addresses the problem very precisely with the description: “index min/max cardinality estimate fix for filter predicates”.

The example in the bug note showed a very simple statement (even more simple than the OP’s query which was only a single table query anyway), so I thought I’d build a model and run a few tests to see what was going on. Luckily, before I’d started work, one of the other members of the Oak Table network sent an email to the list asking if anyone knew how the optimizer was costing an example he’d constructed – and I’ve finally got around to looking at his example, and here’s the model and answer(s), starting with the data set:


rem
rem     Script:         test_min_max.sql
rem     Dated:          March 2017
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem             11.2.0.3
rem

create table min_max_test nologging
as
with ids as (
        select /*+ Materialize */ rownum  id from dual connect by rownum <= 50000 -- > comment to protect formatting
),
line_nrs as (
        select /*+ Materialize */  rownum line_nr from dual connect by rownum <= 20 -- > comment to protect formatting
)
select
        id, line_nr ,rpad(' ', 800, '*') data
from
        line_nrs, ids
order by
        line_nr, id
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'min_max_test',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index mmt_ln_id on min_max_test (line_nr, id) nologging;
create index mmt_id    on min_max_test (id)          nologging;

The table has two critical columns: each id has 20 line_nr values associated with it, but the way the data was generated means that the line numbers for a given id are scattered across 20 separate table blocks.

There are two indexes – one on the id which will allow us to find all the rows for a given id as efficiently as possible, and one (slightly odd-looking in this context) that would allow us to find a specific row for a given line_nr and id very efficiently. Two things about these indexes – in a live application they should both be compressed on the first (only, in the case of index mmt_id) column, and secondly the necessity of the mmt_id index is questionable and it might be an index you could drop if you reversed the order of the columns in mmt_ln_id. The thing about these indexes, though, is that they allow us to demonstrate a problem. So let’s query the data – twice, hinting each index in turn:


set serveroutput off

select
        /*+ index(t(id)) */
        min(line_nr)
from
        min_max_test t
where
        id = :b1
;

select * from table(dbms_xplan.display_cursor);

select
        /*+ index(t(line_nr, id)) */
        min(line_nr)
from
        min_max_test t
where
        id = :b1
;

select * from table(dbms_xplan.display_cursor);

It’s fairly safe to make a prediction about the execution plan and cost of the first query – it’s likely to be a range scan that accesses a couple of branch blocks, a leaf block and 20 separate table blocks followed by a “sort aggregate” – with a cost of about 23.

It’s a little harder to make a prediction about the second query. The optimizer could infer that the min(line_nr) has to be close to the left hand section of the index, and could note that the number of rows in the table is the same as the product of the number of distinct values of the two separate columns, and it might note that the id column is evenly distributed (no histogram) across the data, so it might “guess” that it need only range scan all the entries for the first line_nr to find the appropriate id. So perhaps the optimizer will use the index min/max range scan with a cost that is roughly 2 branch blocks plus total leaf blocks / 20 (since there are 20 distinct values for line_nr); maybe it would divide the leaf block estimate by two because “on average” – i.e. for repeated random selections of value for id – it would have to scan half the leaf blocks. There were 2,618 leaf blocks in my index, so the cost should be close to either 133 or 68.

Here are the two plans – range scan first, min/max second:


select  /*+ index(t(id)) */  min(line_nr) from  min_max_test t where id = :b1
-----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |              |       |       |    23 (100)|          |
|   1 |  SORT AGGREGATE                      |              |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| MIN_MAX_TEST |    20 |   160 |    23   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | MMT_ID       |    20 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID"=:B1)


select  /*+ index(t(line_nr, id)) */  min(line_nr) from  min_max_test t where  id = :b1
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |    22 (100)|          |
|   1 |  SORT AGGREGATE             |           |     1 |     8 |            |          |
|   2 |   FIRST ROW                 |           |     1 |     8 |    22   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| MMT_LN_ID |     1 |     8 |    22   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("ID"=:B1)

Spot on with the estimate for the simple range scan – but what did we do wrong with the estimate for the min/max scan ? You might notice in the first example the “table access by rowid batched” and realise that this is running on 12c. Here’s the plan if I get if I set the optimizer_features_enable back to 11.2.0.3 before running the second query again:


select  /*+ index(t(line_nr, id)) */  min(line_nr) from  min_max_test t where  id = :b1
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   136 (100)|          |
|   1 |  SORT AGGREGATE             |           |     1 |     8 |            |          |
|   2 |   FIRST ROW                 |           |     1 |     8 |   136   (1)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| MMT_LN_ID |     1 |     8 |   136   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("ID"=:B1)

Using the 11.2.0.3 optimizer model the plan has a cost that’s very close to our prediction – we’ll see why there’s a slight difference in a moment. If we set the optimizer_features_enable to 11.2.0.4 the cost drops back to 22. So for our example 11.2.0.3 will use the simple “index range scan” and an upgrade to 11.2.0.4 (or higher) will switch to the “index full scan (min/max)”. If you look at the OTN posting the impact of the change in costing is exactly the other way around – 11.2.0.3 uses the min/max path, 11.2.0.4 uses the simple index range scan.

The techy bit

You really don’t need to know this – experimenting with the optimizer_features_enable (or _fix_control) will give you plans that show you all the numbers you need to see to check whether or not you’ve run into this particular problem – but if you’re interested here’s a little bit from the two 10053 trace files. We need only look at a few critical lines. From the 11.2.0.3 costing for the min/max scan:


Index Stats::
  Index: MMT_ID  Col#: 1
  LVLS: 2  #LB: 2202  #DK: 50000  LB/K: 1.00  DB/K: 20.00  CLUF: 1000000.00  NRW: 1000000.00
  Index: MMT_LN_ID  Col#: 2 1
  LVLS: 2  #LB: 2618  #DK: 1000000  LB/K: 1.00  DB/K: 1.00  CLUF: 125000.00  NRW: 1000000.00

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MIN_MAX_TEST[T]
  Column (#1): ID(NUMBER)
    AvgLen: 5 NDV: 50536 Nulls: 0 Density: 0.000020 Min: 1.000000 Max: 50000.000000
  Table: MIN_MAX_TEST  Alias: T
    Card: Original: 1000000.000000  Rounded: 20  Computed: 19.787874  Non Adjusted: 19.787874

 ****** Costing Index MMT_LN_ID
  Access Path: index (Min/Max)
    Index: MMT_LN_ID
    resc_io: 135.000000  resc_cpu: 961594
    ix_sel: 1.000000  ix_sel_with_filters: 1.9788e-05
    Cost: 135.697679  Resp: 135.697679  Degree: 1

I was running 12.1.0.2 so there were a few extra bits and pieces that I’ve deleted (mostly about SQL Plan Directives and in-memory). Critically we can see that the stats collection has a small error for the ID column – 50,536 distinct values (NDV) instead of exactly 50,000. This seems to have given us a cost for the expected index range of: 2 (blevel) + ceiling(2618 (leaf blocks) * 50536 / 1000000) = 2 + ceil(132.3) = 135, to which we add a bit for the CPU and get to 136. (Q.E.D.)

Then we switch to costing for 11.2.0.4:


SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MIN_MAX_TEST[T]
  Column (#1): ID(NUMBER)
    AvgLen: 5 NDV: 50536 Nulls: 0 Density: 0.000020 Min: 1.000000 Max: 50000.000000
  Table: MIN_MAX_TEST  Alias: T
    Card: Original: 1000000.000000  Rounded: 20  Computed: 19.787874  Non Adjusted: 19.787874

 ****** Costing Index MMT_LN_ID
  Access Path: index (Min/Max)
    Index: MMT_LN_ID
    resc_io: 21.787874  resc_cpu: 156872
    ix_sel: 1.000000  ix_sel_with_filters: 1.9788e-05
    Cost: 22.324608  Resp: 22.324608  Degree: 1

We still have the small error in the number of distinct values for id, so the estimated number of rows that we need to access from the table for a given id (before “aggregating” to find its minimum line_nr) is 19.787874 (Computed: / Non Adjusted:) rather than exactly 20. Notice, then, that the cost of using the index is 19.787874 + 2 which looks suspiciously like adding the blevel to the number of table blocks to get a cost and forgetting that we might have to kiss a lot of frogs before we find the prince. Basically, in this example at least, it looks like the costing algorithm has NOTHING to do with the mechanics of what actually has to happen at run-time.

Footnote

This is only an initial probe into what’s going on with the min/max scan; there are plenty more patterns of data that would need to be tested before we could have any confidence that we had produced a generic model of how the optimizer does its calculations – the only thing to note so far is that there IS a big change as  you move from 11.2.0.3 to later versions: the case on OTN showed the min/max scan disappearing on the upgrade, the example above shows the min/max disappearing on the downgrade – either change could be bad news for parts of a production system.

There are a couple of related bugs that might also be worth reviewing.

  • Bug 11834402 : CBO CHOOSES A SLOW INDEX FULL SCAN OVER A MUCH FASTER INDEX RANGE SCAN
  • Bug 13430622 : INDEX SCAN IN VERY SLOW FOR ONE PREDICATE AND FAST FOR OTHERS

There is a note, though that this last bug was fixed in 12.1

Footnote 2

When experimenting, one idea to pursue as the models get more complex and you’re using indexes with more than two columns is to test whether the presence of carefully chosen column group statistics might make a difference to the optimizer’s estimates of cardinality (hence cost) of the min/max scan.

 

March 9, 2017

Join Elimination

Filed under: CBO,Execution plans,Infrastructure,Oracle — Jonathan Lewis @ 6:39 pm GMT Mar 9,2017

A question has just appeared on OTN describing a problem where code that works in 11g doesn’t work in 12c (exact versions not specified). The code in question is a C-based wrapper for some SQL, and the problem is a buffer overflow problem. The query supplied is as follows:


select T1.C1 from T1, T2 where T1.C1 = T2.D1;

The problem is that this works in 11g where the receiving (C) variable is declared as

char myBuffer [31];

but it doesn’t work in 12c unless the receiving variable is declared as:

char myBuffer [51];

There’s an important bit of background information that might be giving us a clue about what’s happened (although what I’m about to describe isn’t actually the problem unless the SQL provided is a simplified version of the problem SQL that is expected to display the problem). Column C1 is defined as char(30) and column D1 is defined as char(50). Here’s some sample code showing why you might need a buffer of 50+1 bytes to hold something that ought to be 30+1 bytes long. (This may be nothing to do with the anomaly described in the original posting – it’s just something I thought of when I first saw the question.)


rem     Script:         join_elimination_oddity.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2017

create table t1(
        c30     char(30) primary key
);

create table t2(
        d50     char(50) references t1
);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

explain plan for
select
        t1.c30
from
        t1, t2
where
        t1.c30 = t2.d50
;

select * from table(dbms_xplan.display(null,null,'projection'));

So we’re selecting c30 – the 30 byte character column – from t1; what do we actually get ? Here’s the plan with the projection:


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    52 |     1   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |     1 |    52 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T2"."D50" IS NOT NULL)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "T2"."D50"[CHARACTER,50]

Table t1 has been eliminated and the projected column is the “equivalent” column from t2 – which is too long for the expected output. To work around this problem you can disable join elimination either by parameter (_optimizer_join_elimination_enabled=false) or by hinting /*+ no_eliminate_join(t1) */ in which case the plan (with my data) became a nested loop join from t2 to t1 with column c1 projected as expected.

Footnote:

Two things to note about my demonstration

  • If you’re going to create a referential integrity constraint between columns they do need to be of exactly the same type. (If you want to get technical you might say “in the same domain” – but Oracle doesn’t do domains.)
  • This extremely simple case demonstrates the problem in 11.2.0.4 as well as 12.1.0.2. Possibly a more complex query could be produced where (thanks to limitations in query transformations) 11g doesn’t spot the option for join elimination while 12c does; alternatively, a very simple two-column example in 11g won’t do join elimination while a two-column example in 12.2 can (though it doesn’t always) – so upgrading to 12.2 MIGHT cause more people to see this anomaly appearing.

Update (14/3/17)

A follow-up to the original posting has identified source of the problem. Rather than a side effect of join elimination the anomaly has appeared because of the “partition join evaluation” mechanism introduced in 12c. There is a hint to disable the mechanism /*+ no_partial_join(alias) */ or – if you can’t get hints into place for all the necessary code – it would be possible to disable the feature completely through the use of a hidden parameter “_optimizer_partial_join_eval”=false.  The usual rule of checking with Oracle support before setting hidden parameter applies, of course.

 

March 1, 2017

I don’t know (yet)

Filed under: CBO,Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 12:53 pm GMT Mar 1,2017

Here’s a question that came to mind while reading a recent question on the OTN database forum. It’s a question to which I don’t know the answer and, at present, I don’t really want to bother modelling at present – although if I were on a customer site and this looked like a likely explanation for a performance anomaly it’s the sort of thing I would create a model for.

If I have a query that runs parallel and does a “hash join buffered” (see, for example, this URL), it’s possible that the processes creating the build table will manage to create the build table in memory without spilling to disc but then find themselves unable to keep the incoming probe table data in memory and spill it to disc before re-reading it to do the join and forward the results to the parent process.

Here’s the plan from the URL above, showing rowsource execution stats:


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem |  O/1/M   | Max-Tmp |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |          |        |      |            |  70101 |00:00:00.17 |      12 |      0 |      0 |       |       |          |         |
|   1 |  PX COORDINATOR         |          |      1 |          |        |      |            |  70101 |00:00:00.17 |      12 |      0 |      0 |       |       |          |         |
|   2 |   PX SEND QC (RANDOM)   | :TQ10002 |      0 | 00:00:01 |  Q1,02 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|*  3 |    HASH JOIN BUFFERED   |          |      2 | 00:00:01 |  Q1,02 | PCWP |            |  70101 |00:00:00.28 |       0 |    310 |    310 |  5952K|  1953K|     2/0/0|    2048 |
|   4 |     PX RECEIVE          |          |      2 | 00:00:01 |  Q1,02 | PCWP |            |  70101 |00:00:00.08 |       0 |      0 |      0 |       |       |          |         |
|   5 |      PX SEND HASH       | :TQ10000 |      0 | 00:00:01 |  Q1,00 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|   6 |       PX BLOCK ITERATOR |          |      2 | 00:00:01 |  Q1,00 | PCWC |            |  70101 |00:00:00.05 |    1031 |   1005 |      0 |       |       |          |         |
|*  7 |        TABLE ACCESS FULL| T1       |     26 | 00:00:01 |  Q1,00 | PCWP |            |  70101 |00:00:00.02 |    1031 |   1005 |      0 |       |       |          |         |
|   8 |     PX RECEIVE          |          |      2 | 00:00:01 |  Q1,02 | PCWP |            |  70102 |00:00:00.06 |       0 |      0 |      0 |       |       |          |         |
|   9 |      PX SEND HASH       | :TQ10001 |      0 | 00:00:01 |  Q1,01 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|  10 |       PX BLOCK ITERATOR |          |      2 | 00:00:01 |  Q1,01 | PCWC |            |  70102 |00:00:00.04 |    1031 |   1005 |      0 |       |       |          |         |
|* 11 |        TABLE ACCESS FULL| T2       |     26 | 00:00:01 |  Q1,01 | PCWP |            |  70102 |00:00:00.01 |    1031 |   1005 |      0 |       |       |          |         |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Note that we have a “hash join buffered” at operation 3, which means incoming data from t2 (the probe table) was buffered before being joined; but we can see 310 writes and reads to temporary space due to that operation. That’s actually t2 spilling from the buffer to disc and then being read back from disc after the t2 scan (at operation 11) completes.

If this pattern of activity appears how is it reflected in the statistics?

Because the build table was completely built in memory you could argue for an “optimal workarea execution”; but because the probe table was temporarily spilled to disc you could argue for a “one-pass workarea operation”. In this case I was able to say that the writes and reads for operation 3 were the t2 table spilling to disc  because (a) I understand (mostly) how parallel hash joins work, (b) I had trace files I could check for timing, and (c) the O/1/M column of the output show 2 optimal operations and no one-pass operations.

The question to which I don’t (yet) know the answer is this: is the I/O caused by this spill to disc captured in v$pgastat under the stats:

NAME                                              VALUE UNIT
-------------------------------------------- ---------- ------------
extra bytes read/written                              0 bytes
cache hit percentage                                100 percent

If so, then you end up with questions like the one on OTN:

How come OEM reports my PGA cache Hit Ratio at 76% when I’ve got so much more memory than seems necessary ?

With the follow-up comment:

All my workarea executions were optimal

Footnote

Though I haven’t examined it recently, I remember noting in the past that the buffer allocation for the incoming probe table was typically a small fraction (often 1/8th or 1/16th) of the size of the buffer allocated for the build table. This may help to explain why you could end up doing a lot of I/O without exhausting the available memory.

I’m probably going to resurrect the example from the linked blog note and check the effects before I publish this post.

 

February 28, 2017

Cost is Time (again)

Filed under: CBO,Execution plans,Oracle — Jonathan Lewis @ 11:19 am GMT Feb 28,2017

The hoary old question about lower cost queries running faster or slower that higher cost queries has appeared once again on the OTN database forum. It’s one I’ve addressed numerous times in the past – including on this blog – but the Internet being what it is the signal keeps getting swamped by the noise. This time around a couple of “new” thoughts crossed my mind when reading the question.

There is a Time column on the standard forms of the execution plan output, and the description of this column is available in the manuals and has been for years (here’s a definition from v$sql_plan from 10gR2, for example):

Elapsed time (in seconds) of the operation as estimated by the optimizer’s cost-based approach. For statements that use the rule-based approach, this column is null.

So the first question is this: why are people looking at the cost when they’re asking about the time ? The second question arises from the bit in brackets (parentheses): the time is given in seconds – so how accurate do you think the optimizer’s estimates of ANYTHING are when the best estimate the optimizer will give you for run-time has a granularity of a second ?

Of course there’s a further observation I could make (which only echoes the first question):  I don’t think I’ve ever seen anyone come up with the question: “Will a query with a lower value for Time run faster or slower than a query with a higher value for Time?”

Bottom Line:

Cost is supposed to be a measure of resource usage (per execution of each operation) and should therefore be a measure of time – but the model fails in many ways so when a plan clearly doesn’t meet reasonable expectations for performance you can (often) use the Cost column as an indicator of where the model has failed and this may give you some clues of how to address the problem.

It is unfortunate that before you can recognise when a particular Cost figure is bad you usually need to know something about the data content, the data distribution  pattern, the run-time caching effects, and the way the optimizer does its arithmetic.

 

February 13, 2017

Band Join 12c

Filed under: 12c,Execution plans,Oracle,Performance,Upgrades — Jonathan Lewis @ 1:53 pm GMT Feb 13,2017

One of the optimizer enhancements that appeared in 12.2 for SQL is the “band join”. that makes certain types of merge join much more  efficient.  Consider the following query (I’ll supply the SQL to create the demonstration at the end of the posting) which joins two tables of 10,000 rows each using a “between” predicate on a column which (just to make it easy to understand the size of the result set)  happens to be unique with sequential values though there’s no index or constraint in place:

select
        t1.v1, t2.v1
from
        t1, t2
where
        t2.id between t1.id - 1
                  and t1.id + 2
;

This query returns nearly 40,000 rows. Except for the values at the extreme ends of the range each of the 10,000 rows in t2 will join to 4 rows in t1 thanks to the simple sequential nature of the data. In 12.2 the query, with rowsource execution stats enabled, completed in 1.48 seconds. In 12.1.0.2 the query, with rowsource execution stats OFF, took a little over 14 seconds. (With rowsource execution stats enabled it took 12.1.0.2 a little over 1 minute to return the first 5% of the data – I didn’t bother to wait for the rest, though the rate would have improved over time.)

Here are the two execution plans – spot the critical difference:


12.1.0.2
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |    25M|   715M|  1058  (96)| 00:00:01 |
|   1 |  MERGE JOIN          |      |    25M|   715M|  1058  (96)| 00:00:01 |
|   2 |   SORT JOIN          |      | 10000 |   146K|    29  (11)| 00:00:01 |
|   3 |    TABLE ACCESS FULL | T1   | 10000 |   146K|    27   (4)| 00:00:01 |
|*  4 |   FILTER             |      |       |       |            |          |
|*  5 |    SORT JOIN         |      | 10000 |   146K|    29  (11)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   | 10000 |   146K|    27   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ID"<="T1"."ID"+2)   -- > had to add GT here to stop WordPress spoiling the format 
   5 - access("T2"."ID">="T1"."ID"-1)
       filter("T2"."ID">="T1"."ID"-1)

12.2.0.1
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 40000 |  1171K|    54  (12)| 00:00:01 |
|   1 |  MERGE JOIN         |      | 40000 |  1171K|    54  (12)| 00:00:01 |
|   2 |   SORT JOIN         |      | 10000 |   146K|    27  (12)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   146K|    25   (4)| 00:00:01 |
|*  4 |   SORT JOIN         |      | 10000 |   146K|    27  (12)| 00:00:01 |
|   5 |    TABLE ACCESS FULL| T2   | 10000 |   146K|    25   (4)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."ID">="T1"."ID"-1)
       filter("T2"."ID"<="T1"."ID"+2 AND "T2"."ID">="T1"."ID"-1)

Notice how operation 4, the FILTER, that appeared in 12.1 has disappeared in 12.2 and the filter predicate that it used to hold is now part of the filter predicate of the SORT JOIN that has been promoted to operation 4 in the new plan.

As a reminder – the MERGE JOIN operates as follows: for each row returned by the SORT JOIN at operation 2 it calls operation 4. In 12.1 this example will then call operation 5 so the SORT JOIN there happens 10,000 times. It’s important to know, though, that the name of the operation is misleading; what’s really happening is that Oracle is “probing a sorted result set in local memory” 10,000 times – it’s only on the first probe that it finds it has to call operation 6 to read and move the data into local memory in sorted order.

So in 12.1 operation 5 probes (accesses) the in-memory data set starting at the point where t2.id >= t1.id – 1; I believe there’s an optimisation here because Oracle will recall where it started the probe last time and resume searching from that point; having found the first point in the in-memory set where the access predicate it true Oracle will walk through the list passing each row back to the FILTER operation as long as the access predicate is still true, and it will be true right up until the end of the list. As each row arrives at the FILTER operation Oracle checks to see if the filter predicate there is true and passes the row up to the MERGE JOIN operation if it is. We know that on each cycle the FILTER operation will start returning false after receiving 4 rows from SORT JOIN operation – Oracle doesn’t.  On average the SORT JOIN operation will send 5,000 rows to the FILTER operation (for a total of 50,000,000 values passed and discarded).

In 12.2, and for the special case here where the join predicate uses constants to define the range, Oracle has re-engineered the code to eliminate the FILTER operation and to test both parts of the between clause in the same subroutine it uses to probe and scan the rowsource. In 12.2 the SORT JOIN operation will pass 4 rows up to the MERGE JOIN operation and stop scanning on the fifth row it reaches. In my examples that’s an enormous (CPU) saving in subroutine calls and redundant tests.

Footnote:

This “band-join” mechanism only applies when the range is defined by constants (whether literal or bind variable). It doesn’t work with predicates like (e.g.):

where t2.id between t1.id - t1.step_back and t1.id + t1.step_forward

The astonishing difference in performance due to enabling rowsource execution statistics is basically due to the number of subroutine calls eliminated – I believe (subject to a hidden parameter that controls a “sampling frequency”) that Oracle will call the O/S clock twice each time it calls the second SORT JOIN operation from the FILTER operation to acquire the next row. In 12.1 we’re doing roughly 50M redundant calls to that SORT JOIN.

The dramatic difference in performance even when rowsource execution statistics isn’t enabled is probably something you won’t see very often in a production system – after all, I engineered a fairly extreme data set and query for the purposes of demonstration. Note, however, the band join does seemt to introduce a change in cost, so it’s possible that on the upgrade you may find a few cases where the optimizer will switch from a nested loop join to a merge join using a band-join.

January 10, 2017

Join Elimination 12.2

Filed under: 12c,Bugs,Execution plans,Oracle — Jonathan Lewis @ 1:03 pm GMT Jan 10,2017

From time to time someone comes up with the question about whether or not the order of tables in the from clause of a SQL statement should make a difference to execution plans and performance. Broadly speaking the answer is no, although there are a couple of boundary cases were a difference can appear unexpectedly.

When considering join permutations the optimizer has a few algorithms for picking an initial join order and then deciding how to permute from that order, and one of the criteria with the very lowest priority (i.e. when all other factors are equal) is dictated by the order the tables appear in the from clause so if you have enough tables in the from clause it’s possible for the subset of join orders considered to change if you change the from clause in a way that causes the initial join order to change.

It’s been over 11 years since I wrote the article I’ve linked to in the previous paragraph and in that time no-one has yet approached me with other examples of a plan changing due to a change in the from clause order (though, with all the transformations now available to the optimizer, I wouldn’t be surprised if a few cases have appeared occasionally, so feel free to let me know if you think you’ve got an interesting example that I can experiment on).

A little while ago, though, while testing a feature enhancement in 12.2, I finally came across a case where a real difference appeared. Here’s the query I was using – I’ll give you the SQL to reproduce the tables at the end of the article:


select 
	count(c.small_vc_c)
from 
	grandparent	g, 
	parent		p,
	child		c
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

As you will see later on the three tables grandparent, parent, child have the obvious primary keys and referential integrity constraints. This means that grandparent has a single-column primary key, parent has a two-column primary key, and child has a three-column primary key. The query joins the three tables along their primary keys and then selects data only from the child table, so it’s a good candidate for join elimination.

In earlier versions of Oracle join elimination could take place only if the primary key you joined to was a single column key, so 12.1 and earlier would be able to eliminate just the grandparent from this three-table join; but in 12.2 multi-column primary keys also allow join elimination to take place, so we might hope that the plan we get from this query would eliminate both the grandparent and parent tables. Here’s the plan (pulled from memory after execution):

SQL_ID  8hdybjpn2884b, child number 0
-------------------------------------
select  count(c.small_vc_c) from  grandparent g,  parent  p,  child  c
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 4120004759

-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE     |       |     1 |    23 |            |          |
|   2 |   NESTED LOOPS      |       |    85 |  1955 |    26   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| CHILD |    85 |  1615 |    26   (4)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| G_PK  |     1 |     4 |     0   (0)|          |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))
   4 - access("G"."ID"="C"."ID_G")

It didn’t work quite as expected. The optimizer has managed to eliminate table parent – so that looks like “single column primary key” join elimination has worked, but “multi-column” join elimination hasn’t appeared. On the other hand, I’ve not followed my usual rules for writing SQL so let’s try again. If I follow the pattern I usually follow, my from clause would have been in the order child  -> parent -> grandparent – listing the tables in the order I expect to visit them. Here’s the plan – again pulled from memory – after making this visual change the SQL:


SQL_ID  1uuq5vf4bq0gt, child number 0
-------------------------------------
select  count(c.small_vc_c) from  child  c,  parent  p,  grandparent g
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 1546491375

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE    |       |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL| CHILD |    85 |  1275 |    26   (4)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))

So join elimination based on multi-column primary keys does work – but you might have to get a bit lucky in the order you list the tables in the from clause.

Footnote.

If you’re wondering whether or not switching from Oracle syntax to ANSI syntax would make a difference, it does: with ANSI syntax both grandparent and parent are eliminated if the SQL lists the tables in the order grandparent -> parent -> child (i.e. the order which doesn’t work properly for Oracle syntax) and only the parent is eliminated for the order child -> parent -> grandparent. In other words, both syntax options have a point of failure but they fail the opposite way around.

Code:


rem
rem	Script:		join_elimination_12c2.sql
rem	Author:		Jonathan Lewis
rem	

-- Environment details eliminated

define m_pad=100

/*
	IDs will be 1 to 1000
*/

create table grandparent 
as
select 
	rownum			id,
	trunc((rownum-1)/5)	small_num_g,
	rpad(rownum,10)		small_vc_g,
	rpad(rownum,&m_pad)	padding_g
from 
	all_objects 
where 
	rownum <= 1000
;

/*
	Each GP has two (scattered) children here
	Parent IDs are 1 to 2,000
*/

create table parent 
as
select 
	1+mod(rownum,1000)	id_g,
	rownum			id,
	trunc((rownum-1)/5)	small_num_p,
	rpad(rownum,10)		small_vc_p,
	rpad(rownum,&m_pad)	padding_p
from 
	all_objects 
where 
	rownum <= 2000
;

/*
	Simple trick to get 5 (clustered) children per parent
	Child IDs are 1 to 12,000
*/

create table child 
as
select 
	id_g,
	id			id_p,
	rownum			id,
	trunc((rownum-1)/5)	small_num_c,
	rpad(rownum,10)		small_vc_c,
	rpad(rownum,&m_pad)	padding_c
from 
	parent	p,
	(
		select /*+ no_merge */ 
			rownum 
		from	parent p 
		where	rownum <= 5
	)	d
;

create unique index g_pk on grandparent(id);
create unique index p_pk on parent     (id_g, id)       compress 1;
create unique index c_pk on child      (id_g, id_p, id) compress 2;

alter table grandparent add constraint g_pk primary key (id);
alter table parent      add constraint p_pk primary key (id_g, id);
alter table child       add constraint c_pk primary key (id_g, id_p, id);

alter table parent add constraint p_fk_g foreign key (id_g)       references grandparent;
alter table child  add constraint c_fk_p foreign key (id_g, id_p) references parent;

rem
rem	Don't need to collect stats because it's 12c
rem

prompt	===============================================================
prompt	Join all three tables with the FROM clause ordered gp -> p -> c
prompt	The final plan is GP->C, The optimizer eliminated P before
prompt	considering GP
prompt	===============================================================

select 
	count(c.small_vc_c)
from 
	grandparent	g, 
	parent		p,
	child		c
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	===============================================================
prompt	Join all three tables with the FROM clause ordered c -> p -> gp
prompt	The final plan is a tablescan of C only. The optimizer managed 
prompt	to eliminate GP first and P second
prompt	===============================================================

select 
	count(c.small_vc_c)
from 
	child		c,
	parent		p,
	grandparent	g 
where
	c.small_num_c between 200 and 215
and	p.id   = c.id_p
and	p.id_g = c.id_g
and	g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	==================================================
prompt	Convert to ANSI standard in the order gp -> p -> c
prompt	and both gp and p eliminated.
prompt	==================================================

select 
	count(c.small_vc_c)
from 
	grandparent	g
join
	parent		p
on	p.id_g = g.id
join
	child		c
on	c.id_p = p.id
and	c.id_g = p.id_g
where
	c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt	===================================================
prompt	Convert to ANSI standard in the order c -> p -> gp
prompt	and only p is eliminated. 
prompt	===================================================

select 
	count(c.small_vc_c)
from 
	child		c
join
	parent		p
on      p.id   = c.id_p 
and	p.id_g = c.id_g 
join
	grandparent	g
on	g.id = p.id_g 
where
	c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

It’s possible, of course, that with different system stats, or I/O calibration, or extent sizes, or segment space management, or block sizes, sundry other parameter details that you won’t be able to reproduce the results without messing about a little bit, but I don’t think I’ve done anything special in the setup that would make a real difference.

Footnote:

If you’re wondering why the “traditional” and “ANSI” syntax should exhibit this flaw for joins in the opposite direction – remember that ANSI SQL is first transformed into an equivalent Oracle form and – in the simple cases – the first two tables form the first query block then each table after that introduces a new query block, so the optimizer strategy does (approximately) the following translation:


select ... from grandparent join parent join child

==>

select ... from (select ... from grandparent join parent) join child

The optimizer then optimizes the inline query, which eliminates grandparent leaving a join between parent and child, which then allows parent to be eliminated.

Conversely we get:

select ... form child join parent join grandparent 

==>

select ... from (select ... from child join parent) join grandparent

In this form the optimizer eliminates parent from the inline view and is left with a join between child and grandparent – so no further elimination.

November 7, 2016

Filter Subquery

Filed under: Execution plans,Oracle,subqueries — Jonathan Lewis @ 1:04 pm GMT Nov 7,2016

There’s a current thread on the OTN database forum showing an execution plan with a slightly unusual feature. It looks like this:

-----------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                |  Name                          | Rows  | Bytes |TempSpc| Cost  | Pstart| Pstop |  
-----------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                         |                                |   137K|    27M|       |   134K|       |       |  
|*  1 |  HASH JOIN                               |                                |   137K|    27M|    27M|   134K|       |       |  
|*  2 |   HASH JOIN                              |                                |   140K|    26M|  1293M|   133K|       |       |  
|   3 |    TABLE ACCESS FULL                     | PDTCOST_CHARGE_MAP             |    30M|   948M|       | 24044 |       |       |  
|*  4 |    HASH JOIN                             |                                |    11M|  1837M|   810M| 57206 |       |       |  
|   5 |     INDEX FAST FULL SCAN                 | PDTCOST_BILL_INV_TRACK         |    29M|   475M|       | 16107 |       |       |  
|*  6 |     TABLE ACCESS BY LOCAL INDEX ROWID    | BILL_INVOICE_DETAIL            |  5840K|   478M|       |     2 |       |       |  
|   7 |      NESTED LOOPS                        |                                |    11M|  1634M|       |     6 |       |       |  
|   8 |       NESTED LOOPS                       |                                |     2 |   120 |       |     3 |       |       |  
|   9 |        TABLE ACCESS FULL                 | JDL_WORK_LIST                  |     2 |    96 |       |     2 |       |       |  
|  10 |        PARTITION RANGE ITERATOR          |                                |       |       |       |       |   KEY |   KEY |  
|  11 |         TABLE ACCESS BY LOCAL INDEX ROWID| BILL_INVOICE                   |     1 |    12 |       |     1 |   KEY |   KEY |  
|* 12 |          INDEX UNIQUE SCAN               | BILL_INVOICE_XSUM_BILL_REF_NO  |     1 |       |       |       |   KEY |   KEY |  
|  13 |       PARTITION RANGE ITERATOR           |                                |       |       |       |       |   KEY |   KEY |  
|* 14 |        INDEX RANGE SCAN                  | BILL_INVOICE_DETAIL_PK         |    32 |       |       |     1 |   KEY |   KEY |  
|  15 |    SORT AGGREGATE                        |                                |     1 |     8 |       |       |       |       |  
|  16 |     INDEX FAST FULL SCAN                 | PDTCOST_CHARGE_MAP_PK          |    30M|   229M|       | 17498 |       |       |  
|  17 |   INDEX FAST FULL SCAN                   | SERVICE_EMF_CONF_SUBSCR        |  1660K|    19M|       |   575 |       |       |  
-----------------------------------------------------------------------------------------------------------------------------------  

Spot the oddity ? If not, here’s a collapsed version of the plan that makes it easier to see – if you were viewing this plan through OEM or one of the other GUI interfaces to execution plans you’d probably be able to do this by clicking on some sort of  “+/-”  symbol by operation 4:

-----------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                |  Name                          | Rows  | Bytes |TempSpc| Cost  | Pstart| Pstop |  
-----------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                         |                                |   137K|    27M|       |   134K|       |       |  
|*  1 |  HASH JOIN                               |                                |   137K|    27M|    27M|   134K|       |       |  
|*  2 |   HASH JOIN                              |                                |   140K|    26M|  1293M|   133K|       |       |  
|   3 |    TABLE ACCESS FULL                     | PDTCOST_CHARGE_MAP             |    30M|   948M|       | 24044 |       |       |  
|*  4 |    HASH JOIN                             |                                |    11M|  1837M|   810M| 57206 |       |       |  
|  15 |    SORT AGGREGATE                        |                                |     1 |     8 |       |       |       |       |  
|  16 |     INDEX FAST FULL SCAN                 | PDTCOST_CHARGE_MAP_PK          |    30M|   229M|       | 17498 |       |       |  
|  17 |   INDEX FAST FULL SCAN                   | SERVICE_EMF_CONF_SUBSCR        |  1660K|    19M|       |   575 |       |       |  
-----------------------------------------------------------------------------------------------------------------------------------  

How often have you seen a HASH JOIN (operation 2) with three child operations (3, 4, 15) ?

It’s not a formatting error – but since I’ve shown neither the Predicate section of the report nor the original query it’s a little difficult to recognise what’s going on, so here’s the critical part of the original WHERE clause:

AND     P.TRACKING_ID      = PCM.TRACKING_ID  
AND     P.TRACKING_ID_SERV = PCM.TRACKING_ID_SERV  
AND     (   (P.BILLING_INACTIVE_DT IS NULL AND PCM.INACTIVE_DT IS NULL)  
         OR (PCM.ACTIVE_DT = (SELECT MAX(ACTIVE_DT) FROM PDTCOST_CHARGE_MAP PCM1 ))
        )
;  

Operation 4 produces a set of rows derived by joining table P (an alias for pdtcost) to a couple of other tables, and operation 2 joins this to PCM (an alias for pdtcost_change_map) with a simple two-column equality and then introduces a pair of problems: first an “OR SUBQUERY” construct, secondly a predicate that requires data from both tables to be examined before any more rows can be discarded.

Just to clarify the performance implication of this combination of predicates:

If we start from pdtcost (p):

  • If the billing_inactive_dt is null we don’t discard it because satisfires a predicate and we need to check the matching pcm.inactive_dt.
  • If the billing_inactive_dt is NOT null we still can’t discard it because the matching pcm.active_dt may satisfy the subquery predicate.
  • Whatever the state of billing_inactive_dt we have to find the matching pcm row(s)

Starting from pdtcost_charge_map (pcm):

  • We can’t unnest the subquery and use it to drive into p (because of the OR), so we have to scan pcm to apply the subquery.
  • If the active_dt satisfies the subquery we have to find the matching p row.
  • If the active_dt doesn’t satisfy the subquery but pcm_inactive_dt is null we still have to find the matching p row to check the billing_dt.
  • The only time we don’t need to probe p for a match is if the active_dt doesn’t match the subquery and the inactive_dt is not null – which tells us that for a very specific data pattern we have the potential for a (relatively) efficient access path; however this path would require the optimizer to test one part of an OR predicate at one operation in the plan and the second part of the OR predicate at a different operation of the plan and it’s not programmed to do that, so the entire compound predicate test is always run late.

Returning to the question of interpreting this plan with three child operations for a hash join – what does it mean and how does it work ? In effect the plan is the wrong shape – it has concealed a filter operation.  As the join between the two tables takes place the rows are tested against the simple filter condition – each row that satisfies the predicate is passed to the next operation of the plan; for any row doesn’t satisfy the simple filter predicate the subquery is executed to provide a check against active_dt (fortunately, since this is a “constant” subquery we benefit enormously from scalar subquery caching and the subquery will run a most once in the lifetime of the whole query.)

The plan would probably be easier to understand if it looked like this (which may actually be how it would have looked in Oracle 8i):

-----------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                                |  Name                          | Rows  | Bytes |TempSpc| Cost  | Pstart| Pstop |  
-----------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                         |                                |   137K|    27M|       |   134K|       |       |  
|*  1 |  HASH JOIN                               |                                |   137K|    27M|    27M|   134K|       |       |  
|*  2a|   FILTER                                 |                                |   140K|    26M|  1293M|   133K|       |       |  
|*  2b|    HASH JOIN                             |                                |   140K|    26M|  1293M|   133K|       |       |  
|   3 |     TABLE ACCESS FULL                    | PDTCOST_CHARGE_MAP             |    30M|   948M|       | 24044 |       |       |  
|*  4 |     HASH JOIN                            |                                |    11M|  1837M|   810M| 57206 |       |       |  
|  15 |    SORT AGGREGATE                        |                                |     1 |     8 |       |       |       |       |  
|  16 |     INDEX FAST FULL SCAN                 | PDTCOST_CHARGE_MAP_PK          |    30M|   229M|       | 17498 |       |       |  
|  17 |   INDEX FAST FULL SCAN                   | SERVICE_EMF_CONF_SUBSCR        |  1660K|    19M|       |   575 |       |       |  
-----------------------------------------------------------------------------------------------------------------------------------  
 

Predicate Information (identified by operation id):
---------------------------------------------------
   2a - filter("P"."BILLING_INACTIVE_DT" IS NULL AND "PCM"."INACTIVE_DT" IS NULL
               OR "PCM"."ACTIVE_DT"= (SELECT MAX("ACTIVE_DT") FROM "PDTCOST_CHARGE_MAP" "PCM1"))

   2b - access("P"."TRACKING_ID"="PCM"."TRACKING_ID" AND
               "P"."TRACKING_ID_SERV"="PCM"."TRACKING_ID_SERV")

This modified plan makes it clear that the hash join (2b) is followed by execution of the filter (2a) subquery (though we can safely infer that the subquery runs only for join rows where at least one of p.billing_inactive_dt or pcm.inactive_dt is not null).

You might wonder whether Oracle actually runs the subquery once at a very early point in the query so that it can, effectively, turn the subquery predicate into “active_dt = {derived constant}” – it’s fairly easy to show that this isn’t the case. Perhaps the most obvious way to do this is to run the query with rowsource execution stats enabled after setting billing_inactive_dt and inactive_dt to null for every row in their respective tables – because if you do that the subquery won’t be run at all.

If you want to experiment with this problem, here’s some code to model it:


drop table pdtcost purge;
drop table pdtcost_charge_map purge;

create table pdtcost
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        mod(rownum,100)                 filter_col,
        rownum                          tracking_id,
        rownum                          tracking_id_serv,
        decode(
                mod(rownum,97),
                0 , trunc(sysdate),
                    null
        )                               billing_inactive_dt,
/*
        to_date(null)                   billing_inactive_dt,
*/
        lpad('x',100,'x')               padding
from
        generator       v2
where
        rownum <= 1e4
;

alter table pdtcost add constraint pdt_pk primary key(tracking_id, tracking_id_serv);

create table pdtcost_charge_map
nologging
as
with generator as (
        select
                rownum id
        from dual 
        connect by 
                level <= 1e4
)
select
        rownum                          tracking_id,
        rownum                          tracking_id_serv,
        decode(
                mod(rownum,93), 
                0 , trunc(sysdate),
                    null
        )                               inactive_dt,
/*
        to_date(null)                   inactive_dt,    
*/
        trunc(sysdate + dbms_random.value(-100,0))      active_dt,
        lpad('x',100,'x')               padding
from
        generator       v2
where
        rownum <= 1e4
;

alter table pdtcost_charge_map add constraint pcm_pk primary key(tracking_id, tracking_id_serv, active_dt);
-- create index pcm_act_dt on pdtcost_charge_map(active_dt);

-- gather basic table stats if your version needs to.

select
        p.billing_inactive_dt,
        pcm.inactive_dt,
        pcm.active_dt
from
        pdtcost                 p,
        pdtcost_charge_map      pcm
where
        p.filter_col = 0
and     p.tracking_id      = pcm.tracking_id
and     p.tracking_id_serv = pcm.tracking_id_serv
and     (   (p.billing_inactive_dt is null and pcm.inactive_dt is null)
         or (pcm.active_dt = (select max(active_dt) from pdtcost_charge_map pcm1 ))
        )
;

The original question started with a table of 30 million rows and a result set of only 450 rows – suggesting that there ought to be a lot of scope for finding ways to eliminate data early. One possibility, assuming the appropriate indexes exist (which is why I have defined, but commented out, the pcm_act_dt index above), is to convert this query into a union all (taking care to eliminate duplication in the result set) in the following way:

select
        /*+ leading(p pcm) use_nl(pcm) */
        p.billing_inactive_dt,
        pcm.inactive_dt,
        pcm.active_dt
from
        pdtcost                 p,
        pdtcost_charge_map      pcm
where
        p.filter_col = 0
and     pcm.tracking_id      = p.tracking_id
and     pcm.tracking_id_serv = p.tracking_id_serv
and     (p.billing_inactive_dt is null and pcm.inactive_dt is null)
union all
select
        /*+ leading(p pcm) use_nl(pcm) */
        p.billing_inactive_dt,
        pcm.inactive_dt,
        pcm.active_dt
from
        pdtcost                 p,
        pdtcost_charge_map      pcm
where   
        p.filter_col = 0
and     pcm.tracking_id      = p.tracking_id   
and     pcm.tracking_id_serv = p.tracking_id_serv   
and     (p.billing_inactive_dt is not null or pcm.inactive_dt is not null)
and     pcm.active_dt = (select /*+ unnest */ max(active_dt) from pdtcost_charge_map pcm1)
;

Here is the resulting execution plan when the pcm_act_dt index exists. I had to hint the table order and join mechanism because my tables were rather small and the selectivity relatively high – it’s probably safe to assume that selectivities are much better on the original data set and that a path like this is more likely to be chosen unhinted (the full tablescan on pdtcost is irrelevant in the context of the demonstration):


---------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                    |      1 |        |     98 |00:00:00.06 |     657 |
|   1 |  UNION-ALL                     |                    |      1 |        |     98 |00:00:00.06 |     657 |
|   2 |   NESTED LOOPS                 |                    |      1 |     99 |     98 |00:00:00.05 |     386 |
|   3 |    NESTED LOOPS                |                    |      1 |     99 |     99 |00:00:00.05 |     287 |
|*  4 |     TABLE ACCESS FULL          | PDTCOST            |      1 |     99 |     99 |00:00:00.01 |     173 |
|*  5 |     INDEX RANGE SCAN           | PCM_PK             |     99 |      1 |     99 |00:00:00.01 |     114 |
|*  6 |    TABLE ACCESS BY INDEX ROWID | PDTCOST_CHARGE_MAP |     99 |      1 |     98 |00:00:00.01 |      99 |
|   7 |   NESTED LOOPS                 |                    |      1 |      2 |      0 |00:00:00.01 |     271 |
|   8 |    NESTED LOOPS                |                    |      1 |    100 |      1 |00:00:00.01 |     270 |
|*  9 |     TABLE ACCESS FULL          | PDTCOST            |      1 |    100 |    100 |00:00:00.01 |     166 |
|* 10 |     INDEX UNIQUE SCAN          | PCM_PK             |    100 |      1 |      1 |00:00:00.01 |     104 |
|  11 |      SORT AGGREGATE            |                    |      1 |      1 |      1 |00:00:00.01 |       2 |
|  12 |       INDEX FULL SCAN (MIN/MAX)| PCM_ACT_DT         |      1 |      1 |      1 |00:00:00.01 |       2 |
|* 13 |    TABLE ACCESS BY INDEX ROWID | PDTCOST_CHARGE_MAP |      1 |      1 |      0 |00:00:00.01 |       1 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("P"."FILTER_COL"=0 AND "P"."BILLING_INACTIVE_DT" IS NULL))
   5 - access("PCM"."TRACKING_ID"="P"."TRACKING_ID" AND
              "PCM"."TRACKING_ID_SERV"="P"."TRACKING_ID_SERV")
   6 - filter("PCM"."INACTIVE_DT" IS NULL)
   9 - filter("P"."FILTER_COL"=0)
  10 - access("PCM"."TRACKING_ID"="P"."TRACKING_ID" AND
              "PCM"."TRACKING_ID_SERV"="P"."TRACKING_ID_SERV" AND "PCM"."ACTIVE_DT"=)
  13 - filter(("P"."BILLING_INACTIVE_DT" IS NOT NULL OR "PCM"."INACTIVE_DT" IS NOT NULL))


You’ll notice that this plan also displays an interesting little quirk – at operation 10 we can see the index unique scan of index pcm_act_dt that occurs once for each row returned from pdtcost; but each unique scan is preceded by a call to run the subquery (except that scalar subquery caching means the subquery runs only once in total) to supply a value for active_dt that can be used in the unique scan. (In the absence of the pcm_act_dt index the full scan min/max would be a fast full scan of the primary key.)

With a little luck the OP will be able to apply the same strategy to his query, though it may be a little harder to get the desired plan since the original query includes 6 tables; but the principle doesn’t change.

Footnote:

various people on the OTN thread have pointed out that there are some odd details about the optimizers cardinality predictions which may mean that part of the problem is simply an issue of misleading (possibly out of date) object statistics. It’s possible that with better estimates the optimizer may change the plan so much that even the strategy of getting all the rows from pdtcost_charge_map related to the rows acquired from pdtcost and then eliminating based on a late filter may be efficient enough for the OP.  By changing the data volume and distribution in my test case one of the plans (which predicted 100 rows from 100,000) was as follows:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                    |      1 |        |     98 |00:00:01.58 |    2307 |
|   1 |  NESTED LOOPS                |                    |      1 |     98 |     98 |00:00:01.58 |    2307 |
|   2 |   NESTED LOOPS               |                    |      1 |    100 |    100 |00:00:00.01 |    1814 |
|*  3 |    TABLE ACCESS FULL         | PDTCOST            |      1 |    100 |    100 |00:00:00.01 |    1699 |
|*  4 |    INDEX RANGE SCAN          | PCM_PK             |    100 |      1 |    100 |00:00:00.01 |     115 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| PDTCOST_CHARGE_MAP |    100 |      1 |     98 |00:00:01.57 |     493 |
|   6 |    SORT AGGREGATE            |                    |      1 |      1 |      1 |00:00:01.57 |     393 |
|   7 |     INDEX FAST FULL SCAN     | PCM_PK             |      1 |    100K|    100K|00:00:01.02 |     393 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("P"."FILTER_COL"=0)
   4 - access("P"."TRACKING_ID"="PCM"."TRACKING_ID" AND
              "P"."TRACKING_ID_SERV"="PCM"."TRACKING_ID_SERV")
   5 - filter((("P"."BILLING_INACTIVE_DT" IS NULL AND "PCM"."INACTIVE_DT" IS NULL) OR
              "PCM"."ACTIVE_DT"=))

October 17, 2016

Fixed Stats

Filed under: Execution plans,Oracle,Statistics,Statspack,Troubleshooting — Jonathan Lewis @ 12:43 pm GMT Oct 17,2016

There are quite a lot of systems around the world that aren’t using the AWR (automatic workload repository) and ASH (active session history) tools to help them with trouble shooting because of the licensing requirement – so I’m still finding plenty of sites that are using Statspack and I recently came across a little oddity at one of these sites that I hadn’t noticed before: one of the Statspack snapshot statements was appearing fairly regularly in the Statspack report under the “SQL Ordered by Elapsed Time” section – even when the application had been rather busy and had generated lots of other work that was being reported. It was the following statement – the collection of file-level statistics:


select
       ts.name      tsname
     , df.name      filename
     , fs.phyrds
     , fs.phywrts
     , fs.readtim
     , fs.writetim
     , fs.singleblkrds
     , fs.phyblkrd
     , fs.phyblkwrt
     , fs.singleblkrdtim
     , fw.count     wait_count
     , fw.time      time
     , df.file#
  from x$kcbfwait   fw
     , v$filestat   fs
     , v$tablespace ts
     , v$datafile   df
 where ts.ts#    = df.ts#
   and fs.file#  = df.file#
   and fw.indx+1 = df.file#
;

The execution plan didn’t look very friendly, and the volume of I/O it generated (several hundred thousand disk reads) was surprising. The reason why the statement stood out so much in this case was that there was a fairly large number of files in the database (over 1,000) and the default execution plan was showing very bad cardinality estimates that resulted in highly inappropriate cartesian merge joins. At best the statement was taking around 2 minutes to run, at worst it was much, much worse.

This system was running 10g – also something which is still fairly common, though becoming much scarcer – which produced the following execution plan (which  I’ve recreated on a much smaller system):

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     1 |   535 |     2 (100)| 00:00:01 |
|*  1 |  HASH JOIN                    |                 |     1 |   535 |     2 (100)| 00:00:01 |
|   2 |   MERGE JOIN CARTESIAN        |                 |     5 |  1190 |     0   (0)| 00:00:01 |
|   3 |    NESTED LOOPS               |                 |     1 |   199 |     0   (0)| 00:00:01 |
|   4 |     MERGE JOIN CARTESIAN      |                 |     1 |   173 |     0   (0)| 00:00:01 |
|*  5 |      FIXED TABLE FULL         | X$KCCTS         |     1 |    43 |     0   (0)| 00:00:01 |
|   6 |      BUFFER SORT              |                 |     1 |   130 |     0   (0)| 00:00:01 |
|*  7 |       FIXED TABLE FULL        | X$KCFIO         |     1 |   130 |     0   (0)| 00:00:01 |
|*  8 |     FIXED TABLE FIXED INDEX   | X$KCCFE (ind:1) |     1 |    26 |     0   (0)| 00:00:01 |
|   9 |    BUFFER SORT                |                 |   100 |  3900 |     0   (0)| 00:00:01 |
|  10 |     FIXED TABLE FULL          | X$KCBFWAIT      |   100 |  3900 |     0   (0)| 00:00:01 |
|* 11 |   VIEW                        | GV$DATAFILE     |     1 |   297 |     1 (100)| 00:00:01 |
|  12 |    SORT ORDER BY              |                 |     1 |   957 |     1 (100)| 00:00:01 |
|  13 |     NESTED LOOPS              |                 |     1 |   957 |     0   (0)| 00:00:01 |
|  14 |      NESTED LOOPS             |                 |     1 |   647 |     0   (0)| 00:00:01 |
|  15 |       NESTED LOOPS            |                 |     1 |   371 |     0   (0)| 00:00:01 |
|* 16 |        FIXED TABLE FULL       | X$KCCFN         |     1 |   323 |     0   (0)| 00:00:01 |
|* 17 |        FIXED TABLE FIXED INDEX| X$KCVFH (ind:1) |     1 |    48 |     0   (0)| 00:00:01 |
|* 18 |       FIXED TABLE FIXED INDEX | X$KCCFE (ind:1) |     1 |   276 |     0   (0)| 00:00:01 |
|* 19 |      FIXED TABLE FULL         | X$KCCFN         |     1 |   310 |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TSTSN"="TS#" AND "K"."KCFIOFNO"="FILE#" AND "FILE#"="FW"."INDX"+1)
   5 - filter("TSTSN"<>(-1) AND "INST_ID"=USERENV('INSTANCE'))
   7 - filter("K"."INST_ID"=USERENV('INSTANCE'))
   8 - filter("F"."FEDUP"<>0 AND "F"."FENUM"="K"."KCFIOFNO")
  11 - filter("INST_ID"=USERENV('INSTANCE'))
  16 - filter("FN"."FNNAM" IS NOT NULL AND "FN"."FNTYP"=4 AND BITAND("FN"."FNFLG",4)<>4)
  17 - filter("FN"."FNFNO"="FH"."HXFIL")
  18 - filter("FE"."FEDUP"<>0 AND "FN"."FNFNO"="FE"."FENUM" AND
              "FE"."FEFNH"="FN"."FNNUM" AND "FE"."FETSN"<>(-1))
  19 - filter("FE"."FEPAX"<>65535 AND "FE"."FEPAX"<>0 AND "FE"."FEPAX"="FNAUX"."FNNUM"
              OR ("FE"."FEPAX"=0 OR "FE"."FEPAX"=65535) AND "FE"."FENUM"="FNAUX"."FNFNO" AND
              "FNAUX"."FNTYP"=4 AND "FNAUX"."FNNAM" IS NOT NULL AND BITAND("FNAUX"."FNFLG",4)<>4 AND
              "FE"."FEFNH"="FNAUX"."FNNUM")

Note particularly the two Cartesian merge joins and the very late filter at operation 1.

Note also the number of times the cardinality estimate is 1 – always a bit of a threat when the query gets complicated: “anything goes following a one for Rows”.

The easy (first thought) solution was simply to gather stats on all the fixed objects in this query:


begin
        dbms_stats.gather_table_stats('sys','x$kcbfwait',method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kccfe',   method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kccfn',   method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kccts',   method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kcfio',   method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('sys','x$kcvfh',   method_opt=>'for all columns size 1');
end;
/

The option to gather fixed objects stats individually with a call to dbms_stats.gather_table_stats() is not commonly known, but it does work.

Here’s the plan (again from the small system) after stats collection:

--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                 |    29 | 10411 |     3 (100)| 00:00:01 |
|*  1 |  HASH JOIN                     |                 |    29 | 10411 |     3 (100)| 00:00:01 |
|*  2 |   HASH JOIN                    |                 |     1 |   350 |     2 (100)| 00:00:01 |
|   3 |    MERGE JOIN CARTESIAN        |                 |    25 |  1325 |     0   (0)| 00:00:01 |
|   4 |     NESTED LOOPS               |                 |     4 |   148 |     0   (0)| 00:00:01 |
|*  5 |      FIXED TABLE FULL          | X$KCFIO         |   200 |  6200 |     0   (0)| 00:00:01 |
|*  6 |      FIXED TABLE FIXED INDEX   | X$KCCFE (ind:1) |     1 |     6 |     0   (0)| 00:00:01 |
|   7 |     BUFFER SORT                |                 |     7 |   112 |     0   (0)| 00:00:01 |
|*  8 |      FIXED TABLE FULL          | X$KCCTS         |     7 |   112 |     0   (0)| 00:00:01 |
|*  9 |    VIEW                        | GV$DATAFILE     |     1 |   297 |     1 (100)| 00:00:01 |
|  10 |     SORT ORDER BY              |                 |     1 |   316 |     1 (100)| 00:00:01 |
|  11 |      NESTED LOOPS              |                 |     1 |   316 |     0   (0)| 00:00:01 |
|  12 |       NESTED LOOPS             |                 |     1 |   248 |     0   (0)| 00:00:01 |
|  13 |        NESTED LOOPS            |                 |     1 |   226 |     0   (0)| 00:00:01 |
|* 14 |         FIXED TABLE FULL       | X$KCCFE         |     4 |   612 |     0   (0)| 00:00:01 |
|* 15 |         FIXED TABLE FIXED INDEX| X$KCCFN (ind:1) |     1 |    73 |     0   (0)| 00:00:01 |
|* 16 |        FIXED TABLE FIXED INDEX | X$KCVFH (ind:1) |     1 |    22 |     0   (0)| 00:00:01 |
|* 17 |       FIXED TABLE FULL         | X$KCCFN         |     1 |    68 |     0   (0)| 00:00:01 |
|  18 |   FIXED TABLE FULL             | X$KCBFWAIT      |   400 |  3600 |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("FILE#"="FW"."INDX"+1)
   2 - access("TSTSN"="TS#" AND "K"."KCFIOFNO"="FILE#")
   5 - filter("K"."INST_ID"=USERENV('INSTANCE'))
   6 - filter("F"."FEDUP"<>0 AND "F"."FENUM"="K"."KCFIOFNO")
   8 - filter("TSTSN"<>(-1) AND "INST_ID"=USERENV('INSTANCE'))
   9 - filter("INST_ID"=USERENV('INSTANCE'))
  14 - filter("FE"."FEDUP"<>0 AND "FE"."FETSN"<>(-1))
  15 - filter("FN"."FNTYP"=4 AND "FN"."FNNAM" IS NOT NULL AND BITAND("FN"."FNFLG",4)<>4
              AND "FN"."FNFNO"="FE"."FENUM" AND "FE"."FEFNH"="FN"."FNNUM")
  16 - filter("FN"."FNFNO"="FH"."HXFIL")
  17 - filter("FE"."FEPAX"<>65535 AND "FE"."FEPAX"<>0 AND "FE"."FEPAX"="FNAUX"."FNNUM" OR
              ("FE"."FEPAX"=0 OR "FE"."FEPAX"=65535) AND "FE"."FENUM"="FNAUX"."FNFNO" AND
              "FNAUX"."FNTYP"=4 AND "FNAUX"."FNNAM" IS NOT NULL AND BITAND("FNAUX"."FNFLG",4)<>4 AND
              "FE"."FEFNH"="FNAUX"."FNNUM")

Note the changes in cardinality estimates: they now look a little more realistic and we’re down to one cartesian merge join which (if you have a rough idea of what your X$ tables hold) still looks a little surprising at first sight but not completely unreasonable. A change of plan doesn’t necessarily mean much without the data and time behind it, of course, so here are the two sets of results from a 10g database with a handful of datafiles and tablespaces showing the Row Source Operation sections from the tkprof output before and after stats collection:

Before stats collection:

Rows     Row Source Operation
-------  ---------------------------------------------------
      6  HASH JOIN  (cr=0 pr=0 pw=0 time=1957860 us)
  16800   MERGE JOIN CARTESIAN (cr=0 pr=0 pw=0 time=33855662 us)
     42    NESTED LOOPS  (cr=0 pr=0 pw=0 time=73795 us)
   1400     MERGE JOIN CARTESIAN (cr=0 pr=0 pw=0 time=21555 us)
      7      FIXED TABLE FULL X$KCCTS (cr=0 pr=0 pw=0 time=3204 us)
   1400      BUFFER SORT (cr=0 pr=0 pw=0 time=7233 us)
    200       FIXED TABLE FULL X$KCFIO (cr=0 pr=0 pw=0 time=1210 us)
     42     FIXED TABLE FIXED INDEX X$KCCFE (ind:1) (cr=0 pr=0 pw=0 time=1859092 us)
  16800    BUFFER SORT (cr=0 pr=0 pw=0 time=67643 us)
    400     FIXED TABLE FULL X$KCBFWAIT (cr=0 pr=0 pw=0 time=2008 us)
      6   VIEW  GV$DATAFILE (cr=0 pr=0 pw=0 time=68087 us)
      6    SORT ORDER BY (cr=0 pr=0 pw=0 time=68065 us)
      6     NESTED LOOPS  (cr=0 pr=0 pw=0 time=65989 us)
      6      NESTED LOOPS  (cr=0 pr=0 pw=0 time=56632 us)
      6       NESTED LOOPS  (cr=0 pr=0 pw=0 time=47217 us)
      6        FIXED TABLE FULL X$KCCFN (cr=0 pr=0 pw=0 time=19830 us)
      6        FIXED TABLE FIXED INDEX X$KCVFH (ind:1) (cr=0 pr=0 pw=0 time=25568 us)
      6       FIXED TABLE FIXED INDEX X$KCCFE (ind:1) (cr=0 pr=0 pw=0 time=9849 us)
      6      FIXED TABLE FULL X$KCCFN (cr=0 pr=0 pw=0 time=9715 us)

After stats collection:


Rows     Row Source Operation
-------  ---------------------------------------------------
      6  HASH JOIN  (cr=0 pr=0 pw=0 time=196576 us)
      6   HASH JOIN  (cr=0 pr=0 pw=0 time=195829 us)
     42    MERGE JOIN CARTESIAN (cr=0 pr=0 pw=0 time=4390 us)
      6     NESTED LOOPS  (cr=0 pr=0 pw=0 time=7810 us)
    200      FIXED TABLE FULL X$KCFIO (cr=0 pr=0 pw=0 time=1224 us)
      6      FIXED TABLE FIXED INDEX X$KCCFE (ind:1) (cr=0 pr=0 pw=0 time=150150 us)
     42     BUFFER SORT (cr=0 pr=0 pw=0 time=1574 us)
      7      FIXED TABLE FULL X$KCCTS (cr=0 pr=0 pw=0 time=1353 us)
      6    VIEW  GV$DATAFILE (cr=0 pr=0 pw=0 time=41058 us)
      6     SORT ORDER BY (cr=0 pr=0 pw=0 time=41005 us)
      6      NESTED LOOPS  (cr=0 pr=0 pw=0 time=39399 us)
      6       NESTED LOOPS  (cr=0 pr=0 pw=0 time=34229 us)
      6        NESTED LOOPS  (cr=0 pr=0 pw=0 time=15583 us)
      6         FIXED TABLE FULL X$KCCFE (cr=0 pr=0 pw=0 time=1124 us)
      6         FIXED TABLE FIXED INDEX X$KCCFN (ind:1) (cr=0 pr=0 pw=0 time=15067 us)
      6        FIXED TABLE FIXED INDEX X$KCVFH (ind:1) (cr=0 pr=0 pw=0 time=18971 us)
      6       FIXED TABLE FULL X$KCCFN (cr=0 pr=0 pw=0 time=5581 us)
    400   FIXED TABLE FULL X$KCBFWAIT (cr=0 pr=0 pw=0 time=1615 us)

The execution time has dropped from about 2 seconds to less than 1/10th of a second – and all I’ve got is 6 or 7 files and tablespaces.  (Some of the “time=” values in the first plan are very odd, but the final time figure is about right.)

Generating an intermediate result set of 16,800 rows for a tiny number of files is not a good thing – just imagine how big that number would get with 1,000 files and a couple of hundred tablespaces.

I have to say that (for a couple of tiny databases) the 11.2.0.4 and 12.1.0.2 systems I checked this query on had no problem and immediately picked a sensible path. It’s possible that the definition of some of the v$ objects has actually changed or that the optimizer features have changed (some assistence from complex view merging, perhaps) – but if you are still running Statspack, even if it’s on 11g or 12c, then it’s worth checking from time to time how much work goes into executing the snapshot and seeing if you need some fixed object stats to make things a little more efficient.

Footnote:

Several years ago I wrote a short note about how Statspack actually captured its own execution time (from 10g onwards) and how you could run a report on it to check the run time. It’s worth running that report from time to time. I’ve recently updated that note to add the equivalent query against the AWR.

September 29, 2016

IOT limitation

Filed under: Execution plans,Infrastructure,IOT,Oracle — Jonathan Lewis @ 10:17 am GMT Sep 29,2016

In the right circumstances Index Organized Tables (IOTs) give us tremendous benefits – provided you use them in the ideal fashion. Like so many features in Oracle, though, you often have to compromise between the benefit you really need and the cost of the side effect that a feature produces.

The fundamental design targets for an IOT are that you have short rows and only want to access them through index range scans of primary key. The basic price you pay for optimised access is the extra work you have to do as you insert the data. Anything you do outside the two specific targets is likely to lead to increased costs of using the IOT – and there’s one particular threat that I’ve mentioned twice in the past (here and here). I want to mention it one more time with a focus on client code and reporting.


create table iot1 (
        id1     number(7.0),
        id2     number(7.0),
        v1      varchar2(10),
        v2      varchar2(10),
        padding varchar2(500),
        constraint iot1_pk primary key(id1, id2)
)
organization index
including id2
overflow
;

insert into iot1
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        mod(rownum,311)                 id1,
        mod(rownum,337)                 id2,
        to_char(mod(rownum,20))         v1,
        to_char(trunc(rownum/100))      v2,
        rpad('x',500,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 ; commit; begin dbms_stats.gather_table_stats( ownname => user,
                tabname          => 'IOT1'
                method_opt       => 'for all columns size 1'
        );
end;
/

alter system flush buffer_cache;

select table_name, blocks from user_tables where table_name = 'IOT1' or table_name like 'SYS_IOT_OVER%';
select index_name, leaf_blocks from user_indexes where table_name = 'IOT1';

set autotrace traceonly
select max(v2) from iot1;
set autotrace off

I’ve created an index organized table with an overflow. The table definition places all columns after the id2 column into the overflow segment. After collecting stats I’ve then queried the table with a query that, for a heap table, would produce a tablescan as the execution plan. But there is no “table”, there is only an index for an IOT. Here’s the output I get (results from 11g and 12c are very similar):

TABLE_NAME               BLOCKS
-------------------- ----------
SYS_IOT_OVER_151543        8074
IOT1

INDEX_NAME           LEAF_BLOCKS
-------------------- -----------
IOT1_PK                      504

---------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |     4 | 99793   (1)| 00:00:04 |
|   1 |  SORT AGGREGATE       |         |     1 |     4 |            |          |
|   2 |   INDEX FAST FULL SCAN| IOT1_PK |   100K|   390K| 99793   (1)| 00:00:04 |
---------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
     100376  consistent gets
       8052  physical reads

The index segment has 504 leaf blocks, the overflow segment has 8,074 used blocks below the high water mark. The plan claims an index fast full scan of the index segment – but the physical reads statistic looks more like a “table” scan of the overflow segment. What’s actually happening ?

The 100,000+ consistent reads should tell you what’s happening – we really are doing an index fast full scan on the index segment, and for each index entry we go to the overflow segment to find the v2 value. Oracle doesn’t have a mechanism for doing a “tablescan” of just the overflow segment – even though the definition of the IOT ought (apparently) to be telling Oracle exactly which columns are in the overflow.

In my particular test Oracle reported a significant number of “db file scattered read” waits against the overflow segment, but these were for “prefetch warmup”; in a normal system with a buffer cache full of other data this wouldn’t have happened. The other interesting statistic that showed up was “table fetch continued row” – which was (close to) 100,000, again highlighting that we weren’t doing a normal full tablescan.

In terms of normal query processing this anomaly of attempted “tablescans” being index driven probably isn’t an issue but, as I pointed out in one of earlier posts on the topic, when Oracle gathers stats on the “table” it will do a “full tablescan”. If you have a very large table with an overflow segment it could be a very slow process – especially if you’ve engineered the IOT for the right reason, viz: the data arrives in the wrong order relative to the order you want to query it, and you’ve kept the rows in the IOT_TOP short by dumping the rarely used data in the overflow. With this in mind you might want to make sure that you write a bit of special code that gathers stats only on the columns you know to be in the IOT_TOP, creates representative numbers for the other columns, then locks the stats until the next time you want to refresh them.

 

June 22, 2016

Conditional SQL- 6

Filed under: Conditional SQL,Execution plans,Oracle,Performance — Jonathan Lewis @ 2:16 pm GMT Jun 22,2016

An odd little anomaly showed up on the OTN database forum a few days ago where a query involving a table covered by Oracle Label Security (OLS) seemed to wrap itself into a non-mergeable view when written using traditional Oracle SQL, but allowed for view-merging when accessed through ANSI standard SQL. I don’t know why there’s a difference but it did prompt a thought about non-mergeable views and what I’ve previously called “conditional SQL” – namely SQL which holds a predicate that should have been tested in the client code and not passed to the database engine.

The thought was this – could the database engine decide to do a lot of redundant work if you stuck a silly predicate inside a non-mergeable view: the answer turns out to be yes. Here’s a demonstration I’ve run on 11g and 12c:


rem
rem     Script:         conditional_fail.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2016
rem

create table t2
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum,1e5)         n1,
        lpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum,1e3)         n1,
        lpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5
;
-- gather simple statistics (not needed in 12c) 

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index t2_i1 on t2(id);

variable b1 number
variable b2 number

exec :b1 := 1; :b2 := 0

There’s nothing terribly significant about the data, beyond the fact that I’ve got a “small” table and a “large” table that I can use to encourage the optimizer to do a hash join. I’ve also created a couple of bind variables and set them to values that ensure that we can see that b1 is definitely not smaller than b2. So here’s a simple query – with a mergeable inline view in the first instance which is then hinted to make the view non-mergeable.


select
        t1.n1, count(*), sum(v1.n1)
from
        t1,
        (select t2.id, t2.n1 from t2 where :b1 < :b2) v1
where
        t1.n1 = 0
and     v1.id = t1.id
group by
        t1.n1
;

select
        t1.n1, count(*), sum(v1.n1)
from
        t1,
        (select /*+ no_merge */ t2.id, t2.n1 from t2 where :b1 < :b2) v1
where
        t1.n1 = 0
and     v1.id = t1.id
group by
        t1.n1
;

Clearly, for our values of b1 and b2, the query will not return any data. In fact we can go further and say that the presence of the “bind variable predicate” in the inline view either has no effect on the volume of data returned or it eliminates all the data. But the presence of the no_merge hint makes a difference to how much work Oracle does for the “no data” option. Here are the two plans, pulled from the memory of an 11g instance after enabling rowsource execution statistics – first when the view is mergeable:


-----------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |       |      1 |        |   508 (100)|      0 |00:00:00.01 |
|   1 |  SORT GROUP BY NOSORT          |       |      1 |      1 |   508   (2)|      0 |00:00:00.01 |
|*  2 |   FILTER                       |       |      1 |        |            |      0 |00:00:00.01 |
|   3 |    NESTED LOOPS                |       |      0 |    100 |   508   (2)|      0 |00:00:00.01 |
|   4 |     NESTED LOOPS               |       |      0 |    100 |   508   (2)|      0 |00:00:00.01 |
|*  5 |      TABLE ACCESS FULL         | T1    |      0 |    100 |   208   (4)|      0 |00:00:00.01 |
|*  6 |      INDEX RANGE SCAN          | T2_I1 |      0 |      1 |     2   (0)|      0 |00:00:00.01 |
|   7 |     TABLE ACCESS BY INDEX ROWID| T2    |      0 |      1 |     3   (0)|      0 |00:00:00.01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(:B1<:B2)
   5 - filter("T1"."N1"=0)
   6 - access("T2"."ID"="T1"."ID")

Notice how, despite t2 being the second table in the join, the bind variable predicate has worked its way to the top of the execution plan and execution has terminated after the run-time engine has determined that 1 is not less than zero.

Compare this with the plan when the view is non-mergeable:

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |      1 |        |  2300 (100)|      0 |00:00:00.01 |    1599 |       |       |          |
|   1 |  SORT GROUP BY NOSORT  |         |      1 |      1 |  2300   (4)|      0 |00:00:00.01 |    1599 |       |       |          |
|*  2 |   HASH JOIN            |         |      1 |    100 |  2300   (4)|      0 |00:00:00.01 |    1599 |  2061K|  2061K| 1109K (0)|
|   3 |    JOIN FILTER CREATE  | :BF0000 |      1 |    100 |   208   (4)|    100 |00:00:00.01 |    1599 |       |       |          |
|*  4 |     TABLE ACCESS FULL  | T1      |      1 |    100 |   208   (4)|    100 |00:00:00.01 |    1599 |       |       |          |
|   5 |    VIEW                |         |      1 |   1000K|  2072   (3)|      0 |00:00:00.01 |       0 |       |       |          |
|*  6 |     FILTER             |         |      1 |        |            |      0 |00:00:00.01 |       0 |       |       |          |
|   7 |      JOIN FILTER USE   | :BF0000 |      0 |   1000K|  2072   (3)|      0 |00:00:00.01 |       0 |       |       |          |
|*  8 |       TABLE ACCESS FULL| T2      |      0 |   1000K|  2072   (3)|      0 |00:00:00.01 |       0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."ID"="T1"."ID")
   4 - filter("T1"."N1"=0)
   6 - filter(:B1<:B2)
   8 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."ID"))

Thanks to the no_merge hint the bind variable predicate has not been promoted to the top of the plan, so the run-time engine has produced a plan that requires it to access data from table t1 before visiting table t2. In fact the optimizer has decided to do a hash join gathering all the relevant data from t1 and building an in-memory hash table before deciding that 1 is greater than zero and terminating the query.

Be careful if you write SQL that compares bind variables (or other pseudo-constants such as calls to sys_context) with bind variables (etc.); you may find that you’ve managed to produce code that forces the optimizer to do work that it could have avoided if only it had found a way of doing that comparison at the earliest possible moment.

These plans were from 11g, but 12c can behave the same way although, with my specific data set, I had to add the no_push_pred() hint to the query to demonstrate the effect of hash join appearing.

Footnote 1

An interesting side effect of this plan is that it has also allowed a Bloom filter to appear in a serial hash join – not something you’d normally expect to see, so I changed the predicate to :b1 > :b2 to see if the no_merge hint combined with the Bloom filter was faster than merging without the Bloom filter. Here are the two plans, pulled from memory. Running this on 11g I had to hint the hash join when I removed the no_merge hint:


Non-mergeable view - Bloom filter appears
------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |      1 |        |  2300 (100)|      1 |00:00:00.12 |   17725 |       |       |          |
|   1 |  SORT GROUP BY NOSORT  |         |      1 |      1 |  2300   (4)|      1 |00:00:00.12 |   17725 |       |       |          |
|*  2 |   HASH JOIN            |         |      1 |    100 |  2300   (4)|    100 |00:00:00.12 |   17725 |  2061K|  2061K| 1106K (0)|
|   3 |    JOIN FILTER CREATE  | :BF0000 |      1 |    100 |   208   (4)|    100 |00:00:00.01 |    1599 |       |       |          |
|*  4 |     TABLE ACCESS FULL  | T1      |      1 |    100 |   208   (4)|    100 |00:00:00.01 |    1599 |       |       |          |
|   5 |    VIEW                |         |      1 |   1000K|  2072   (3)|   1605 |00:00:00.10 |   16126 |       |       |          |
|*  6 |     FILTER             |         |      1 |        |            |   1605 |00:00:00.09 |   16126 |       |       |          |
|   7 |      JOIN FILTER USE   | :BF0000 |      1 |   1000K|  2072   (3)|   1605 |00:00:00.08 |   16126 |       |       |          |
|*  8 |       TABLE ACCESS FULL| T2      |      1 |   1000K|  2072   (3)|   1605 |00:00:00.07 |   16126 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."ID"="T1"."ID")
   4 - filter("T1"."N1"=0)
   6 - filter(:B1<B2)
   8 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."ID"))

View merging allowed - no Bloom filter
-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |  2300 (100)|      1 |00:00:07.56 |   17725 |       |       |          |
|   1 |  SORT GROUP BY NOSORT|      |      1 |      1 |  2300   (4)|      1 |00:00:07.56 |   17725 |       |       |          |
|*  2 |   FILTER             |      |      1 |        |            |    100 |00:00:07.56 |   17725 |       |       |          |
|*  3 |    HASH JOIN         |      |      1 |    100 |  2300   (4)|    100 |00:00:07.56 |   17725 |  2061K|  2061K| 1446K (0)|
|*  4 |     TABLE ACCESS FULL| T1   |      1 |    100 |   208   (4)|    100 |00:00:00.01 |    1599 |       |       |          |
|   5 |     TABLE ACCESS FULL| T2   |      1 |   1000K|  2072   (3)|   1000K|00:00:01.94 |   16126 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(:B1<:B2)
   3 - access("T2"."ID"="T1"."ID")
   4 - filter("T1"."N1"=0)

Things to note – the Bloom filter eliminated all but 1,605 rows from the tablescan before passing them to the hash join operation to probe the hash table; the run time of the query without filtering was 7.56 seconds (!) compared to 0.12 seconds with the Bloom filter. Fortunately I decided that this was too good to be true BEFORE I published the results and decided to re-run the tests with statistics_level set back to typical and found that most of the difference was CPU time spent on collecting rowsource execution statistics. The query with the Bloom filter was still faster, but only just – the times were more like 0.09 seconds vs. 0.12 seconds.

Footnote 2

The source of the problem on OTN was that as well as using OLS the query in question included a user-defined function. Since you can write a user-defined function that “spies” on the data content and uses (e.g.) dbms_output to write data to the terminal this poses a security risk; if a predicate calling that function executed before the security predicate had been tested then your function could output data that your query shouldn’t be able to report. To avoid this security loophole Oracle restricts the way it merges views (unless you set optimizer_secure_view_merging to false). For a more detailed explanation and demonstration of the issues, see this item on Christian Antognini’s blog.

I still don’t know why the ANSI form of the query managed to bypass this issue, but the predicate with the user-defined function was applied as a filter at the very last step of the plan, so perhaps there was something about the transformation Oracle took to get from ANSI syntax to its internal syntax (with cascading lateral views) that made it possible for the optimizer to recognize and eliminate the security threat efficiently.

June 20, 2016

Plan Shapes

Filed under: Execution plans,Oracle — Jonathan Lewis @ 12:58 pm GMT Jun 20,2016

There are a number of articles, webinars, and blogs online about how to read execution plans, but many of them seem to stop after the the minimum description of the simplest type of plan, so I thought I’d throw out a brief comment on a couple of the slightly more complicated things that are likely to appear fairly commonly because you sometimes find plans with very similar shapes but extremely different interpretation.

First: select with scalar subqueries in the select list (there’s no need to worry about what the table definitions look like):


rem
rem     Script:         plan_shapes.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2016
rem

select
        id, n1,
        (select /*+  no_unnest */ max(n1) from t2 where t2.id = t1.n1)  t2_max,
        (select /*+  no_unnest */ max(n1) from t3 where t3.id = t1.n1)  t3_max
from
        t1
where
        id between 100 and 119
;


--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |     20 |00:00:00.01 |       8 |
|   1 |  SORT AGGREGATE                      |       |     20 |      1 |     20 |00:00:00.01 |      63 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     20 |      2 |     40 |00:00:00.01 |      63 |
|*  3 |    INDEX RANGE SCAN                  | T2_I1 |     20 |      2 |     40 |00:00:00.01 |      23 |
|   4 |  SORT AGGREGATE                      |       |     20 |      1 |     20 |00:00:00.01 |      83 |
|   5 |   TABLE ACCESS BY INDEX ROWID BATCHED| T3    |     20 |      3 |     60 |00:00:00.01 |      83 |
|*  6 |    INDEX RANGE SCAN                  | T3_I1 |     20 |      3 |     60 |00:00:00.01 |      23 |
|   7 |  TABLE ACCESS BY INDEX ROWID BATCHED | T1    |      1 |     21 |     20 |00:00:00.01 |       8 |
|*  8 |   INDEX RANGE SCAN                   | T1_I1 |      1 |     21 |     20 |00:00:00.01 |       4 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."ID"=:B1)
   6 - access("T3"."ID"=:B1)
   8 - access("ID">=100 AND "ID"<=119)

We have a select statement, with two columns in the select list generated by (correlated) scalar subqueries.

The overall shape of the plan shows the driving query as the last child plan for the SELECT (operations 7-8). The first and second child plans are the plans for the two scalar subqueries in turn (and the order the sub-plans appear is the order of the scalar subqueries in the select list). In this case the main query returned 20 rows (A-Rows=20), and the scalar subqueries executed 20 times each. There are a few other details we could mention, but the key feature of the plan is that the driver is the last sub-plan.

Second: update with scalar subqueries:


update t1
set
        n1 = (select max(n1) from t2 where t2.id = t1.n1),
        v1 = (select max(v1) from t3 where t3.id = t1.n1)
where
        id between 1000 and 1999
;

---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                      |       |      1 |        |      0 |00:00:00.13 |   10361 |
|   1 |  UPDATE                               | T1    |      1 |        |      0 |00:00:00.13 |   10361 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED | T1    |      1 |   1001 |   1000 |00:00:00.01 |      22 |
|*  3 |    INDEX RANGE SCAN                   | T1_I1 |      1 |   1001 |   1000 |00:00:00.01 |       4 |
|   4 |   SORT AGGREGATE                      |       |    916 |      1 |    916 |00:00:00.04 |    3672 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    916 |      2 |   1832 |00:00:00.03 |    3672 |
|*  6 |     INDEX RANGE SCAN                  | T2_I1 |    916 |      2 |   1832 |00:00:00.01 |    1840 |
|   7 |   SORT AGGREGATE                      |       |    916 |      1 |    916 |00:00:00.05 |    4588 |
|   8 |    TABLE ACCESS BY INDEX ROWID BATCHED| T3    |    916 |      3 |   2748 |00:00:00.04 |    4588 |
|*  9 |     INDEX RANGE SCAN                  | T3_I1 |    916 |      3 |   2748 |00:00:00.01 |    1840 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">=1000 AND "ID"<=1999)
   6 - access("T2"."ID"=:B1)
   9 - access("T3"."ID"=:B1)


In this statement we update two columns by executing (correlated) scalar subqueries. The most important feature of interpreting this plan is that it is the exact opposite of the earlier select statement. In this plan the first subplan is the driving part of the statement – it’s the part of the plan that tells us how to find rows to be updated (and we find 1,000 of them); the 2nd and 3rd sub-plans correspond to the subqueries in the two columns whose value we set. In this case we happen to get some benefit from scalar subquery caching so the two subqueries each run 916 times. (The fact that both subqueries run the same number of times is not a coincidence – the caching benefit is dependent on the value(s) used for driving the subqueries, and that’s the t1.n1 column in both cases.)

Finally: because people do all sorts of complicated things, and it’s easy to get deceived if you read the plan without seeing the SQL, one more example (starting with the plan):


---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                      |       |      1 |        |      0 |00:00:00.14 |   11257 |
|   1 |  UPDATE                               | T1    |      1 |        |      0 |00:00:00.14 |   11257 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED | T1    |      1 |   1001 |   1000 |00:00:00.01 |      22 |
|*  3 |    INDEX RANGE SCAN                   | T1_I1 |      1 |   1001 |   1000 |00:00:00.01 |       4 |
|   4 |   SORT AGGREGATE                      |       |   1103 |      1 |   1103 |00:00:00.06 |    5519 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T3    |   1103 |      3 |   3306 |00:00:00.04 |    5519 |
|*  6 |     INDEX RANGE SCAN                  | T3_I1 |   1103 |      3 |   3306 |00:00:00.01 |    2213 |
|   7 |   SORT AGGREGATE                      |       |    916 |      1 |    916 |00:00:00.11 |    9191 |
|   8 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    916 |      2 |   1832 |00:00:00.03 |    3672 |
|*  9 |     INDEX RANGE SCAN                  | T2_I1 |    916 |      2 |   1832 |00:00:00.01 |    1840 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">=1000 AND "ID"<=1999)
   6 - access("T3"."ID"=:B1)
   9 - access("T2"."ID"=:B1)


This plan looks very like the previous plan – it’s an update so the first sub-plan will be identifying the rows to be updated, but then what ?

The easiest assumption – always one worth considering, providing you remember that there are other possibilities – is that we have two subqueries executing for each row we update – but in this example there’s a little clue in the numbers that this isn’what the statement does. The driving query (operations 2 – 3) identifies 1,000 rows, so how can operations 4 – 6 start 1,103 times each ? Something more complicated is going on.

I’ve combined the features of the first two examples. I have a single set subquery, but it contains a scalar subquery in the select list – so operations 4 – 9 are the sub-plan for a single select statement – and a select statement with a scalar subquery in the select list puts the main plan last (operations 7 – 9) and its scalar subquery sub-plan first (operations 4 – 6). Here’s the SQL:


update t1
set
        (n1,v1) = (
                select
                        max(n1),
                        max((select /*+  no_unnest */ max(v1) from t3 where t3.id = t2.n1))
                from    t2
                where   t2.id = t1.n1
        )
where
        id between 1000 and 1999
;

What happens here is that the driving body of the update statement identifies 1,000 rows so the scalar subquery against t2 should execute 1,000 times; thanks to scalar subquery caching, though, it only executes 916 times. Each time it executes it finds 2 row and for each of those rows it executes the scalar subquery against t3 which, fortunately, also benefits from its own scalar subquery caching and so runs only 1,103 times in total

I could go on, of course, with increasingly complex examples – for example scalar subqueries that contain decode() calls with scalar subqueries as their inputs; where clauses which compare the results of scalar subqueries, and so on. It can get quite difficult to see, even with the SQL in front of you, what the plan is trying to tell you so I’ll leave you with one thought: always use the qb_name() hint to name every single subquery so that, if interpretation of the plan gets a bit difficult, you can use the ‘alias’ formatting option in the call to dbms_xplan to connect each table reference in the plan with the query block it came from in the query.

 

 

June 10, 2016

Uniquely parallel

Filed under: 12c,distributed,Execution plans,Oracle,Parallel Execution,Performance,Upgrades — Jonathan Lewis @ 7:36 am GMT Jun 10,2016

Here’s a surprising (to me) execution plan from 12.1.0.2 – parallel execution to find one row in a table using a unique scan of a unique index – produced by running the following script (data creation SQL to follow):


set serveroutput off
set linesize 180
set trimspool on
set pagesize 60

alter session set statistics_level = all;

variable b1 number
exec :b1 := 50000

select /*+ parallel (3) */ id, v1 from t2 where id=:b1;

select * from table(dbms_xplan.display_cursor(null,null,'allstats parallel'));

break on dfo_number skip 1 on tq_id skip 1 on server_type

select
        dfo_number, tq_id, server_type, instance, process, num_rows
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, instance, process
;

All I’ve done is enable rowsource execution statistics, set a bind variable to a value, query a table with a /*+ parallel(3) */ hint to find the one row that will be identified by primary key, and then reported the actual execution plan. When I first ran the test Oracle didn’t report the execution statistics correctly so I’ve also queried v$pq_tqstat to show the PX servers used and the flow of data through the plan. Here’s the plan, followed by the  results from v$pq_tqstat:


SQL_ID  0dzynh9d29pt9, child number 0
-------------------------------------
select /*+ parallel (3) */ id,v1 from t2 where id=:b1

Plan hash value: 247082613

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |          |      1 |        |        |      |            |      1 |00:00:00.02 |
|   1 |  PX COORDINATOR                   |          |      1 |        |        |      |            |      1 |00:00:00.02 |
|   2 |   PX SEND QC (RANDOM)             | :TQ10001 |      0 |      1 |  Q1,01 | P->S | QC (RAND)  |      0 |00:00:00.01 |
|   3 |    TABLE ACCESS BY INDEX ROWID    | T2       |      0 |      1 |  Q1,01 | PCWP |            |      0 |00:00:00.01 |
|   4 |     BUFFER SORT                   |          |      0 |        |  Q1,01 | PCWC |            |      0 |00:00:00.01 |
|   5 |      PX RECEIVE                   |          |      0 |      1 |  Q1,01 | PCWP |            |      0 |00:00:00.01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)| :TQ10000 |      0 |      1 |  Q1,00 | S->P | HASH (BLOCK|      0 |00:00:00.01 |
|   7 |        PX SELECTOR                |          |      0 |        |  Q1,00 | SCWC |            |      0 |00:00:00.01 |
|*  8 |         INDEX UNIQUE SCAN         | T2_PK    |      0 |      1 |  Q1,00 | SCWP |            |      0 |00:00:00.01 |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access("ID"=:B1)

Note
-----
   - Degree of Parallelism is 3 because of hint

DFO_NUMBER      TQ_ID SERVER_TYP   INSTANCE PROCES   NUM_ROWS
---------- ---------- ---------- ---------- ------ ----------
         1          0 Producer            1 P003            0
                                          1 P004            1
                                          1 P005            0
                      Consumer            1 P000            0
                                          1 P001            1
                                          1 P002            0

                    1 Producer            1 P000            0
                                          1 P001            1
                                          1 P002            0
                      Consumer            1 QC              1

As you can see the table access follows a unique scan of an index and, although the rowsource execution stats report zero starts for the unique scan, we can see from v$pq_tqstat that slave P004 acquired a “row” (actually a rowid) and passed it to slave P001 which then acquired a row from the table and passed that row to the query coordinator. Oracle really did execute a parallel query, starting and stopping a total of 6 sessions to perform a single unique index access.

You’ll notice operation 7 is one you’ve only seen in the latest version of Oracle. The PX SELECTOR was introduced in 12c to reduce the number of times a complex parallel query would funnel into the query coordinator (parallel to serial) and then fan out again (serial to parallel) generating a new data flow operation tree (DFO tree) spawning one or two new parallel server groups as it did so. To stop this happening a step that needs to serialise in a 12c parallel plan can nominate one of the existing PX server processes (from each set, if necessary) to do the job so that the same set of PX servers can carry on running the query without the need for a new DFO tree to appear.

This enhancement to parallel execution plans is a good idea – except when it appears in my silly little query and turns something that ought to be quick and cheap into a job that is far more resource-intensive than it should be.

At this point, of course, you’re probably wondering what kind of idiot would put a parallel() hint into a query that was doing nothing but selecting one row by primary key – the answer is: “the Oracle optimizer in 12c”. I discovered this anomaly while creating a demonstration of the way that a distributed parallel query has to serialise through a single database link even if the operations at the two ends of the link run parallel. Here’s the SQL I wrote for the full demonstration:


rem     Script:         distributed_pq.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2016

define m_remote='test@loopback'
define m_remote='orcl@loopback'

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(rownum as number(8,0))                     id,
        cast(lpad(rownum,8,'0') as varchar2(8))         v1,
        cast(rpad('x',100) as varchar2(100))            padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5
;

create table t2
as
select  *
from    t1
where   mod(id,100) = 0
;

alter table t2 add constraint t2_pk primary key(id);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

set serveroutput off

select
        /*+ parallel(3) */
        t1.v1, t2.v1
from
        t1,
        t2@&m_remote
where
        mod(t1.id,10) = 0
and     t2.id = t1.id
and     mod(to_number(t2.v1),10) = 1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats parallel'));

If you want to run this demo you’ll need to do something about formatting the output; more importantly you’ll have to create a database link (with a loopback link) and set up a define identifying it at the line where I’ve got orcl@loopback and test@loopback (which are my 12c and 11g loopback links respectively).

Here’s the plan (with rowsource stats) I got from the 12c test:


----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |      1 |        |        |      |            |      0 |00:01:14.67 |       7 |      0 |
|   1 |  NESTED LOOPS         |          |      1 |     10 |        |      |            |      0 |00:01:14.67 |       7 |      0 |
|   2 |   PX COORDINATOR      |          |      1 |        |        |      |            |  10000 |00:00:00.11 |       7 |      0 |
|   3 |    PX SEND QC (RANDOM)| :TQ10000 |      0 |   1000 |  Q1,00 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |
|   4 |     PX BLOCK ITERATOR |          |      3 |   1000 |  Q1,00 | PCWC |            |  10000 |00:03:17.72 |    1745 |   1667 |
|*  5 |      TABLE ACCESS FULL| T1       |     39 |   1000 |  Q1,00 | PCWP |            |  10000 |00:00:00.06 |    1745 |   1667 |
|   6 |   REMOTE              | T2       |  10000 |      1 |        |      |            |      0 |00:01:14.44 |       0 |      0 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(:Z>=:Z AND :Z<=:Z)
       filter(MOD("T1"."ID",10)=0)

Remote SQL Information (identified by operation id):
----------------------------------------------------
   6 - SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND MOD(TO_NUMBER("V1"),10)=1
       (accessing 'ORCL@LOOPBACK' )

Note
-----
   - Degree of Parallelism is 3 because of hint

I have hacked this output a little – the “Remote SQL” section didn’t get reported by display_cursor(), so I’ve inserted the remote sql I got from a call to dbms_xplan.display() after using explain plan to generate a plan. Note the /*+ shared(3) */ hint that appears in the remote SQL – that’s the internal version of a parallel(3) hint.

In 11g the query complete in 2.4 seconds, in 12c the query took nearly 75 seconds to run thanks to the 12c enhancement that allowed it to obey the hint! Looking at the time column (and ignoring the anomalous 3:17 at operation 4 – which might roughly be echoing 3 * 1:14) we can see that the time goes on the calls to the remote database (and a check of v$session_event shows this time spent in “SQL*Net message from db link”), so the obvious thing to do is check what actually happened at the remote database and we can do that by searching the library cache for a recognizable piece of the remote SQL – here’s the SQL to do that, with the results from 11g followed by the results from 12c:


SQL> select sql_id, child_number, executions, px_servers_executions, sql_text from v$sql
  2  where sql_text like '%SHARED%' and sql_text not like 'select sql_id%';

11g results
SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS
------------- ------------ ---------- ---------------------
SQL_TEXT
------------------------------------------------------------------------------------------------------------------------------------
c0f292z5czhwk            0      10000                     0
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE MOD(TO_NUMBER("V1"),10)=1 AND "ID"=:1


12c results
SQL_ID        CHILD_NUMBER EXECUTIONS PX_SERVERS_EXECUTIONS
------------- ------------ ---------- ---------------------
SQL_TEXT
------------------------------------------------------------------------------------------------------------------------------------
7bk51w7vtagwd            0      10000                     0
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND MOD(TO_NUMBER("V1"),10)=1

7bk51w7vtagwd            1          0                 59995
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND MOD(TO_NUMBER("V1"),10)=1

It’s not surprising to see that the query has executed 10,000 times – that’s what we were told by the Starts statistic from dbms_output.display_cursor(), but 12c has 60,000 (with a little error) PX Servers executions of the statement. That’s 10,000 executions * degree 3 * the 2 slave sets we saw in my original execution plan. (It’s an odd little quirk of the two versions of Oracle that the order of predicates in the remote SQL was reversed between 11g and 12c – leading to two different SQL_IDs).

By enabling rowsource execution stats at the system level I was able to capture the remote execution plan with its stats:


SQL_ID  7bk51w7vtagwd, child number 0
-------------------------------------
SELECT /*+ SHARED (3) */ "ID","V1" FROM "T2" "T2" WHERE "ID"=:1 AND
MOD(TO_NUMBER("V1"),10)=1

--------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |          |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  PX COORDINATOR                   |          |      0 |        |      0 |00:00:00.01 |       0 |
|   2 |   PX SEND QC (RANDOM)             | :TQ10001 |      0 |      1 |      0 |00:00:00.01 |       0 |
|*  3 |    TABLE ACCESS BY INDEX ROWID    | T2       |  29983 |      1 |      0 |00:00:22.21 |    1000 |
|   4 |     BUFFER SORT                   |          |  29995 |        |    999 |00:00:21.78 |       0 |
|   5 |      PX RECEIVE                   |          |  29924 |      1 |    994 |00:00:21.21 |       0 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)| :TQ10000 |      0 |      1 |      0 |00:00:00.01 |       0 |
|   7 |        PX SELECTOR                |          |  29993 |        |    999 |00:00:06.08 |   19992 |
|*  8 |         INDEX UNIQUE SCAN         | T2_PK    |  29999 |      1 |   1000 |00:00:00.24 |   20000 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(MOD(TO_NUMBER("V1"),10)=1)
   8 - access("ID"=:1)

Unlike the test case I started with, this output did show the number of starts (with a few missing) and the work done across the slaves. Our index probe had to do two buffer gets on every execution, and we have 10,000 executions of the query so 20,000 buffer gets on the index unique scan. Even though only one slave actually does any work with the PX Selector, all three slaves in that set seem to “start” the relevant operations. The definition of the data meant that only one index probe in 10 was successful so we only visited 1,000 buffers from the table. If you’re familiar with reading plans with rowsource execution stats you’ll appreciate that something has gone wrong in the reporting here – that 1,000 at operation 3 should read 21,000 because it ought to include the 20,000 from the index scan (at least, that’s what a serial plan would do).

If you’re still wondering why running this query as a parallel query should take so long – after all it’s only 10,000 executions in 70 seconds – bear in mind that Oracle has to allocate and deallocate 6 PX servers to new sessions each time it starts; the instance activity stats showed “logons cumulative” going up by 60,000 each time I ran the driving query: that’s about 850 logons (and log offs) per second. I don’t think my test machine would give a realistic impression of the impact of a couple of copies of this query running simultaneously, but when I tried the contention introduce increased the run time to 93 seconds.

tl;dr

Watch out for poor performance becoming disastrous for distributed parallel queries when you upgrade from 11g to 12c

 

 

.

June 7, 2016

Quiz Night

Filed under: 12c,Execution plans,Oracle,subqueries — Jonathan Lewis @ 7:35 pm GMT Jun 7,2016

Here’s an execution plan from a recent OTN database forum posting:

 
------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT          |                    |     1 |   231 |   160   (6)| 00:00:02 |
|   1 |  UPDATE                   | GS_TABLE           |       |       |            |          |
|*  2 |   HASH JOIN SEMI          |                    |     1 |   231 |   130   (0)| 00:00:02 |
|*  3 |    TABLE ACCESS FULL      | GS_TABLE           |     5 |   895 |   123   (0)| 00:00:02 |
|*  4 |    TABLE ACCESS FULL      | UPDATEDPROGRAMCODE |   850 | 44200 |     7   (0)| 00:00:01 |
|*  5 |   VIEW                    |                    |    11 |  2024 |     8  (13)| 00:00:01 |
|*  6 |    WINDOW SORT PUSHED RANK|                    |    11 |   440 |     8  (13)| 00:00:01 |
|*  7 |     FILTER                |                    |       |       |            |          |
|*  8 |      TABLE ACCESS FULL    | UPDATEDPROGRAMCODE |    11 |   440 |     7   (0)| 00:00:01 |
|   9 |   VIEW                    |                    |   850 |  1138K|     9  (23)| 00:00:01 |
|  10 |    SORT ORDER BY          |                    |   850 |   685K|     9  (23)| 00:00:01 |
|* 11 |     VIEW                  |                    |   850 |   685K|     8  (13)| 00:00:01 |
|  12 |      WINDOW SORT          |                    |   850 | 47600 |     8  (13)| 00:00:01 |
|* 13 |       FILTER              |                    |       |       |            |          |
|* 14 |        TABLE ACCESS FULL  | UPDATEDPROGRAMCODE |   850 | 47600 |     7   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Unfortunately the originator of this plan declined to show us the query or answer any questions about where the work was going, but did confirm a speculative comment I had made that the instance was 12c. So the question is this: can you spot what it was that made me think that the plan came from 12c ?

I have to say, by the way, that there may be ways to get this plan from 11g, it was just that my first impression was that it was probably 12c and I didn’t attempt to come up with a way of getting a similar plan from 11g. (And, as far as the general shape of the plan is concerned, I can think of two different types of query that could produce it.)

Footnote

You are allowed to prove me wrong.

Answer

(Which might be me showing ignorance rather than inspiration)

The basic shape of the plan suggests to me that the query is of the form:

update gs_table 
set     col1 = (select from updatedprogramcode),
        col2 = (select from updatedprogramcode)
where   exists (select from updatedprogramcode)         -- possibly "where IN (subquery)"
;

There are a couple of variations in how the “set” subquery content might vary, and I’ll write up a short blog about that later.

Having noted this basic shape, I then noted that the subqueries involved analytic functions – as indicated by the WINDOW SORT operations; moreover one of them used a PUSHED RANK option and the other was embedded in a non-mergeable VIEW (operation 11). Updates with subqueries generally involve correlated columns – and prior to 12c there are some restrictions on how far up the tree the correlation can go. Here’s a sample query (using two tables that I’ve cloned from all_objects) to demonstrate:


update t1 set
        data_object_id = (
                select  objno
                from    (
                        select
                                object_id objno,
                                row_number() over (order by  object_id desc) rn
                        from
                                t2
                        where
                                t2.object_type = t1.object_type
                        )
                where rn = 1
        )
/

We need to embed the inner select statement in an inline view because we want to use the result of the row_number() analytic function in a filter predicate, but in Oracle 11g the reference to t1.object_id can’t correlate back to the outer t1 table, while in 12c this isn’t a problem. Here’s the 12c plan, followed by the 11g error:


12c Plan (autotrace)
----------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT          |      |  5000 | 45000 | 70012  (15)| 00:04:34 |
|   1 |  UPDATE                   | T1   |       |       |            |          |
|   2 |   TABLE ACCESS FULL       | T1   |  5000 | 45000 |    12   (0)| 00:00:01 |
|*  3 |   VIEW                    |      |     1 |    26 |    13   (8)| 00:00:01 |
|*  4 |    WINDOW SORT PUSHED RANK|      |   556 |  6116 |    13   (8)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL     | T2   |   556 |  6116 |    12   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("RN"=1)
   4 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("OBJECT_ID")
              DESC )<=1)
   5 - filter("T2"."OBJECT_TYPE"=:B1)


11g Error
---------
                                t2.object_type = t1.object_type
                                                 *
ERROR at line 11:
ORA-00904: "T1"."OBJECT_TYPE": invalid identifier

Notice, by the way that my predicate “rn = 1” has resulted in the WINDOW SORT PUSHED RANK that appeared in the original plan.

In case I haven’t said it enough times: this is a just a rapid inference I drew from looking briefly at the plan and I haven’t tried hard to work out whether there is a way to get a plan like this in 11g. It was nice being proved right by the follow-up post from the OP, but my guess may have been right by accident – I’d rather be proved wrong than carry on thinking I’d got it right when I hadn’t … so feel free to supply an example in the comments.

 

May 11, 2016

dbms_xplan

Filed under: dbms_xplan,Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 12:22 pm GMT May 11,2016

My favourite format options for dbms_xplan.display_cursor().

This is another of those posts where I tell you about something that I’ve frequently mentioned but never documented explicitly as a good (or, at least, convenient) idea. It also another example of how easy it is to tell half the story most of the time when someone asks a “simple” question.

You’re probably familiar with the idea of “tuning by cardinality feedback” – comparing the predicted data volumes with the actual data volumes from an execution plan – and I wrote a short note about how to make that comparison last week; and you’re probably familiar with making a call to dbms_xplan.display_cursor() after enabling the capture of rowsource execution statistics (in one of three ways) for the execution of the query, and the format parameter usually suggested for the call is ‘allstats last’ to get the execution stats for the most recent execution of the query. I actually like to see the Cost column of the execution plan as well, so I usually add that to the format, so (with all three strategies shown for an SQL*Plus environment):

set linesize 180
set trimspool on
set pagesize 60
set serveroutput off

alter session set "_rowsource_execution_statistics"=true;
alter session set statistics_level=all;

select /*+ gather_plan_statistics */ * from user_tablespaces;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

So what do we often forget to mention:

  • For SQL*Plus it is important to ensure that serveroutput is off
  • The /*+ gather_plan_statistics */ option uses sampling, so may be a bit inaccurate
  • The two accurate strategies may add a significant, sometimes catastrophic, amount of CPU overhead
  • This isn’t appropriate if the query runs parallel

For a parallel query the “last” execution of a query is typically carried out by the query co-ordinator, so the rowsource execution stats of many (or all) of the parallel execution slaves are likely to disappear from the output. If you’re testing with parallel queries you need to add some “tag” text to the query to make it unique and omit the ‘last’ option from the format string.

Now, a common suggestion is that you need to add the ‘all’ format option instead – but this doesn’t mean “all executions” it means (though doesn’t actually deliver) all the data that’s available about the plan. So here’s an execution plans produced after running a parallel query and using ‘allstats all’ as the format option (t1 is a copy of all_objects, and this demo is running on 12.1.0.2).

SQL_ID  51u5j42rvnnfg, child number 1
-------------------------------------
select  /*+   parallel(2)  */  object_type,  sum(object_id) from t1
group by object_type order by object_type

Plan hash value: 2919148568

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |       |   113 (100)|          |        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |       |            |          |        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,01 | P->S | QC (ORDER) |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,01 | PCWP |            |     30 |00:00:00.01 |       0 |      0 |  2048 |  2048 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,01 | PCWP |            |     50 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,00 | P->P | RANGE      |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY      |          |      2 |     30 |   420 |   113   (9)| 00:00:01 |  Q1,00 | PCWP |            |     50 |00:00:00.05 |    1492 |   1440 |  1048K|  1048K|     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  85330 |  1166K|   105   (2)| 00:00:01 |  Q1,00 | PCWC |            |  85330 |00:00:00.03 |    1492 |   1440 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  85330 |  1166K|   105   (2)| 00:00:01 |  Q1,00 | PCWP |            |  85330 |00:00:00.01 |    1492 |   1440 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   8 - SEL$1 / T1@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access(:Z>=:Z AND :Z<=:Z)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "OBJECT_TYPE"[VARCHAR2,23], SUM()[22]
   2 - (#keys=0) "OBJECT_TYPE"[VARCHAR2,23], SUM()[22]
   3 - (#keys=1; rowset=200) "OBJECT_TYPE"[VARCHAR2,23], SUM()[22]
   4 - (rowset=200) "OBJECT_TYPE"[VARCHAR2,23], SYS_OP_MSR()[25]
   5 - (#keys=1) "OBJECT_TYPE"[VARCHAR2,23], SYS_OP_MSR()[25]
   6 - (rowset=200) "OBJECT_TYPE"[VARCHAR2,23], SYS_OP_MSR()[25]
   7 - (rowset=200) "OBJECT_ID"[NUMBER,22], "OBJECT_TYPE"[VARCHAR2,23]
   8 - (rowset=200) "OBJECT_ID"[NUMBER,22], "OBJECT_TYPE"[VARCHAR2,23]

Note
-----
   - Degree of Parallelism is 2 because of hint


48 rows selected.

You’ll notice we’ve reported the “alias” and “projection” information – those are two of the format options that you can use with a + or – to include or exclude if you want. We’ve also got E-Bytes and E-time columns in the body of the plan. In other words (at least in my opinion) we’ve got extra information that makes the output longer and wider and therefore harder to read.

The format string I tend to use for parallel query is ‘allstats parallel cost’ – which (typically) gives something like the following:

SQL_ID  51u5j42rvnnfg, child number 1
-------------------------------------
select  /*+   parallel(2)  */  object_type,  sum(object_id) from t1
group by object_type order by object_type

Plan hash value: 2919148568

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Starts | E-Rows | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |      1 |        |   113 (100)|        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
|   1 |  PX COORDINATOR          |          |      1 |        |            |        |      |            |     30 |00:00:00.04 |       5 |      0 |       |       |          |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |      0 |     30 |   113   (9)|  Q1,01 | P->S | QC (ORDER) |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   3 |    SORT GROUP BY         |          |      2 |     30 |   113   (9)|  Q1,01 | PCWP |            |     30 |00:00:00.01 |       0 |      0 |  2048 |  2048 |     2/0/0|
|   4 |     PX RECEIVE           |          |      2 |     30 |   113   (9)|  Q1,01 | PCWP |            |     50 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |      PX SEND RANGE       | :TQ10000 |      0 |     30 |   113   (9)|  Q1,00 | P->P | RANGE      |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   6 |       HASH GROUP BY      |          |      2 |     30 |   113   (9)|  Q1,00 | PCWP |            |     50 |00:00:00.05 |    1492 |   1440 |  1048K|  1048K|     2/0/0|
|   7 |        PX BLOCK ITERATOR |          |      2 |  85330 |   105   (2)|  Q1,00 | PCWC |            |  85330 |00:00:00.03 |    1492 |   1440 |       |       |          |
|*  8 |         TABLE ACCESS FULL| T1       |     26 |  85330 |   105   (2)|  Q1,00 | PCWP |            |  85330 |00:00:00.01 |    1492 |   1440 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
   - Degree of Parallelism is 2 because of hint


30 rows selected.

Of course you may prefer ‘allstats all’ – and sometimes I do actually want to see the alias or projection information – but I think there’s so much information available on the execution plan output that anything that makes it a little shorter, cleaner and tidier is a good thing.

You might have noticed, by the way, that the Buffers, Reads, and A-Time columns have still managed to lose information on the way up from operation 6; information that should have been summing up the plan has simply disappeared.  Make sure you do a sanity check for disappearing numbers when you’re looking at more complex plans.

 

May 5, 2016

E-rows / A-rows

Filed under: Execution plans,Oracle — Jonathan Lewis @ 1:26 pm GMT May 5,2016

A recent post on the OTN database forum reminded me how easy it is to forget to keep repeating a piece of information after the first couple of hundred times you’ve explained it. No matter how “intuitively obvious” it is for one person, it’s new to someone else.

Here’s an execution plan that raised the question that prompted this note – it comes from calling dbms_xplan.display_cursor() with the ‘allstats last’ format option after enabling rowsource execution statisics (using hint gather_plan_statistics, or setting parameter statistics_level to all, or setting hidden parameter “_rowsource_execution_statistics” to true):

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                        |      1 |        |   1989 |00:00:04.96 |    9280 |    897 |
|   1 |  NESTED LOOPS OUTER                   |                        |      1 |   2125 |   1989 |00:00:04.96 |    9280 |    897 |
|   2 |   NESTED LOOPS OUTER                  |                        |      1 |   2125 |   1989 |00:00:04.93 |    9271 |    895 |
|   3 |    NESTED LOOPS OUTER                 |                        |      1 |   2125 |   1989 |00:00:00.03 |    5732 |      0 |
|   4 |     COLLECTION ITERATOR PICKLER FETCH |                        |      1 |   1989 |   1989 |00:00:00.01 |       0 |      0 |
|*  5 |     TABLE ACCESS BY INDEX ROWID       | TABLE1                 |   1989 |      1 |   1178 |00:00:00.03 |    5732 |      0 |
|*  6 |      INDEX RANGE SCAN                 | IDX_TABLE1             |   1989 |      2 |   2197 |00:00:00.02 |    3545 |      0 |
|   7 |    TABLE ACCESS BY INDEX ROWID        | TABLE2                 |   1989 |      1 |   1178 |00:00:03.26 |    3539 |    895 |
|*  8 |     INDEX UNIQUE SCAN                 | IDX_TABLE2_PK          |   1989 |      1 |   1178 |00:00:03.25 |    2359 |    895 |
|   9 |   TABLE ACCESS BY INDEX ROWID         | TABLE3                 |   1989 |      1 |      0 |00:00:00.03 |       9 |      2 |
|* 10 |    INDEX UNIQUE SCAN                  | IDX_TABLE3_PK          |   1989 |      1 |      0 |00:00:00.03 |       9 |      2 |
-----------------------------------------------------------------------------------------------------------------------------------

Many people have heard about “tuning by cardinality feedback” (a term first used, I think, by Wolfgang Breitling many years ago), and it’s fairly common knowledge that this requires you to compare the predicted (estimated) E-rows with the actual A-rows generated by a line of the plan. A critical detail that often fails to appear when this point is being explained is that there is a very important difference between the meaning of E-rows and A-rows. Although this is a point I make very clear in my presentations on how to read execution plans, it was only when I found myself explaining it to Tony Hasler a couple of years ago [ed: see comment 3 – at least 6 years ago!] that I realised that it’s not commonly known and that, in casual conversation, it’s easy to overlook the fact that it’s something that you might need to mention explicitly. So here’s the critical detail:

  • E-Rows is the number of rows the optimizer is expecting the operation to produce each time it is called
  • A-Rows is the cumulative number of rows summed across all calls to the operation

As a consequence of this difference any comparison you make should not be comparing A-Rows with E-rows, it should be comparing A-Rows with E-Rows * Starts so, for example,  operation 6 in the plan above (index range scan idx_table1) shows a reasonable estimate because 1,989 starts at 2 rows per start is fairly close to the final 2,197 rows produced (especially when you consider that there’s a lot of integer rounding going on to display the values).

 

Next Page »

Powered by WordPress.com.