Oracle Scratchpad

August 31, 2016

Parallel_index hint

Filed under: Bugs,CBO,Oracle,Parallel Execution — Jonathan Lewis @ 1:28 pm GMT Aug 31,2016

Prompted by a recent OTN posting I’ve dug out from my library the following demonstration of an anomalty with the parallel_index() hint. This note is a warning about  how little we understand hints and what they’re supposed to mean, and how we can be caught out by an upgrade. We’ll start with a data set which, to match a comment made in the origina posting rather than being a necessity for the demonstration, has an index that I’ve manipulated to be larger than the underlying table:


rem
rem     Script:         parallel_index_hint_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          December 2005
rem

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum,1e4)         modded,
        lpad(rownum,10,'0')     v1,
        lpad('x',30,'x')        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index t1_i1 on t1(modded) pctfree 75;
alter table t1 modify modded not null;

Your figures may vary slightly if you try to recreate this model, but according to my stats blocks=7876 for the table and leaf_blocks=8054 for the index. Now I’m going to generate the execution plans for a simple query – count(modded) from the table – with a varying selection of hints.


select /*+ index_ffs(t1 t1_i1) */ count(modded) from t1;

select /*+ parallel_index(t1 t1_i1 20) */ count(modded) from t1;

select /*+ index_ffs(t1 t1_i1) parallel_index(t1 t1_i1 20) */ count(modded) from t1;

select /*+ parallel_index(t1 t1_i1 20) parallel(t1 10) */ count(modded) from t1;

Pause for thought

  • which of the four statements will have a plan that uses an index fast full scan ?
  • which of the four statements will have a plan that indicates parallel execution ?
  • which of the four statements will indicate a parallel index fast full scan ?
  • why are two of the plans going to be identical but with different costs ?

The most interesting bit of this note is in the last question because it’s also the answer to a more subtle “why didn’t Oracle do what I thought it should” question. Here are the four plans I got from an instance of 11.2.0.4 (with a little labelling to remind us about the hinting):


==================================================
index_ffs() on it's own - get index fast full scan
==================================================

-----------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |  1043   (4)| 00:00:06 |
|   1 |  SORT AGGREGATE       |       |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|  1043   (4)| 00:00:06 |
-----------------------------------------------------------------------

=====================================================
parallel_index() on it's own
Get serial tablescan which is cheaper than serial FFS
=====================================================

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |  1027   (4)| 00:00:06 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|  1027   (4)| 00:00:06 |
-------------------------------------------------------------------

========================================================
parallel_index() with index_ffs()
Get parallel fast full scan - at same cost as SERIAL FFS
========================================================

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |     1 |  1043   (4)| 00:00:06 |        |      |            |
|   1 |  SORT AGGREGATE           |          |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR          |          |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10000 |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE        |          |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR    |          |  1000K|  1043   (4)| 00:00:06 |  Q1,00 | PCWC |            |
|   6 |       INDEX FAST FULL SCAN| T1_I1    |  1000K|  1043   (4)| 00:00:06 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------

===============================================
parallel_index() with parallel()
Get parallel fast full scan - costed correctly.
(Not costed at the degree given for table).
===============================================

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |     1 |    58   (4)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE           |          |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR          |          |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10000 |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE        |          |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR    |          |  1000K|    58   (4)| 00:00:01 |  Q1,00 | PCWC |            |
|   6 |       INDEX FAST FULL SCAN| T1_I1    |  1000K|    58   (4)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------

First plan – we supplied the index_ffs() hint, it’s valid: so Oracle did an index fast full scan. The serial cost was 1,043.

Second plan – we supplied the parallel_index() hint: but the optimizer apparently didn’t consider the cost of a parallel index fast full scan (which ought to have cost something like (1043/20)/0.9 = 58; instead it chose the serial tablescan at a cost of 1027 (cheaper than a serial index fast full scan because of the way I had defined a very large index).

Third plan – we supplied the parallel_index() hint with an explicit index_ffs() hint: both were legal so the optimizer obeyed the hints and produced a parallel index fast full scan (which is what we might have hoped would happen automatically for the second query). But the cost of the query is 1,043 – the cost of the serial index fast full scan.

Final plan – we didn’t hint an index_ffs() we hinted parallel() and parallel_index(): “strangely” Oracle has selected the parallel index fast full scan – and this time the cost is 58 (which, as I pointed out above, is (1043/20)/0.9, the value it’s “supposed” to be). It looks as if Oracle uses the parallel scaling factor on the index only if the table is ALSO hinted (or defined) to be treated to parallel execution.

Note, by the way, that I’ve hinted the table parallel 10 so that the cost of the parallel tablescan (ca. (1027/10)/0.9 = 114) will be larger than the cost of the parallel index fast full scan. Normally you’re likely to have the same degree hinted for table and index, and the index is likely to be smaller than the table which means that if you see this anomaly in production it’s more likely to be a case of Oracle using a parallel tablescan when you’re expecting a parallel index fast full scan. Oracle will have compared the serial cost of an index fast full scan against the parallel cost of the table.

Footnote:

When I ran the script on 12c there was one important change. The cost of the “correctly costed” index fast full scan came out at 103 rather than 58. Oracle has used the degree from the parallel() hint for the tablescan to calculate the cost of the parallel index fast full scan. If you are unlucky enough to have some code that has conflicting degrees hinted (or defined) across tables and indexes then you may see some plans change because parallel index fast full scans suddenly change their cost.

Addendum (prompted by Randolf Geist’s comment): The plan in 12c also had a Note: “Degree of Parallelism is 4 because of table property”. This was a little odd because the degree of the table was 1 and the hint said 10, but the arithmetic of the tablescan had ( as indicated) used a DOP of 4 – despite reporting Degree=0 in the 10053 trace file. The only way I could get the number 4 was to look at the CPU parameters – bearing in mind the critical parallel parameters:

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cpu_count                            integer     2
parallel_threads_per_cpu             integer     2

parallel_adaptive_multi_user         boolean     TRUE
parallel_automatic_tuning            boolean     FALSE
parallel_degree_limit                string      CPU
parallel_degree_policy               string      MANUAL
parallel_io_cap_enabled              boolean     FALSE

The anomaly of “degree 10” appearing in the costing for the index fast full scan was echoed in the 10053 trace: the file reported “Degree: 20”, then showed the cost being divided by 10.

At run-time, though, the query DID run parallel 20 when hinted with both the parallel() and parallel_index() hints, and ran parallel 10 (while costing and repeating the note about parallel 4) when modified to have only the /*+ parallel(t1 10) */ hint.

2 Comments »

  1. Jonathan,

    > When I ran the script on 12c there was one important change. The cost of the “correctly costed” index fast full scan came out at 103 rather than 58. Oracle has used the degree from the parallel() hint for the tablescan to calculate the cost of the parallel index fast full scan.

    Given the fact that the index cost is larger than the table cost in your model, this means in this case you got a parallel TABLE ACCESS FULL in 12c when using PARALLEL and PARALLEL_INDEX hints only and both are costed using the same degree?

    Randolf

    Comment by Randolf Geist — August 31, 2016 @ 5:13 pm GMT Aug 31,2016 | Reply

    • Randolf,

      You’re right – if the tablescan was also costed parallel 10 then the tablescan path should have been chosen. I’ve updated the end of the note to point out that the tablescan claimed to limit itself to parallel 4, with the associated cost (which is why the tablescan plan didn’t appear).

      Comment by Jonathan Lewis — August 31, 2016 @ 5:40 pm GMT Aug 31,2016 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.