Oracle Scratchpad

February 15, 2007

Index not used (10g)

Filed under: CBO,Troubleshooting — Jonathan Lewis @ 11:50 pm GMT Feb 15,2007

Here’s an example of why execution plans can change after an upgrade to 10gR2.

I’ve created a table with an artificially exaggerated problem – dependent columns – that highlights some inconsistencies in the optimizer’s behaviour. Here’s the code:

create table t1
as
select
	mod(rownum,20) 		n1,
	mod(rownum,20)		n2,
	rpad('x',40,'x')	ind_pad,
	rpad('x',200)		padding
from
	all_objects
where
	rownum  <= 4000
;

create index t1_i1 on t1(n1, ind_pad, n2)
pctfree 91
;

-- collect statistics at this point

Note that there are 20 possible pairs of values for (n1, n2) – ignore the ind_pad column, it’s just something that I’ve used to increase the size of the index at the branch level without changing the arithmetic.

Here’s the query that I want to demonstrate:

select
	padding
from
	t1
where
	n1	= 2
and	ind_pad	= rpad('x',40,'x')
and	n2	= 2
;

From the optimizer’s perspective, the combined selectivity of the three predicates is 1/400 (by multiplying 1/20, 1, and 1/20), which results in a predicted cardinality of 10. Here’s the autotrace execution plan from 9.2.0.8 (with CPU costing disabled).

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=12 Card=10 Bytes=2470)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T1' (TABLE) (Cost=12 Card=10 Bytes=2470)
   2    1     INDEX (RANGE SCAN) OF 'T1_I1' (INDEX) (Cost=4 Card=10)

And here’s the plan from 10.2.0.1 (again with CPU costing disabled).

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |  2470 |    23 |
|*  1 |  TABLE ACCESS FULL| T1   |    10 |  2470 |    23 |
----------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=2 AND "IND_PAD"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' AND "N2"=2)

Oops!

Let’s try hinting the code with an index() hint to see what we get, as that might give us a clue about why the change in plan has appeared :

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    10 |  2470 |   168 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    10 |  2470 |   168 |
|*  2 |   INDEX RANGE SCAN          | T1_I1 |   200 |       |    25 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=2 AND "IND_PAD"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' AND "N2"=2)

Note, especially, the increase in the cardinality predicted for the index, with the increased cost of scanning the index, and the massive increase in the cost of visiting the table.

What’s happened is this: for the purposes of scanning the index, the optimizer has used the number of distinct keys (user_indexes.distinct_keys) to determine how much of the index will be scanned, and how many rowids identified. This has increased the cost and cardinality of the index line quite dramatically – and since it is the “index selectivity with filtering” multiplied by the clustering_factor that dictates the cost of the table access by index, the total cost of the query has gone up as well. However, by the time that the optimizer gets down to the table, the calculation has fallen back to the “product of columns” style of calculation – so we end up with a prediction of a high cost returning a few rows.

In theory, of course, the new cost is a better estimate than the old – but for some people this change could be bad news at least for some queries.

Apart from the explicit change of plan shown in this trivial example, consider the impact this will have on more complex queries: if one step of the query is suddenly much more expensive, and only returns a few rows, the optimizer is quite likely to move that step to an earlier point in the plan to ensure that the cost is incurred as few times as possible.

General guideline for the upgrade to 10gR2: watch out for queries that contain predicates on every column of a multi-column non-unique index; their execution plans may change.

Update (Mar 2008): This note was one I had forgotten about when I wrote a newer version of the same thing, highlighting the next step in the evolution of the execution plan when I tried the same thing with 11g.

12 Comments »

  1. Nice. Is this behavior due to absence of system statistics?

    Comment by Arul R — February 16, 2007 @ 2:06 am GMT Feb 16,2007 | Reply

  2. Arul, no; I simply omitted system statistics to make it a little easier for people to see the arithmetic working itself out.

    Comment by Jonathan Lewis — February 16, 2007 @ 6:21 am GMT Feb 16,2007 | Reply

  3. Is this a case of correlated columns? Would dynamic sampling help?

    Comment by DJ — February 16, 2007 @ 2:44 pm GMT Feb 16,2007 | Reply

  4. “Nice. Is this behavior due to absence of system statistics?”

    Arul, in fact the problem is in the _presence_ of statistics (for _this specific_ example) and default optimizer_dynamic_sampling (which is 2).

    Dynamic sampling at level 2 _without_ stats picks cardinality quite right:

    SQL&gt; select
      2   padding
      3  from
      4   t1
      5  where
      6   n1 = 2
      7  and ind_pad = rpad('x',40,'x')
      8  and n2 = 2
      9  ;
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3617692013
    
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      |   200 | 30000 |    33   (0)| 00:00:01 |
    |*  1 |  TABLE ACCESS FULL| T1   |   200 | 30000 |    33   (0)| 00:00:01 |
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter("N1"=2 AND "IND_PAD"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                  xxxx' AND "N2"=2)
    
    Note
    -----
       - dynamic sampling used for this statement
    

    “Is this a case of correlated columns? Would dynamic sampling help?”

    With stats and dynamic sampling at level 4 the reporting cardinality were 188 (much better than 10 but still a little incorrect). The correct cardinality (200) were reported only at level 7.

    P.S. 10.2.0.3

    Comment by Alexander Fatkulin — February 17, 2007 @ 9:02 am GMT Feb 17,2007 | Reply

  5. DJ, I’ve used the expression “dependent columns” in the note – but “correlated columns” is an equally valid way of describing the issue.

    Alex, I believe Arul was thinking of the statistics created by calls to dbms_stats.gather_system_stats() – i.e. the CPU speed, and disk I/O speeds.

    DJ/Alex: the question of dynamic sampling does depend on which problem you are trying to solve. The fact that you can get 10g to calculate the correct table selectivity by dynamic sampling isn’t (necessarily) an advantage in this case.

    The point I was making was that after an upgrade there was a reason why some of your execution plans could change. As a consequence of the incomplete treatment, though, the new cost and cardinality are not self-consistent. Making 10g work out the correct cardinality isn’t necessarily going to get you back to the 9i execution plan – because now the optimizer has two reasons for doing something different.

    Comment by Jonathan Lewis — February 17, 2007 @ 11:33 am GMT Feb 17,2007 | Reply

  6. Jonathan,

    yes I do realize that Arul mentioned system stats (sreadtim, mreadtim, cpuspeed, etc.). I’ve just focused on incorrectly reported cardinality as a self-contained problem (sorry if that distracts from your original points).

    By the way – looks like the index cardinality also depends on how stats were collected in term of histograms…

    SQL&gt; begin
      2   dbms_stats.gather_table_stats(
      3    ownname =&gt; user,
      4    tabname =&gt; 't1',
      5    method_opt =&gt; 'for all columns size 1',
      6    cascade =&gt; true
      7   );
      8  end;
      9  /
    
    PL/SQL procedure successfully completed.
    
    SQL&gt; select /*+ first_rows */ padding
     2   from t1
     3   where n1 = 2
     4    and ind_pad = rpad('x',40,'x')
     5    and n2 = 2;
    
    xecution Plan
    ---------------------------------------------------------
    lan hash value: 1429545322
    
    ------------------------------------------------------------------------------------
     Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------
       0 | SELECT STATEMENT            |       |    10 |  2470 |   168   (0)| 00:00:03 |
       1 |  TABLE ACCESS BY INDEX ROWID| T1    |    10 |  2470 |   168   (0)| 00:00:03 |
    *  2 |   INDEX RANGE SCAN          | T1_I1 |   200 |       |    25   (0)| 00:00:01 |
    ------------------------------------------------------------------------------------
    
    redicate Information (identified by operation id):
    --------------------------------------------------
    
      2 - access("N1"=2 AND "IND_PAD"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
                  AND "N2"=2)
    
    SQL&gt; begin
      2   dbms_stats.gather_table_stats(
      3    ownname =&gt; user,
      4    tabname =&gt; 't1',
      5    method_opt =&gt; 'for all columns size 254',
      6    cascade =&gt; true
      7   );
      8  end;
      9  /
    
    PL/SQL procedure successfully completed.
    
    SQL&gt; select /*+ first_rows */ padding
      2   from t1
      3   where n1 = 2
      4    and ind_pad = rpad('x',40,'x')
      5    and n2 = 2;
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1429545322
    
    -------------------------------------------------------------------------------------
    | Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |       |    10 |  2470 |    12   (0)| 00:00:01 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    10 |  2470 |    12   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | T1_I1 |    10 |       |     4   (0)| 00:00:01 |
    -------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("N1"=2 AND "IND_PAD"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
                   AND "N2"=2)
    
    picking up the relative section from the 10053 trace:
    
    Access Path: index (AllEqRange)
        Index: T1_I1
        resc_io: 168.00  resc_cpu: 1278402
        ix_sel: 0.05  ix_sel_with_filters: 0.05
        Cost: 168.08  Resp: 168.08  Degree: 1
    
    Access Path: index (AllEqRange)
        Index: T1_I1
        resc_io: 12.00  resc_cpu: 89557
        ix_sel: 0.0024997  ix_sel_with_filters: 0.0024997
        Cost: 12.01  Resp: 12.01  Degree: 1
    

    Comment by Alexander Fatkulin — February 17, 2007 @ 12:22 pm GMT Feb 17,2007 | Reply

  7. Alex, interesting – the presence of a (redundant) histogram seems to block the new code path, whether it’s a “frequency” histogram (as in your example) or a “height-balanced” histogram.

    Comment by Jonathan Lewis — February 19, 2007 @ 8:57 am GMT Feb 19,2007 | Reply

  8. […] I had forgotten that I previously published this example (without the 11g change) some time […]

    Pingback by Everything Changes « Oracle Scratchpad — March 13, 2008 @ 1:31 pm GMT Mar 13,2008 | Reply

  9. Hi Jonathan,
    In which cases the index will not be used in explain plan? For example, when a funtion is used on the indexed column.

    Comment by Ratna Kishore — May 26, 2008 @ 5:29 am BST May 26,2008 | Reply

  10. John, i’m sorry if i’m posting my questions in the wrong thread.

    We have a Datawarehouse environment and are running incremental ETL’s as well as Reports from the same. We know that this is not an ideal thing to do and will move to separate reporting environments in future. I have couple of questions, which came up again and again as developers are complaining of the locking issues which are due to reports and etl at the same time. Can you give your comments on these questions:

    1) If a query is running and it access a table and at the same point a refresh job tries to truncate the same table will the truncate job wait till the query completes ?

    2) If a query is running and it access a table and at the same point a refresh job tries to drop an index that is used in the explain plan by the will the drop index job wait till the query completes ?

    3) If a query is running and it access a table and at the same point a refresh job tries to create an index on the same table that query is accessing will the create index statement wait till the query completes ?

    4) If a query is running and it access a table and at the same point a refresh job tries to recreate an index that is used n the explain plan by the query will the create index statement wait till the query completes ?

    Thanks,
    Gary

    Comment by Gary — May 26, 2010 @ 6:02 pm BST May 26,2010 | Reply

    • Gary,

      Can you give your comments on these questions:
      Certainly –
      1) It is easy to build models to discover the answers to all four questions – and the experience will help you practice your skills at identifying, analysing and solving problems.
      2) This isn’t a forum – it’s a blog. I don’t answer readers’ questions unless they happen to raise interesting points that are specifically related to the blog note that they’ve been attached to.
      3) If you want to make a special point of irritating me when you ask a question, make sure you don’t bother to get my name right.

      Comment by Jonathan Lewis — May 29, 2010 @ 10:19 am BST May 29,2010 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.