Oracle Scratchpad

January 13, 2012

Quiz Night

Filed under: Hints,Indexing,Oracle — Jonathan Lewis @ 6:41 pm GMT Jan 13,2012

Browsing a little history recently I came across a note I’d written about the new-style index hint. In that note I claimed that:


… the index has to start with the columns (product_group, id) in that order – with preference given to an exact match, otherwise using the lowest cost index that starts the right way.

On reading this statement I suddenly realised that I hadn’t actually proved (to myself, even) that if I had the indexes (product_group, id) and (product_group, id, other_col) then a two-column hint forced Oracle to use the two column index in all (legal) circumstances.

So, tonight’s quiz – are there any edge cases, and what easy ways can you think of to prove (or disprove) the claim for the general case.

Footnote: you don’t have to demonstrate the method, just a brief outline of the idea will be sufficient.

Update Jan 19th

I’m sorry it’s taken so long to respond to this. It was a post that I pre-dated at the end of December, and I forgot that it would launch itself.

The most interesting comment, from my perspective, came from Valentin Nikotin – “what if the index you are hinting is in another schema?” Virtually every test I do is based on a single schema, I rarely use two, or more, schemas at the same time. So what if there’s a special case that somehow the hint has a precedence that assumes the indexes will be found by (in effect) querying user_ind_columns rather than all_ind_columns ? I wouldn’t expect this to be the case – but when you’re trying to do pre-emptive trouble-shooting it’s this type of case (i.e. “nobody does that sort of thing”) that you can easily overlook.

The other responents came up with the type of thing that I would call the correct strategy. We need to start with an SQL statement that uses the three-column index by default and uses the two-column index only when it’s hinted. Of course, if we can examine the 10053 trace and see the change, especially the restriction to the two-column index explicitly being labelled with something like “index demanded by user hint”, then we can be pretty confident that the interpretation is correct.

The question then is, how can we construct a suitable demonstration.

In general, if you have a three-column index that starts with the same columns in the same order as the two-column index then the three-column index will be bigger and have a higher clustering_factor so, from a purely arithmetic perspective, it will be less desirable than the two column index except for one special class of queries. Consider a query of the form:

select  col3
from    t1
where   col1 = {constant}
and     col2 = {constant}

With a suitable three-column index this query can be answered from completely within the index, with only the first two columns Oracle has to visit the table. So if we ensure that there are several rows scattered around the table we can be confident (in the general case) that the optimizer will see that the three-column query has a lower cost than the two-column query.

I don’t need to set up the model for you – Charles Hooper has already done the work.

Footnote 2:
Following Valentin Nikotin’s comment, you might like to consider that any reports you currently run against the user_indexes and user_ind_columns views (and their partition-related equivalents) should be run against the all_indexes and all_ind_columns views with a restriction of the form: “table_owner = user”.

12 Comments »

  1. Index with the same name that is owned by the other user.

    Comment by Valentin Nikotin — January 13, 2012 @ 9:53 pm GMT Jan 13,2012 | Reply

    • The Optimizer transforms the hint to /*+ INDEX (“T” “IND”) */ while it performs.
      After that it will consider all indexes having the same name.

      SQL> create user u1 identified by u1 quota unlimited on users;
      
      User created.
      
      SQL> create user u2 identified by u2 quota unlimited on users;
      
      User created.
      
      SQL> create table u1.abc(a int not null, b int, c int);
      
      Table created.
      
      SQL> create index u1.ind on u1.abc (a, b);
      
      Index created.
      
      SQL> create index u2.ind on u1.abc (a, b, c);
      
      Index created.
      
      SQL> insert into u1.abc select level, level, level from dual connect by level < 1e5;
      
      99999 rows created.
      
      SQL> commit;
      
      Commit complete.
      
      SQL> exec dbms_stats.gather_table_stats('u1','abc');
      
      PL/SQL procedure successfully completed.
      
      SQL> explain plan for select /*+ index(t(a, b))*/ * from u1.abc t;
      
      Explained.
      
      SQL> select * from table(dbms_xplan.display);
      
      PLAN_TABLE_OUTPUT
      --------------------------------------------------------------------------------
      Plan hash value: 1202921093
      
      -------------------------------------------------------------------------
      | Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
      -------------------------------------------------------------------------
      |   0 | SELECT STATEMENT |      | 99999 |  1464K|   323   (1)| 00:00:04 |
      |   1 |  INDEX FULL SCAN | IND  | 99999 |  1464K|   323   (1)| 00:00:04 |
      -------------------------------------------------------------------------
      
      8 rows selected.

      Comment by Valentin Nikotin — January 15, 2012 @ 3:52 pm GMT Jan 15,2012 | Reply

      • Valentin,
        Although Oracle reports an index name in the final execution plan output, if you check the outline section of the plan – dbms_xplan.display(null,null,’outline’) you will find that it is still thinking of the index in terms of an ordered list of columns.

        Comment by Jonathan Lewis — January 26, 2012 @ 9:48 pm GMT Jan 26,2012 | Reply

    • The another example of the situation when Oracle doesn’t perform correctly new-style hint, while old-style works fine, is a hint for table through a db link. The hint in the query on the remote side will have ??? instead of the index name:

      SQL> alter system flush shared_pool;
      
      System altered.
      
      SQL> select /*+ index(t (a b))*/ * from u1.abc@selflink t where rownum = 1;
      
               A          B          C
      ---------- ---------- ----------
             362        362        362
      
      SQL> select sql_text, executions from v$sql where sql_text like 'SELECT%ABC%';
      
      SQL_TEXT                                                                                        EXECUTIONS
      ----------------------------------------------------------------------------------------------- ----------
      SELECT /*+ FULL(P) +*/ * FROM "U1"."ABC" P                                                               0
      SELECT /*+ INDEX ("A1" ???) */ "A1"."A","A1"."B","A1"."C" FROM "U1"."ABC" "A1" WHERE ROWNUM=1            1
      
      SQL> alter system flush shared_pool;
      
      System altered.
      
      SQL> select /*+ index(t ind)*/ * from u1.abc@selflink t where rownum = 1;
      
               A          B          C
      ---------- ---------- ----------
               1          1          1
      
      SQL> select sql_text, executions from v$sql where sql_text like 'SELECT%ABC%';
      
      SQL_TEXT                                                                                        EXECUTIONS
      ----------------------------------------------------------------------------------------------- ----------
      SELECT /*+ FULL(P) +*/ * FROM "U1"."ABC" P                                                               0
      SELECT /*+ INDEX ("A1" "IND") */ "A1"."A","A1"."B","A1"."C" FROM "U1"."ABC" "A1" WHERE ROWNUM=1          1
      

      Comment by Valentin Nikotin — January 15, 2012 @ 8:57 pm GMT Jan 15,2012 | Reply

      • Valentin,

        That’s an interesting observation.
        The behaviour probably varies with version of Oracle, so it’s worth having a couple of simple test cases so that you can check what’s going on in each upgrade and patch.

        Comment by Jonathan Lewis — January 26, 2012 @ 9:42 pm GMT Jan 26,2012 | Reply

  2. Test with a SQL statement that doesn’t reference the columns in the two column index, eg select other_col from products order by other_col;
    A CBO (10053) trace on the this SQL without the hint shows evaluations of access paths for a table scan or using the three column index only.
    A CBO trace on the SQL with the hint shows “User hint to use this index” for the two column index and only shows access path options using the two column index.

    Comment by Ari — January 14, 2012 @ 9:14 am GMT Jan 14,2012 | Reply

  3. If you select product_group, id, other_col, in totality or not (but at most only those columns) and you hint to use the two columns index (product_group, id) for a query involving in its where clause only a predicate on other_col then the CBO might(depending on your data and how it is scattered) do an index FFS on the 3 columns index instead of the hinted 2 columns index

    Comment by Mohamed Houri — January 15, 2012 @ 9:17 am GMT Jan 15,2012 | Reply

  4. For what it is worth, I was not successful in disproving what you stated when testing Oracle Database 11.2.0.2 (although it is possible to make it appear as though what is stated is incorrect, simply by using an invalid index hint). The test table and indexes that I started with were defined as follows:

    CREATE TABLE T1 (
      C1 NUMBER NOT NULL,
      C2 NUMBER NOT NULL,
      C3 VARCHAR2(30) NOT NULL,
      C4 VARCHAR2(200));
     
    INSERT INTO T1
    SELECT
      MOD(ROWNUM-1, 90) * 4 C1,
      ROWNUM - 1 C2,
      TO_CHAR(ROWNUM - 1, 'RN') C3,
      LPAD('A',200,'A') C4
    FROM
      DUAL
    CONNECT BY
      LEVEL<=1000000;
     
    CREATE INDEX IND_T1_C1_C2 ON T1(C1,C2);
    CREATE INDEX IND_T1_C2_C1_C3 ON T1(C2,C1,C3);
    CREATE INDEX IND_T1_C3_C1_C2 ON T1(C3,C1,C2);
     
    EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE)
     
    SET AUTOTRACE TRACEONLY EXPLAIN
    SET LINESIZE 120
    SET PAGESIZE 1000
    

    Unhinted, the following query accesses the index on columns C2, C1, and C3 to avoid accessing the table:

    SELECT
      C1,
      C2,
      C3
    FROM
      T1;
     
    Plan hash value: 2374279026
     
    ----------------------------------------------------------------------------------------
    | Id  | Operation            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT     |                 |  1000K|    23M|  1823  (11)| 00:00:01 |
    |   1 |  INDEX FAST FULL SCAN| IND_T1_C2_C1_C3 |  1000K|    23M|  1823  (11)| 00:00:01 |
    ----------------------------------------------------------------------------------------
    

    Let’s hint the optimizer to use the index on the columns C1 and C2:

     
    SELECT /*+ INDEX(T1 (C1 C2)) */
      C1,
      C2,
      C3
    FROM
      T1;
     
    Plan hash value: 3388050039
     
    --------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |              |  1000K|    23M|  1012K  (1)| 00:06:46 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1           |  1000K|    23M|  1012K  (1)| 00:06:46 |
    |   2 |   INDEX FULL SCAN           | IND_T1_C1_C2 |  1000K|       |  3026  (10)| 00:00:02 |
    --------------------------------------------------------------------------------------------
    

    In the above, the optimizer obeyed the hint, even though the calculated cost from the unhinted plan increased from 1,823 to 1,012,000.

    Let’s reverse the order of the columns in the index hint:

    SELECT /*+ INDEX(T1 (C2 C1)) */
      C1,
      C2,
      C3
    FROM
      T1;
     
    ----------------------------------------------------------
    Plan hash value: 1746297295
    
    ------------------------------------------------------------------------------------
    | Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT |                 |  1000K|    23M|  5171   (6)| 00:00:03 |
    |   1 |  INDEX FULL SCAN | IND_T1_C2_C1_C3 |  1000K|    23M|  5171   (6)| 00:00:03 |
    ------------------------------------------------------------------------------------
    

    In the above, note that the index on columns C2, C1, and C3 was used, but the cost is now calculated at 5,171 rather than 1,823 as it was in the unhinted plan. The INDEX FAST FULL SCAN operation is now shown as an INDEX FULL SCAN operation.

    We have an index on columns C2, C1, and C3, but we also have an index on columns C3, c1, and C2. What happens when we specify the columns C3, C1, and C2 in the index hint in that order?

    SELECT /*+ INDEX(T1 (C3 C1 C2)) */
      C1,
      C2,
      C3
    FROM
      T1;
     
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 2273443829
     
    ------------------------------------------------------------------------------------
    | Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT |                 |  1000K|    23M|  5283   (6)| 00:00:03 |
    |   1 |  INDEX FULL SCAN | IND_T1_C3_C1_C2 |  1000K|    23M|  5283   (6)| 00:00:03 |
    ------------------------------------------------------------------------------------
    

    An index full scan was selected to performed on the IND_T1_C3_C1_C2 index at a cost 5,283, rather than using the IND_T1_C2_C1_C3 index that previously resulted in a cost of 5,171 – so the optimizer will not alter the order of the columns in the index hint to reduce the calculated cost.

    If we add a WHERE clause that places a restriction on column C2 to be less than 10, the optimizer could use a couple of different access paths. Let’s specify the columns C1 and C2 in the index hint to see which index is selected:

    SELECT /*+ INDEX(T1 (C1 C2)) */
      C1,
      C2,
      C3
    FROM
      T1
    WHERE
      C2<10;
     
    Plan hash value: 1883798457
     
    --------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |              |    10 |   250 |   104   (1)| 00:00:01 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1           |    10 |   250 |   104   (1)| 00:00:01 |
    |*  2 |   INDEX SKIP SCAN           | IND_T1_C1_C2 |    10 |       |    93   (2)| 00:00:01 |
    --------------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       2 - access("C2"<10)
           filter("C2"<10)
    

    In the above, a skip scan was selected because the index with columns C1 and C2 was specified in the hint.

    Let’s try another example that possibly might be considered a case where the optimizer disobeys the hint or is free to change the order of the columns specified in the index hint (this might be incorrectly considered an edge case):

    SELECT /*+ INDEX(T1 (C1 C2 C3)) */
      C1,
      C2,
      C3
    FROM
      T1
    WHERE
      C2<10;
      
    Plan hash value: 4150417361
     
    ------------------------------------------------------------------------------------
    | Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT |                 |    10 |   250 |     3   (0)| 00:00:01 |
    |*  1 |  INDEX RANGE SCAN| IND_T1_C2_C1_C3 |    10 |   250 |     3   (0)| 00:00:01 |
    ------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       1 - access("C2"<10)
    

    A quick peek at the above output might suggest that the optimizer could decide to locate an index with columns C1, C2, and C3 in any order – but I do not believe that this is the case. I believe that the optimizer considered the index hint specified in the SQL statement as being invalid (a check of the 10053 trace might confirm).

    Let’s create another index and then repeat the above SQL statement.

    CREATE INDEX IND_T1_C1_C2_C3 ON T1(C1,C2,C3);
     
    SELECT /*+ INDEX(T1 (C1 C2 C3)) */
      C1,
      C2,
      C3
    FROM
      T1
    WHERE
      C2<10;
     
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 212907557
     
    ------------------------------------------------------------------------------------
    | Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT |                 |    10 |   250 |    93   (2)| 00:00:01 |
    |*  1 |  INDEX SKIP SCAN | IND_T1_C1_C2_C3 |    10 |   250 |    93   (2)| 00:00:01 |
    ------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       1 - access("C2"<10)
           filter("C2"<10)
    

    The index with the columns that matched the order of the columns in the index hint was selected, even though the calculated cost would have been lower if the optimizer were permitted to select any index with the columns listed in the index hint.

    What about a case where there is an exact match between an index definition and an index hint, and there is also another index with one additional column which would avoid the table access:

    SELECT /*+ INDEX(T1 (C1 C2)) */
      C1,
      C2,
      C3
    FROM
      T1
    WHERE
      C2<10;
     
    Plan hash value: 1883798457
     
    --------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |              |    10 |   250 |   104   (1)| 00:00:01 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1           |    10 |   250 |   104   (1)| 00:00:01 |
    |*  2 |   INDEX SKIP SCAN           | IND_T1_C1_C2 |    10 |       |    93   (2)| 00:00:01 |
    --------------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       2 - access("C2"<10)
           filter("C2"<10)
    

    The index that exactly match the index hint was selected.

    What if we only specify in the index hint a leading column, when there are two indexes with that leading column, one of which allows the optimizer to avoid the table access:

     
    SELECT /*+ INDEX(T1 (C1)) */
      C1,
      C2,
      C3
    FROM
      T1
    WHERE
      C2<10;
     
    Plan hash value: 212907557
     
    ------------------------------------------------------------------------------------
    | Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT |                 |    10 |   250 |    93   (2)| 00:00:01 |
    |*  1 |  INDEX SKIP SCAN | IND_T1_C1_C2_C3 |    10 |   250 |    93   (2)| 00:00:01 |
    ------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       1 - access("C2"<10)
           filter("C2"<10)
    

    The optimizer selected the lowest cost access path from the two indexes that matched the hint.

    What if we specify a column in the index hint that is not listed in the SELECT or WHERE clauses?

    SELECT /*+ INDEX(T1 (C3)) */
      C1,
      C2
    FROM
      T1
    WHERE
      C2<10;
     
    Plan hash value: 1328421701
     
    ------------------------------------------------------------------------------------
    | Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT |                 |    10 |    90 |  4039   (1)| 00:00:02 |
    |*  1 |  INDEX SKIP SCAN | IND_T1_C3_C1_C2 |    10 |    90 |  4039   (1)| 00:00:02 |
    ------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       1 - access("C2"<10)
           filter("C2"<10)
    

    The above shows that the optimizer still obeyed the intention of the hint – it found an index that started with the specified column and selected to perform an INDEX SKIP SCAN even though column C2, specified in the WHERE clause, is the third column in the index definition.

    What happens if we try something silly by modifying the WHERE clause in the previous SQL statement to specify that essentially every row from the table will be retrieved:

    SELECT /*+ INDEX(T1 (C3)) */
      C1,
      C2
    FROM
      T1
    WHERE
      C2<1000000;
     
    Plan hash value: 1328421701
     
    ------------------------------------------------------------------------------------
    | Id  | Operation        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT |                 |  1000K|  8789K|  5283   (6)| 00:00:03 |
    |*  1 |  INDEX SKIP SCAN | IND_T1_C3_C1_C2 |  1000K|  8789K|  5283   (6)| 00:00:03 |
    ------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       1 - access("C2"<1000000)
           filter("C2"<1000000)
    

    At least with B*tree indexes and simple SQL statements, there do not appear to be any edge cases, although specifying an invalid index hint might appear to be an edge case at first glance.

    Comment by Charles Hooper — January 15, 2012 @ 3:53 pm GMT Jan 15,2012 | Reply

    • Jonathan,

      It might be the case that the original basis for your observation is found in the Oracle Database documentation (found in the 10.2 and 11.2 documentation library – a bit more information is provided in the 11.2 documentation libary (in section 19.2.4 Specifying Complex Index Hints), but the link to create the V view (link to Example 16-3) was removed).
      http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm#PFGRF50105

      * table specifies the name

      * column specifies the name of a column in the specified table

      ** The columns can optionally be prefixed with table qualifiers allowing the hint to specify bitmap join indexes where the index columns are on a different table than the indexed table. If tables qualifiers are present, they must be base tables, not aliases in the query.

      ** Each column in an index specification must be a base column in the specified table, not an expression. Function-based indexes cannot be hinted using a column specification unless the columns specified in the index specification form the prefix of a function-based index.

      The hint is resolved as follows:
      * If an index name is specified, only that index is considered.

      * If a column list is specified and an index exists whose columns match the specified columns in number and order, only that index is considered. If no such index exists, then any index on the table with the specified columns as the prefix in the order specified is considered. In either case, the behavior is exactly as if the user had specified the same hint individually on all the matching indexes.

      For example, in Example 16-3 the job_history table has a single-column index on the employee_id column and a concatenated index on employee_id and start_date columns. To specifically instruct the optimizer on index use, the query can be hinted as follows:

      SELECT /*+ INDEX(v.j jhist_employee_ix (employee_id start_date)) */ * FROM v;
      

      The above quote agrees with your statement. However, the index hint example provided in the above quote does not appear to be valid because both an index name and a column list are specified. In light of the apparently invalid index hint found in the 10.2 and 11.2 documentation, a bit more experimentation might be required.

      I find that it is interesting that the following index hint found in the documentation works:

      SELECT /*+ INDEX(v.e2.e3 emp_job_ix) */  * 
        FROM v;
      

      Comment by Charles Hooper — January 15, 2012 @ 8:04 pm GMT Jan 15,2012 | Reply

      • Charles,

        Thanks for that. I’ve always found it harder to find things in the manuals since they went electronic, but I’m surprised I hadn’t noticed that before. (Maybe it wasn’t there when I first checked, or maybe I only checked in the 10.1 manual).

        To return the favour – the /*+ index(v.e2.e3 emp_job_ix) */ format goes all the way back to Oracle 8, and is the form used to supply hints against tables inside view definitions. If it actually works it means that e2 is a view inside v, and e3 is a view inside e2.

        Comment by Jonathan Lewis — January 26, 2012 @ 9:47 pm GMT Jan 26,2012 | Reply

  5. […] my previous post, I made the comment: In general, if you have a three-column index that starts with the same […]

    Pingback by Quiz Night « Oracle Scratchpad — January 19, 2012 @ 8:51 am GMT Jan 19,2012 | Reply

  6. […] reminded that it was possible to create non-specific index hints that specify table columns when a recent quiz was posted that asked to find specific cases where the behavior is other than expected with the […]

    Pingback by Non-Specific Index Hints « Charles Hooper's Oracle Notes — January 24, 2012 @ 8:39 pm GMT Jan 24,2012 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a reply to Jonathan Lewis Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.