Oracle Scratchpad

March 30, 2017

Adding Indexes

Filed under: Function based indexes,Indexing,Oracle — Jonathan Lewis @ 8:31 am BST Mar 30,2017

The following question came up on the OTN database forum recently:


We have below table with columns,

Table T1
Columns:
-----------
Col_1, Col_2, Col_3, Col_4, Col_5, Col_6, Col_7, Col_8, Col_9, Col_10, Col_11, Col_12, Col_13, Col_14, Col_15

on which below indexes are created.

XXTEST_Col_1    Col_1
XXTEST_Col_2    Col_2
XXTEST_Col_3    Col_3
XXTEST_Col_5    Col_5
XXTEST_Col_6    Col_6
XXTEST_Col_7    Col_7
XXTEST_Col_8    Col_8
XXTEST_Col_8    (Col_4, Col_10, Col_11)

I have requirement to update table T1 as below and it’s taking really long. [JPL: I’m assuming that the naming of the second xxtest_col_8 index is a trivial error introduced while the OP was concealing names.)

UPDATE T1
SET Col_3 = x_value,
    Col_6 = y_value
where Col_4='N'
and Col_3 IS NULL;

To improve performance, created below index and it improved the performance of the update.

 
New Index:  XXTEST_Col_4    (Col_4, Col_3)

My question is it good idea to create the above index with already existing indexes? Is there any other better way to handle this?

There is insufficient information to answer these questions properly but if I had been given this problem there are several things I’d want to check before I came up with any answers. (In the absence of complete information the default answers are “probably not”, “probably”).

What Happened

The first question, of course, is “why did the update take a really long time?”, and the second is “why did the performance improve after the index creation?”.

There are (generally) two execution paths for the update – full tablescan or index access through the index starting with col_4. Whichever path Oracle took would the update have been faster (and fast enough) if the alternative path had been taken ?  (I have to include the “generally” caveat since it’s just possible that with silly settings for a couple of parameters and statistics the optimizer could have been doing an amazingly stupid “index full scan” on any index that contained a column declared as not null.)

Then consider the fact that the update is changing two columns that are indexed – and one of those columns (col_3) is changing from null to a value, so it’s going to increase the size of the index. Is the slow performance due to the time spent maintaining the indexes, and is the improved performance due to the fact that there are no rows left to update because the first run of the test updated all of them ?

It does seem reasonably likely that Oracle would use the new index for the update and that the better precision would be the cause of the improved performance; but a side effect of the index creation is that Oracle would now have some statistics about the combination of (col_4, col_3) that wasn’t previously available. Perhaps gathering stats on the column group (col_4, col_3) would have been sufficient to result in a change in execution path that would have given the required performance without introducing the extra  maintenance costs.

What’s the requirement

Before deciding whether or not an index should be used to improve the performance of a statement it’s important to have some idea of the cost/benefit/risk analysis. This often boils down to “how much data should change, how often does it happen, what could possibly go wrong”.

As far as “what could possibly go wrong” – by adding the index (col_4, col_3) we introduce 4 groups of risk:

  • any SQL currently using the col_3 index might change plans
  • any SQL currently using the (col_4, col_10, col_11) index might change plans
  • any query that has a predicate “col_3 is null” might change plans because that predicate could be resolved in the index if col_4 had been declared not null,
  • any query with equality predicates on both col_3 and col_4 might change plans because the new index supplies some statistics about the combination

We don’t have enough information to evaluate the degree of risk – and we haven’t been given any figures about the volume of data that changes or the frequency with which the update runs, so we can’t really say much beyond raising the warning flags.

What does the data look like.

Despite my closing comment above, we have two little clues about the data. First is that one of the predicates is col_4 = ‘N’ – to me this always raises the suspicion that we’re looking at a column which is a Yes/No, or Yes/No/NULL column; secondly the OP has marked the following response as “Correct Answer”:

Depends on your data.
Check count of both queries

select count(*) from T1 where Col_4='N' and Col_3 IS NULL;
select count(*) from T1 where Col_4='N';

If there is a significant difference in count, and delete is executed frequently, it might be good to have

If we assume that this means (a) there is a significant difference in the counts and (b) the delete IS executed frequently then we we have some (though not really enough) useful information about what the data looks like and how it is handled. Perhaps over a short period of time a small volume of new data appears where col_3 is null and there is a scheduled mechanism that looks for that data and modifies it.

If that’s the case then an index that gives use high precision when we’re searching for a (very) small amount of data from a very large table would be very helpful, while an index that starts with the first of our search columns (col_4 is the leading column of an existing index) might actually be a liability if we do a large range scan, visit the table for lots of rows, and then discard them because they fail our second predicate.

However, if this IS the case then it’s not a good idea to create a large index that is constantly subject to change – especially if the update from NULL to something else results in updates that scatter the col_3 values randomly around the index. (The NULL rows will be well packed in the index, and may be in a small collection of recent table blocks; but the update will delete one index entry and we have no idea about the impact that inserting the replacement could have.) So, if we’re also allowed to modify the code, let’s create an index that is as small as possible, won’t be used accidentally by any other code, and minimises the workload when updated. Here’s the code to create the index and one option for gathering stats on the resulting hidden column, and the modified update code:


create index xxtext_fbi1 on t1(
        case when col_4='N' and col_3 is null then 0 end
)
;

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for columns (case when col_4=''N'' and col_3 is null then 0 end) size 1'
        );
end;
/

update t1 set
        col_3 = {x_value},
        col_6 = {y_value}
where
        case when col_4='N' and col_3 is null then 0 end = 0
;

This index will contain entries for just those rows that we need to update, and when we update them the index entries will simply disappear. The index is the smallest that we could have, with the minimum workload imposed, and no risk of being hi-jacked by any other process.

As a variation on a theme, and depending on version, we could create a virtual column with this definition (making it invisible in 12c) and creating an index on the virtual column. For preference I think I’d use the FBI in 11g, and the invisible virtual column in 12c.


alter table t1
add (
        wanted_flag invisible
        generated always as (
                case when col_4='N' and col_3 is null then 0 end
        ) virtual
)
;

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for columns wanted_flag size 1'
        );
end;
/

create index xxtest_virt1 on t1(wanted_flag);

update t1 set
        col_3 = {x_value},
        col_6 = {y_value}
where
        wanted_flag = 0
;

The thread raised a couple of other worthwhile points – one post pointed out that if they were going to create a new index they would want to make it as useful for as many other processes as possible: this, of course, is a different view on the cost/benefit/risk assessment, especially (I would say) if there is no option to create a very small, low-cost, index that is guaranteeably safe. The point also came up that there are two or three variations on how one could produce an expression that produced index entries only for the rows where col_4 = ‘N’ and col_3 is null; and a point about further possibilities if there were a few different values for col_4 that resulted in similar queries.

3 Comments »

  1. Hello.
    As always this is great source of knowledge.
    But I have one concern about index:
    New Index: XXTEST_Col_4 (Col_4, Col_3)

    As far as I know nulls isn’t indexed at last standard BTree index position so this predicate work’s only partial on firs column and last ‘is useless’ (probably is used to filter data) :
    where Col_4=’N’ and Col_3 IS NULL

    So maybe better option is drop index:
    XXTEST_Col_3 Col_3

    and create new one on (col_3, col_4)?
    With this approach would be one index to maintenance and maybe even better work with this case?

    Comment by Paweł — March 31, 2017 @ 3:58 pm BST Mar 31,2017 | Reply

    • Pawel,

      Thanks for the comment – I hadn’t heard the suggestion that trailing nulls wouldn’t be part of the index before so I ran up a little test.

      The execution plans from autotrace show that Oracle has no problem with nulls in the index, provided at least one column of the index entry is not null.

      In fact the only oddity appears when col_3 is the leading column – I can’t think of a good reason why there should be an index filter predicate on col_4.

      Here’s the s

      
      SQL> select /*+ index(t1(col_3)) */ * from t1 where col_3 is null and col_4 = 'N';
      
      --------------------------------------------------------------------------
      | Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
      --------------------------------------------------------------------------
      |   0 | SELECT STATEMENT |       |     1 |     4 |     1   (0)| 00:00:01 |
      |*  1 |  INDEX RANGE SCAN| T1_34 |     1 |     4 |     1   (0)| 00:00:01 |
      --------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
         1 - access("COL_3" IS NULL AND "COL_4"='N')
             filter("COL_4"='N')
      
      SQL> select /*+ index(t1(col_4)) */ * from t1 where col_3 is null and col_4 = 'N';
      
      --------------------------------------------------------------------------
      | Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
      --------------------------------------------------------------------------
      |   0 | SELECT STATEMENT |       |     1 |     4 |     1   (0)| 00:00:01 |
      |*  1 |  INDEX RANGE SCAN| T1_43 |     1 |     4 |     1   (0)| 00:00:01 |
      --------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
         1 - access("COL_4"='N' AND "COL_3" IS NULL)
      
      

      Here’s the symbolic dump of an index entry with a trailing null:

      
      row#1[8007] flag: ------, lock: 0, len=12
      col 0; len 1; (1):  4e
      col 1; NULL
      col 2; len 6; (6):  01 40 00 9a 00 00
      
      

      Counting bytes you can see that the NULL is represented by a single (pseudo-length) byte, and checking the raw dump this turns out to be FF (rather than the 0 you might expect).

      Comment by Jonathan Lewis — April 1, 2017 @ 8:09 pm BST Apr 1,2017 | Reply

  2. Hi again.
    I already find info about grouping nulls at the end of the index – thx, haven’t heard about that before but many times I heard that nulls isn’t indexed so when we want smallest possible index then values that are useless change to null (like You did using case).
    I ran some tests and single column index won’t work for null search – maybe I done something wrong or maybe is too late for testing.

    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    PL/SQL Release 11.2.0.4.0 - Production
    CORE	11.2.0.4.0	Production
    TNS for Linux: Version 11.2.0.4.0 - Production
    NLSRTL Version 11.2.0.4.0 - Production
    
    create table t1 as
    select *
    from (select case mod(level, 100000)
                   when 0 then null
                   else level
                 end
                   c1,
                 case mod(level, 3)
                   when 1 then 'A'
                   when 2 then 'B'
                   else 'C'
                 end
                   c2,
                   lpad('X', 1000, 'X') c3
          from dual
          connect by level  'X', tabname => 'T1');
    end;
    /
    
    -- two indexes: first on c1 and second on c2
    
    create index t1_c1 on t1 (c1);
    create index t1_c2 on t1 (c2);
    
    -------------------------------------
    SQL_ID  4j60u7bc8rv4j, child number 1
    -------------------------------------
    select * from t1 where c1 is null   and c2 = 'A'
    
    Plan hash value: 3617692013
    
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      |       |       |  3924 (100)|          |
    |*  1 |  TABLE ACCESS FULL| T1   |     1 |  1008 |  3924   (1)| 00:00:48 |
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter(("C1" IS NULL AND "C2"='A'))
    
    
    -------------------------------------
    SQL_ID  fk5hvjmm11q82, child number 1
    -------------------------------------
    select /*+ index(t1(c1)) */* from t1 where c1 is null   and c2 = 'A'
    
    Plan hash value: 3617692013
    
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      |       |       |  3924 (100)|          |
    |*  1 |  TABLE ACCESS FULL| T1   |     1 |  1008 |  3924   (1)| 00:00:48 |
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter(("C1" IS NULL AND "C2"='A'))
    
    
    -------------------------------------
    SQL_ID  47718pkvvbts2, child number 0
    -------------------------------------
    select /*+ index(t1(c2)) */* from t1 where c1 is null   and c2 = 'A'
    
    Plan hash value: 1953315082
    
    -------------------------------------------------------------------------------------
    | Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |       |       |       | 13528 (100)|          |
    |*  1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |  1008 | 13528   (1)| 00:02:43 |
    |*  2 |   INDEX RANGE SCAN          | T1_C2 | 33333 |       |    62   (2)| 00:00:01 |
    -------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter("C1" IS NULL)
       2 - access("C2"='A')
    
    
    drop index t1_c1;
    drop index t1_c2;
    
    
    
    -- one indexes on (c1, c2)
    
    create index t1_c1_c2 on t1 (c1, c2);
    
    -------------------------------------
    SQL_ID  4j60u7bc8rv4j, child number 1
    -------------------------------------
    select * from t1 where c1 is null   and c2 = 'A'
    
    Plan hash value: 1347170259
    
    ----------------------------------------------------------------------------------------
    | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |          |       |       |     2 (100)|          |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |  1008 |     2   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | T1_C1_C2 |     1 |       |     1   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("C1" IS NULL AND "C2"='A')
           filter("C2"='A')
    
    
    -------------------------------------
    SQL_ID  fk5hvjmm11q82, child number 1
    -------------------------------------
    select /*+ index(t1(c1)) */* from t1 where c1 is null   and c2 = 'A'
    
    Plan hash value: 1347170259
    
    ----------------------------------------------------------------------------------------
    | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |          |       |       |     2 (100)|          |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |  1008 |     2   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | T1_C1_C2 |     1 |       |     1   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("C1" IS NULL AND "C2"='A')
           filter("C2"='A')
    
    
    drop index t1_c1_c2;
    
    
    
    -- one indexes on (c2, c1)
    
    create index t1_c2_c1 on t1 (c2, c1);
    
    -------------------------------------
    SQL_ID  4j60u7bc8rv4j, child number 1
    -------------------------------------
    select * from t1 where c1 is null   and c2 = 'A'
    
    Plan hash value: 1839694307
    
    ----------------------------------------------------------------------------------------
    | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |          |       |       |     2 (100)|          |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |  1008 |     2   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | T1_C2_C1 |     1 |       |     1   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("C2"='A' AND "C1" IS NULL)
    
    
    -------------------------------------
    SQL_ID  fk5hvjmm11q82, child number 1
    -------------------------------------
    select /*+ index(t1(c1)) */* from t1 where c1 is null   and c2 = 'A'
    
    Plan hash value: 1839694307
    
    ----------------------------------------------------------------------------------------
    | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |          |       |       |     2 (100)|          |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |  1008 |     2   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | T1_C2_C1 |     1 |       |     1   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("C2"='A' AND "C1" IS NULL)
    
    
    -------------------------------------
    SQL_ID  47718pkvvbts2, child number 0
    -------------------------------------
    select /*+ index(t1(c2)) */* from t1 where c1 is null   and c2 = 'A'
    
    Plan hash value: 1839694307
    
    ----------------------------------------------------------------------------------------
    | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |          |       |       |     2 (100)|          |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |  1008 |     2   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | T1_C2_C1 |     1 |       |     1   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("C2"='A' AND "C1" IS NULL)
    
    
    drop index t1_c2_c1;
    
    
    
    -- one indexes on c1 and constant (c1, 1)
    
    create index t1_c1_cons on t1 (c1, 1);
    
    -------------------------------------
    SQL_ID  4j60u7bc8rv4j, child number 1
    -------------------------------------
    select * from t1 where c1 is null   and c2 = 'A'
    
    Plan hash value: 2206770984
    
    ------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |            |       |       |     3 (100)|          |
    |*  1 |  TABLE ACCESS BY INDEX ROWID| T1         |     1 |  1008 |     3   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | T1_C1_CONS |     1 |       |     2   (0)| 00:00:01 |
    ------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter("C2"='A')
       2 - access("C1" IS NULL)
    
    
    -------------------------------------
    SQL_ID  fk5hvjmm11q82, child number 1
    -------------------------------------
    select /*+ index(t1(c1)) */* from t1 where c1 is null   and c2 = 'A'
    
    Plan hash value: 2206770984
    
    ------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |            |       |       |     3 (100)|          |
    |*  1 |  TABLE ACCESS BY INDEX ROWID| T1         |     1 |  1008 |     3   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | T1_C1_CONS |     1 |       |     2   (0)| 00:00:01 |
    ------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter("C2"='A')
       2 - access("C1" IS NULL)
    
    
    drop index t1_c1_cons;
    

    Comment by Paweł — April 1, 2017 @ 11:59 pm BST Apr 1,2017 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Powered by WordPress.com.