Oracle Scratchpad

December 16, 2010

FBI oddities

Filed under: Function based indexes,Indexing — Jonathan Lewis @ 6:15 pm GMT Dec 16,2010

Function-based indexes are wonderful things – but they don’t always work exactly as expected. Here’s an example of one such anomaly.

Imagine you have some type of “orders” table where most orders are in a “finished with” state, and you have a requirement to access the small number of orders in the “new” state. Here’s a sample data set to emulate this type of data requirement (created in 11.1.0.6, 1MB uniform extents, freelist management and 8KB blocks).


create table t1 (
	state		varchar2(10),
	n1		number,
	v1		varchar2(10),
	padding	varchar2(100)
);

insert into t1
select
	decode(mod(rownum,100),0,'OPEN','CLOSED'),
	rownum,
	lpad(rownum,10,0),
	rpad('x',100,'x')
from
	all_objects
where
	rownum <= 5000
;


I’ve generated this data set so that every 100th row is marked as ‘OPEN’ and all the rest are marked as ‘CLOSED’ – in a real system the percentage of ‘OPEN’ orders would probably be much smaller so we could easily decide to have an index on state to give us an efficient access path to the open orders. But such an index would be very large, because it would also hold entries for the huge number of closed orders; we’d also have to create a histogram on the column (possibly by writing a simple script) so that Oracle could recognise the skewed data distribution.

If we wanted to be clever, though, and if we were able to edit the SQL that addressed this table, we could minimise the size of the index and avoid the need for a histogram by creating a function-based index that held values just for the rows where the state was ‘OPEN’. For example, I could create an index which held the order number only for those rows where the state was open; and there are several ways I could do this, for example:


create index t1_f1 on t1(decode(state,'CLOSED', to_number(null), n1 ));
create index t1_f2 on t1(to_number(decode(state,'CLOSED', null, n1 )));
create index t1_f3 on t1(case when state = 'CLOSED' then to_number(null) else n1 end);
create index t1_f4 on t1(to_number(case when state = 'CLOSED' then null else n1 end));
create index t1_f5 on t1(decode(state,'CLOSED', null, n1 ));
create index t1_f6 on t1(decode(state,'CLOSED', cast(null as number), n1 ));

If you’re wondering why I’ve included a “to_number()” in the first index, remember that NULL is implicitly assumed to be a NULL of type character by Oracle – so I’ve got to do something to tell Oracle that this NULL is supposed to be a numeric NULL. Index t1_f5 is the same as t1_f1, but without the to_number(), and index t1_f6 is the same again but using the more modern cast() to supply the conversion.

You’ll note that I haven’t yet shown any attempt to collect statistics. If we create the indexes AFTER we’ve collected stats on the table we’ll have to collect some extra table stats once the indexes exist because each function-based index will have added a new (hidden) column to the table and, although the “create index” commands will have created statistics for the indexes (from 10g onwards), we will not yet have stats on these hidden columns. So I’m going to wait until after creating the indexes to generate the stats:


begin
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=>'T1',
		estimate_percent	=> 100,
		method_opt		=> 'for all columns size 1'
	);
end;
/

The question is now this – given the definitions of the indexes above, which of the following six queries – each one designed to be an exact match for one of the index definitions – will use “its” index. (Note that I have hinted the queries to ensure that if the optimizer is allowed to use an index it will use an index – and I’ve included the name of the relevant index as a comment at the end of each hint):


select
	/*+ index(t1) t1_f1 */
	v1
from
	t1
where
	decode(state,'CLOSED', to_number(null), n1 ) = 100
;

select
	/*+ index(t1) t1_f2 */
	v1
from
	t1
where
	to_number(decode(state,'CLOSED', null, n1 )) = 100
;

select
	/*+ index(t1) t1_f3 */
	v1
from
	t1
where
	case when state = 'CLOSED' then to_number(null) else n1 end = 100
;

select
	/*+ index(t1) t1_f4 */
	v1
from
	t1
where
	to_number(case when state = 'CLOSED' then null else n1 end) = 100
;

select
	/*+ index(t1) t1_f5 */
	v1
from
	t1
where
	decode(state,'CLOSED', null, n1 ) = 100
;

select
	/*+ index(t1) t1_f6 */
	v1
from
	t1
where
	decode(state,'CLOSED', cast(null as number), n1 ) = 100
;

The answer depends on the version of Oracle. Under Oracle 11.1.0.6 I got the following results. First, the attempt to create t1_f5 resulted in the following Oracle error (and that’s an important clue to what has happened in another part of the test):

create index t1_f5 on t1(decode(state,'CLOSED', null, n1 ))
                         *
ERROR at line 1:
ORA-01408: such column list already indexed

The index usage was as follows:

t1_f1		not used	(decode)
t1_f2		not used	(decode)
t1_f3		used		(case)
t1_f4		used		(case)
t1_f5		non-existent - but used t1_f1
t1_f6		used		(cast)

If you want it in a sound-bite: newer technologies do better than older technologies. But why do the results look the way they do ? You can find the answer in the index definitions that have been stored in the database:


column index_name format a10		heading "Index"
column column_position format 999	heading "Posn"
column column_expression format a72	heading "Expression"

select
	index_name, column_position, column_expression
from
	user_ind_expressions
where
	table_name = 'T1'
;

Index      Posn Expression
---------- ---- ------------------------------------------------------------------------
T1_F1         1 DECODE("STATE",'CLOSED',NULL,"N1")
T1_F2         1 TO_NUMBER(DECODE("STATE",'CLOSED',NULL,TO_CHAR("N1")))
T1_F3         1 CASE "STATE" WHEN 'CLOSED' THEN NULL ELSE "N1" END
T1_F4         1 TO_NUMBER(TO_CHAR(CASE "STATE" WHEN 'CLOSED' THEN NULL ELSE "N1" END ))
T1_F6         1 DECODE("STATE",'CLOSED',CAST(NULL AS number),"N1")

Compare the stored definition with the orginal definitions. Notice how the decodes and NULLs don’t work happily together.

In t1_f1 the explicit to_number() that I included has disappeared – that’s why I was unable to create index t1_f5 – its definition was identical to the modified t1_f1 definition. Then, of course, my predicate no longer matches the exact index definition.

In the t1_f2 definition, because NULL is implicitly character Oracle has added an explicit to_char() to the n1 column I supplied so that its type agrees with the NULL, thus allowing the final to_number() to work. So, again, my predicate no longer matches the index definition.

In t1_f3 and t1_f4 I didn’t include any explicit conversions, and Oracle didn’t add any implicit conversions – but if you look closely it has transformed the version of the case statement I supplied into the simpler form – and everything happened to work (there was an earlier version of Oracle where Oracle would do this transformation for the predicate at run time but not for the index at index creation time – with the result that the “specially created” index wouldn’t work.

Index t1_f5 was not created because my explicit definition matched Oracle’s implicit conversion of t1_f1 – and then my explicit rendition of the matching predicate allowed the optimizer to use index t1_f1.

Finally, with the cast() operator the decode() wasn’t “clever enough” to eliminate my explicit conversion, so the predicate matched the index definition and the index was used.

So the message is this – be careful how you define your function-based indexes, and check what Oracle has stored as the index definition before you commit too much effort to rewriting code to use your new index.

Footnote: Inevitably there are more questions you could ask to fill in further details here. For example, if you created a “genuine” virtual column in 11g using one of my “unusable” decode() expressions, and then indexed the virtual column, would Oracle use the index ? If I had included some cast() operators in my case expressions and corresponding predicates would Oracle still have been able to use the indexes or would I have found the index definitions and predicates were transformed differently and ceased to match ? Is the behaviour shown consistent across all popular versions of Oracle ? (the answer to that last one is No)

These questions (and others) are left as exercises for the interested reader to carry out in the privacy and quiet of their own workplaces.

6 Comments »

  1. I wonder why not decode(state,’OPEN,nl,null)

    Comment by al0 — December 16, 2010 @ 6:39 pm GMT Dec 16,2010 | Reply

  2. al0,

    Valid point, given the specific example I’ve shown; but doing it in the way I’ve shown it addresses the more generic problem of indexing “everything that isn’t closed”.

    Comment by Jonathan Lewis — December 16, 2010 @ 6:49 pm GMT Dec 16,2010 | Reply

  3. Jonathan,

    Nice explanation and thanks, in particular, for mentioning the caveat about maintaining statistics on hidden columns of FBI.

    p.s. you should have posted this as another quiz… :)

    Comment by Narendra — December 17, 2010 @ 6:07 am GMT Dec 17,2010 | Reply

  4. Jonathan,

    If state is allowed to be null, I would have slightly changed the index syntax creation (and the query accordingly) as follows:

    create index t1_f3 on t1(case when nvl(state,’CLOSED’) = ‘CLOSED’ then to_number(null) else n1 end);

    select
    /*+ index(t1) t1_f3 */
    v1
    from
    t1
    where
    case when nvl(state,’CLOSED’) = ‘CLOSED’ then to_number(null) else n1 end = 100;

    Even if in this case it seems to be of no consequences in the query result

    What do you think?

    Best Regards

    Mohamed Houri

    Comment by Houri Mohamed — December 17, 2010 @ 9:16 am GMT Dec 17,2010 | Reply

    • Mohamed,

      I think the important point in your comment is that it emphasises the need to think carefully about the impact on NULL on such things.

      Whether your suggestion was appropriate or not (as Olekander suggests) is completely down to the business requirements – but it’s the technicians who have to remember that NULL can turn a simple business request into something a little more complex than first thought.

      Comment by Jonathan Lewis — December 21, 2010 @ 8:06 am GMT Dec 21,2010 | Reply

  5. Hi Mohamed,

    Your code treats NULL state exactly the same way as ‘CLOSED’ state, which unlikely to be a desired behavior.

    Regards,
    Oleksandr

    Comment by al0 — December 17, 2010 @ 9:33 am GMT Dec 17,2010 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,306 other followers