Oracle Scratchpad

April 2, 2014

Easy – Oops.

Filed under: Bugs,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 7:47 pm BST Apr 2,2014

A question came up on OTN today asking for suggestions on how to enforce uniqueness on a pair of columns only when the second column was not null. There’s an easy and obvious solution – but I decided to clone the OP’s example and check that I’d typed my definition up before posting it; and the result came as a bit of a surprise. Here’s a demo script (not using the OP’s table):


create table t1  
(  
	col1	int not null,
	col2	varchar2(1)
);  

create unique index t1_i1 on t1( 
--	case col2 when null then cast(null as int) else col1 end,
--	case when col2 is null then cast(null as int) else col1 end,
	case when col2 is not null then col1 end,
	col2
)
;

insert into t1 values(1,null);
insert into t1 values(1,null);
insert into t1 values(1,'x');
insert into t1 values(1,'y');
insert into t1 values(1,'y');

commit;

column ind1_is   format a5
column ind1_when format 9999

set null N/A

select
	case when col2 is null then cast (null as int) else col1 end	ind1_is,
	case col2 when null then cast (null as int)  else col1 end	ind1_when
from 
	t1
;

The strategy is simple, you create a unique function-based index with two columns; the first column of the index id defined to show the first column of the table if the second column of the table is not null, the second column of the index is simply the second column of the table. So if the second column of the table is null, both columns in the index are null and there is no entry in the index; but if the second column of the table is not null then the index copies both columns from the table and a uniqueness test applies.

Based on the requirement and definition you would expect the first 4 of my insert statements to succeed and the last one to fail. The index will then have two entries, corresponding to my 3rd and 4th insertions.

I’ve actually shown three ways to use the case statement to produce the first column of the index. The last version is the cleanest, but the first option is the one I first thought of – it’s virtually a literal translation the original requirement. The trouble is, with my first definition the index acquired an entry it should not have got, and the second insert raised a “duplicate key” error; the error didn’t appear when I switched the syntax of the case statement to the second version.

That’s why the closing query of the demo is there – when you run it the two values reported should be the same as each other for all four rows in the table – but they’re not. This is what I got on 11.2.0.4:


IND1_IS IND1_WHEN
------- ---------
N/A             1
N/A             1
      1         1
      1         1


I’m glad I did a quick test before suggesting my original answer.

Anyone who has other versions of Oracle available is welcome to repeat the test and report back which versions they finding working correctly (or not).

Update

It’s not a bug (see note 2 below from Jason Bucata), it’s expected behaviour.

 

Tweaking

Filed under: Function based indexes,Indexing,Oracle — Jonathan Lewis @ 6:24 pm BST Apr 2,2014

The following question came up on OTN recently:

Which one gives better performance? Could please explain.

1) nvl( my_column, ‘N’) <> ‘Y’

2) nvl( my_column, ‘N’) = ‘N’

It’s a question that can lead to a good 20 minute discussion – if you were in some sort of development environment and had a fairly free hand to do whatever you wanted.

The most direct answer is that you could expect the performance to be the same whichever option you chose – but the results might be different, of course, unless you had a constraint on my_column that ensured that it would hold only null, ‘Y’, or ‘N’.  (Reminder:  the constraint check (my_column in (‘Y’,’N’) will allow nulls in the column).

On the other hand, if you create a function-based index on nvl(my_column,’N’) then the second option would give the optimizer the opportunity to use the index – which could make it the more efficient option if a sufficiently small quantity of the data from a large table matched the predicate. Of course in this case you would need a histogram on the hidden column definition supporting the index so that the optimizer could detect the data skew.

But if a function-based index is a step in the right direction it’s worth thinking about how to pick the best function-based index.


create index t1_i1 on t1(case my_column when 'Y' then null else 'N' end);

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1');

select * from t1 where case my_column when 'Y' then null else 'N' end = 'N';

Provided you can change the queries that use the original predicate, you can create a function-based index that is the smallest possible index to assist your query, and you won’t need a histogram to allow the optimizer to get the correct cardinality since there will be just one distinct value in the index (and the underlying column).

It’s possible to go a little further – what if the typical query that started this discussion was really something like:


select  *
from    t1
where   nvl(my_column, 'N') = 'N'
and     other_column = {some value}

If the nvl() predicate always returned a small fraction of the data we might engineer an index to handle the requirement of the entire where clause:


create index t1_i2 on t1(case my_column when 'Y' then null else other_column end);
execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all hidden columns size 254');

In this case you might need a histogram on the hidden column definition supporting the index, which is why I’ve changed my method_opt. We’ve constructed an index that is the smallest possible index that will satisfy the query as precisely as possible while giving the optimizer the best chance of producing accurate cardinality estimates. [Update: until the appearance of Kirill's comment (#1) I forgot to point out that once again you have to change the original SQL from using the nvl() expression to using a case expression that matches the index.]

Footnote

It would be easy to fall into the trap of playing around with this type of tweaking every time you hit a performance problem. Bear in mind, though, that even indexes which are as small and accurately targeted as possible can introduce various types of overhead (contention, cost of gathering stats, instability of histogram collection); so always try to think through the potential costs as well as the benefits of such an approach - is it really worth the effort.

It’s also worth noting that things don’t always work the way you expect.

Finally, of course, you might choose to create “proper” virtual columns in 11g so that you can refer to them by name, rather than handling them indirectly as you have to when they are “undocumented” hidden columns generated by function-based indexes.

 

March 24, 2014

Min/Max

Filed under: Function based indexes,Indexing,Oracle — Jonathan Lewis @ 11:11 pm BST Mar 24,2014

One of my most-repeated observations about trouble-shooting Oracle is that things break when you start combining features. Here’s an example that demonstrates the point.

It’s possible to create “descending” indexes – or indexes with descending columns, as I prefer to call them, and there’s a special “min/max range scan” optimizer operation for a particular kind of index usage – demonstrated in the following code fragment (running under 11.2.0.4, and reporting the rowsource execution statistics):


create table t1(
	a number not null,
	b number not null,
	c number not null,
	padding varchar2(100)
);

insert into t1
select
	mod(object_id +   1245,1001),
	mod(object_id +   4545,1111),
	mod(object_id + 774545,  13),
	rpad('x',100,'x')
from
	all_objects
where
	rownum<=10000
;

commit;

create index t1_i1 on t1(b, c, a);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

alter session set statistics_level = all;

select
	max(a)
from	t1
where	b=1
and	c=1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE              |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|   2 |   FIRST ROW                  |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| T1_I1 |      1 |      1 |      1 |00:00:00.01 |       2 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("B"=1 AND "C"=1)

Note how the optimizer is aware that it can find a path aiming for one specific index entry (FIRST ROW), using the (min/max) option on the index.

So what happens when we change the index:


drop index t1_i1;
create index t1_i1 on t1(b, c, a desc);

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

select
	max(a)
from	t1
where	b=1
and	c=1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

-------------------------------------------------------------------------------------
| Id  | Operation         | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |      1 |        |      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE   |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN| T1_I1 |      1 |      1 |      1 |00:00:00.01 |       2 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("B"=1 AND "C"=1)

We’ve changed the index so that the final column is descending and although the optimizer is smart enough to determine that the query can still be satisfied without visiting the table, it can no longer use the min/max optimization, instead it does a range scan the section of the index matching the where clause, using the normal aggregate operation to find max(a).

In this tiny example the difference in the work load is barely perceptible – but there will be cases where the change in plan will make a difference in performance. As ever, when taking advantage of a feature that looks useful you have to try to imagine all the possible cases for the feature that might appear in your application and test them to see whether they introduce an unexpected (and possibly unacceptable) overhead.

Footnote:

There is a workaround in this case – not that I would suggest using it in a production system. If you remember that descending columns are implemented through a function-based index using the sys_op_descend() function, you can write code like this:

select
	utl_raw.cast_to_number(hextoraw(sys_op_undescend(MIN(sys_op_descend(a)))))	a
from
	t1
where
	b = 1
and	c = 1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE              |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|   2 |   FIRST ROW                  |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| T1_I1 |      1 |      1 |      1 |00:00:00.01 |       2 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("B"=1 AND "C"=1)

These results came from an instance of 11.2.0.4, but the limitation is still present in 12.1.0.1

March 2, 2014

Auto Sample Size

Filed under: Function based indexes,Indexing,Infrastructure,IOT,LOBs,Oracle,Statistics — Jonathan Lewis @ 6:38 pm BST Mar 2,2014

In the past I have enthused mightily about the benefits of the approximate NDV mechanism and the benefit of using auto_sample_size to collect statistics in 11g; however, as so often happens with Oracle features, there’s a down-side or boundary condition, or edge case. I’ve already picked this up once as an addendum to an earlier blog note on virtual stats, which linked to an article on OTN describing how the time taken to collect stats on a table increased dramatically after the addition of an index – where the index had this definition:


create bitmap index i_s_rmp_eval_csc_msg_actions on
    s_rmp_evaluation_csc_message (
        decode(instr(xml_message_text,' '),0,0,1)
    )
;

As you might guess from the column name, this is an index based on an XML column, which is stored as a CLOB.

In a similar vein, I showed you a few days ago an old example I had of indexing a CLOB column with a call to dbms_lob.getlength(). Both index examples suffer from the same problem – to support the index Oracle creates a hidden (virtual) column on the table that can be used to hold statistics about the values of the function; actual calculated values for the function call are stored in the index but not on the table itself – but it’s important that the optimizer has the statistics about the non-existent column values.

So what happens when Oracle collects table statistics – if you’ve enable the approximate NDV feature Oracle does a 100% sample, which means it has to call the function for every single row in the table. You will appreciate that the decode(instr()) function on the LOB column is going to read every single LOB in turn from the table – it’s not surprising that the time taken to calculate stats on the table jumped from a few minutes to a couple of hours. What did surprise me was that my call to dbms_lob.getlength() also seemed to read every lob in my example rather than reading the “LOB Locator” data that’s stored in the row – one day I’ll take a look into why that happened.

Looking at these examples it’s probably safe to conclude that if you really need to index some very small piece of “flag” information from a LOB it’s probably best to store it as a real column on the table – perhaps populated through a trigger so you don’t have to trust every single piece of front-end code to keep it up to date. (It would be quite nice if Oracle gave us the option for a “derived” column – i.e. one that could be defined in the same sort of way as a virtual column, with the difference that it should be stored in the table.)

So virtual columns based on LOBs can create a performance problem for the approximate NDV mechanism;  but the story doesn’t stop there because there’s another “less commonly used” feature of Oracle that introduces a different threat – with no workaround – it’s the index organized table (IOT). Here’s a basic example:

create table iot1 (
        id1	number(7,0),
	id2	number(7,0),
	v1	varchar2(10),
	v2	varchar2(10),
	padding	varchar2(500),
        constraint iot1_pk primary key(id1, id2)
)
organization index
including id2
overflow
;

insert into iot1
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
        mod(rownum,20)                  id1,
        trunc(rownum,100)               id2,
        to_char(mod(rownum,20))         v1,
        to_char(trunc(rownum,100))      v2,
        rpad('x',500,'x')               padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;

commit;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever';

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'IOT1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

alter session set events '10046 trace name context off';

You’ll notice I’ve created the table then inserted the data – if I did a “create table as select” Oracle would have sorted the data before inserting it, and that would have helped to hide the problem I’m trying to demonstrate. As it is my overflow segment is very badly ordered relative to the “top” (i.e. index) segment – in fact I can see after I’ve collected stats on the table that the clustering_factor on the index is 100,000 – an exact match for the rows in the table.

Running 11.2.0.4, with a 1MB uniform extent, freelist management, and 8KB block size the index segment held 279 leaf blocks, the overflow segment (reported in view user_tables as SYS_IOT_OVER_81594) held 7,144 data blocks.

So what interesting things do we find in a 10046 trace file after gathering stats – here are the key details from the tkprof results:

SQL ID: 7ak95sy9m1s4f Plan Hash: 1508788224

select /*+  full(t)    no_parallel(t) no_parallel_index(t) dbms_stats
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  no_substrb_pad  */to_char(count("ID1")),to_char(substrb(dump(min("ID1"),16,
  0,32),1,120)),to_char(substrb(dump(max("ID1"),16,0,32),1,120)),
  to_char(count("ID2")),to_char(substrb(dump(min("ID2"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID2"),16,0,32),1,120)),to_char(count("V1")),
  to_char(substrb(dump(min("V1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V1"),16,0,32),1,120)),to_char(count("V2")),
  to_char(substrb(dump(min("V2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V2"),16,0,32),1,120)),to_char(count("PADDING")),
  to_char(substrb(dump(min("PADDING"),16,0,32),1,120)),
  to_char(substrb(dump(max("PADDING"),16,0,32),1,120))
from
 "TEST_USER"."IOT1" t  /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,
  NIL,NIL*/

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.37       0.37       7423     107705          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.37       0.37       7423     107705          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=107705 pr=7423 pw=0 time=377008 us)
    100000     100000     100000   APPROXIMATE NDV AGGREGATE (cr=107705 pr=7423 pw=0 time=426437 us cost=10 size=23944 card=82)
    100000     100000     100000    INDEX FAST FULL SCAN IOT1_PK (cr=107705 pr=7423 pw=0 time=298380 us cost=10 size=23944 card=82)(object id 85913)

********************************************************************************

SQL ID: 1ca2ug8s3mm5z Plan Hash: 2571749554

select /*+  no_parallel_index(t, "IOT1_PK")  dbms_stats cursor_sharing_exact
  use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad
  no_expand index(t,"IOT1_PK") */ count(*) as nrw,count(distinct
  sys_op_lbid(85913,'L',t.rowid)) as nlb,null as ndk,
  sys_op_countchg(sys_op_lbid(85913,'O',"V1"),1) as clf
from
 "TEST_USER"."IOT1" t where "ID1" is not null or "ID2" is not null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.16       0.16          0     100280          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.16       0.16          0     100280          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT GROUP BY (cr=100280 pr=0 pw=0 time=162739 us)
    100000     100000     100000   INDEX FULL SCAN IOT1_PK (cr=100280 pr=0 pw=0 time=164597 us cost=6 size=5900000 card=100000)(object id 85913)

The first query collects table and column stats, and we can see that the approximate NDV method has been used because of the trailing text: /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/. In this statement the hint /*+ full(t) */ has been interpreted to mean an index fast full scan, which is what we see in the execution plan. Although there are only 279 blocks in the index and 7,144 blocks in the overflow we’ve done a little over 100,000 buffer visits because for every index entry in the IOT top we’ve done a “fetch by rowid” into the overflow segment (the session stats records these as “table fetch continued row”). Luckily I had a small table so all those visits were buffer gets; on a very large table it’s quite possible that a significant fraction of those buffer gets will turn into single block physical reads.

Not only have we done one buffer visit per row to allow us to calculate the approximate NDV for the table columns, we’ve done the same all over again so that we can calculate the clustering_factor of the index. This is a little surprising since the “rowid” for an item in the overflow section is stored in the index segment but (as you can see in the second query in the tkprof output) Oracle has used column v1 (the first in the overflow segment) in the call to the sys_op_countchg() function where the equivalent call for an ordinary index would use t.rowid so, presumably, the code HAS to access the overflow segment. The really strange thing about this is that the same SQL statement has a call to sys_op_lbid() which uses the (not supposed to exist in IOTs) rowid – so it looks as if it ought to be possible for sys_op_countchg() to do the same.

So – big warning on upgrading to 11g: if you’ve got IOTs with overflows and you switch to auto_sample_size and enable approximate NDV then the time taken to gather stats on those IOTs may (depending to a large extent on the data clustering) take much longer than it used to.

February 25, 2014

FBI Skip Scan

Filed under: Bugs,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 6:45 pm BST Feb 25,2014

A recent posting on the OTN database forum highlighted a bug (or defect, or limitation) in the way that the optimizer handles index skip scans with “function-based” indexes – it doesn’t do them. The defect has probably been around for a long time and demonstrates a common problem with testing Oracle – it’s very easy for errors in the slightly unusual cases to be missed; it also demonstrates a general principle that it can take some time for a (small) new feature to be applied consistently across the board.

The index definitions in the original posting included expressions like substr(nls_lower(colX), 1, 25), and it’s possible for all sorts of unexpected effects to appear when your code starts running into NLS  settings, so I’ve created a much simpler example. Here’s my table definition, with three index definitions:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	mod(rownum - 1,10)		mod1,
	rownum				id1,
	1 + mod(rownum - 1,10)		mod2,
	1 + rownum 			id2,
	lpad(rownum,7)			small_vc,
	rpad('x',100)			padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5 ; create index t1_normal on t1(mod2, id2); create index t1_split  on t1(mod2, id1 + 1); create index t1_fbi    on t1(mod1 + 1, id1 + 1); begin 	dbms_stats.gather_table_stats( 		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

Note that I gathered stats after creating the indexes (generally you would be safe creating the indexes after the gathering the stats from 10g onwards) because I needed to gather stats on the virtual columns that support the function-based indexes. As you can see, mod2 = 1 + mod1, and id2 = 1 + id1, and I’ve created three indexes which (internally) are the same although they are defined in three different ways.

So here are three queries which select the same data although, again, the queries are not all exactly the same. Since the queries are doing the same things with structures that are the same we might hope to see the optimizer using the same strategy for all three. In fact, to make things easier for the optimizer, I’ve even told it exactly what to do in two of the cases:


select	*
from	t1
where
	id2 = 50000
;

select	/*+ index_ss(t1(mod2)) */
	*
from	t1
where
	id1 + 1 = 50000
;

select	/*+ index_ss(t1 t1_fbi) */
	*
from	t1
where
	id1 + 1 = 50000
;

In the first query I’ve referenced only the second column of the “normal” index with a high precision query – expecting the optimizer to find the index skip scan path. In the second and third queries I’ve reference the (id1 + 1) index expression which appears as the second column in two different indexes; to help the optimizer along I’ve hinted the index skip scan for each of the indexes in turn.

If the first query can do an index skip scan then the second and third queries should be able to do the same because the physical structures and the statistics (and the actual stored and requested values) are all the same. Here are the three plans (pulled from memory in an 11.2.0.4 instance).


SQL_ID  bvjstz9xa3n7a, child number 0
-------------------------------------
select * from t1 where  id2 = 50000

Plan hash value: 2078113469

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |    12 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |     1 |   125 |    12   (0)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN           | T1_NORMAL |     1 |       |    11   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID2"=50000)
       filter("ID2"=50000)

SQL_ID  95573qx3w62q2, child number 0
-------------------------------------
select /*+ index_ss(t1(mod2)) */  * from t1 where  id1 + 1 = 50000

Plan hash value: 1422030023

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |       |       |   271 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |   130 |   271   (2)| 00:00:02 |
|*  2 |   INDEX FULL SCAN           | T1_SPLIT |     1 |       |   270   (2)| 00:00:02 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."SYS_NC00007$"=50000)
       filter("T1"."SYS_NC00007$"=50000)

SQL_ID  d4t48va4zrk1q, child number 0
-------------------------------------
select /*+ index_ss(t1 t1_fbi) */  * from t1 where  id1 + 1 = 50000

Plan hash value: 3151334857

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |       |       |   271 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |     1 |   130 |   271   (2)| 00:00:02 |
|*  2 |   INDEX FULL SCAN           | T1_FBI |     1 |       |   270   (2)| 00:00:02 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."SYS_NC00007$"=50000)
       filter("T1"."SYS_NC00007$"=50000)

The first query automatically does an index skip scan on the normal index.

The second and third queries calculate the cardinality correctly, reference the right “virtual” column correctly, and come close to obeying the hint by using the required index – and then spoil things by doing an index full scan instead of an index skip scan. Checking the 10053 trace file I found that the optimizer hadn’t even considered the possibility of a skip scan for the last two queries – it had jumped straight to the full scan. (If the optimizer had ignored the hint completely I would have said that this behaviour was simply a limitation of FBIs – but since the optimizer has got partway there I think it’s probably a bug.)

Once you’ve got this far, of course, you might wonder if you could work around the problem in 11g by using virtual columns – so that’s the next test.

drop index t1_split;
drop index t1_fbi;

alter table t1 add (mod_virtual generated always as (mod1 + 1) virtual);
alter table t1 add (id_virtual generated always as (id1 + 1) virtual);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create index t1_virtual on t1(mod_virtual, id_virtual);

select
	/*+ index_ss(t1) */
	*
from	t1
where
	id_virtual = 50000

I started by dropping the two function-based indexes (because I’m about to create “proper” virtual columns which are identical to the hidden “virtual columns” that were supporting the function-based indexes – if I didn’t do this I’d hit Oracle error “ORA-54015: Duplicate column expression was specified”). Then I declared my virtual columns, collected stats on the whole table (I could have used method_opt=>’for all hidden columns size 1′) so that there would be stats on those columns, then created an index using the virtual columns and tried to get a skip scan on that index. Any bets ?

select  /*+ index_ss(t1) */  * from t1 where  id_virtual = 50000

Plan hash value: 4250443541

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |       |       |   271 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1         |     1 |   133 |   271   (2)| 00:00:02 |
|*  2 |   INDEX FULL SCAN           | T1_VIRTUAL |     1 |       |   270   (2)| 00:00:02 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID_VIRTUAL"=50000)
       filter("ID_VIRTUAL"=50000)

Bad luck – even though “proper” virtual columns can do wonders, they don’t get around this problem. Anyone who’s used multi-column function-based indexes or added virtual columns to indexes might want to look at their code and execution paths to see if there are opportunities for skip scans that the optimizer is failing to take. If there are you might like to raise an SR with Oracle – and point them to this blog as a repeatable test case.

I tried several searches for a relevant bug on MoS – unfortunately it was one of those occasions where the search result was either very small and didn’t have what I wanted, or was so large that I wasn’t going to check all the hits. If anyone does find a bug number, please tell us about it in the comments.

 

February 21, 2014

Indexing LOBs

Filed under: Function based indexes,Indexing,Infrastructure,LOBs,Oracle — Jonathan Lewis @ 6:42 pm BST Feb 21,2014

Many years ago, possibly when most sites were still using Oracle 8i, a possible solution to a particular customer problem was to create a function-based index on a CLOB column using the dbms_lob.getlength() function call. I can’t find the notes explaining why this was necessary (I usually have some sort of clue – such as the client name – in the script, but in this case all I had was a comment that “the manuals say you can’t do this, but it works provided you wrap the dbms_lob call inside a deterministic function”).

I never worked out why the dbms_lob.getlength() function wasn’t declared as deterministic – especially since it came complete with a most restrictive restricts_references pragma – so I had just assumed there was probably some good reason based on strange side effects when national language charactersets came into play. But here’s a little detail I noticed recently about the dbms_lob.getlength() function: it became deterministic in 11g, so if the client decided to implement my suggestion (which included the usual sorts of warnings) it’s now legal !

Footnote – the length() function has been deterministic and usable with LOBs for a long time, certainly since late 9i, but in 8i length(lob_col) will produce Oracle error “ORA-00932: inconsistent datatypes”

November 14, 2013

32K Columns

Filed under: 12c,Function based indexes,Indexing,Infrastructure,Oracle — Jonathan Lewis @ 8:06 am BST Nov 14,2013

Oracle 12c has increased the maximum length of character-based columns to 32K bytes – don’t get too excited, they’re stored out of lines (so similar in cost to LOBs) and need some modification to the parameter file and data dictionary (starting the database in upgrade mode) before you can use them.

Richard Foote has a pair of articles on indexing such columns:

Be cautious about enabling this option and test carefully – there are going to be a number of side effects, and some of them may require a significant investment in time to resolve. The first one that came to my mind was that if you’ve created a function-based index on a pl/sql function that returns a varchar2() type and haven’t explicitly created the index on a substr() of the return value then the data type of the function’s return value will change from the current default of varchar2(4000) to varchar2(32767) – which means the index will become invalid and can’t be rebuilt or recreated.

Obviously you can redefine the index to include an explicit substr() call – but then you have to find all the code that was supposed to use the index and modify it accordingly.

October 2, 2013

FBI decode

Filed under: Bugs,CBO,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 8:26 am BST Oct 2,2013

It probably won’t surprise many people to hear me say that the decode() function can be a bit of a nuisance; and I’ll bet that quite a lot of people have had trouble occasionally trying to get function-based indexes that use this function to behave properly. So (to put it all together and support the general directives that case is probably a better choice than decode() and that the cast() operator is an important thing to learn) here’s an example of how function-based indexes don’t always allow you to work around bad design/code. (Note: this is a model of a problem I picked up at a client site, stripped to a minimum – you have to pretend that I’m not allowed to fix the problem by changing code).

(more…)

January 6, 2013

Blog advert

Filed under: Bugs,CBO,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 9:10 pm BST Jan 6,2013

Just a quick note to say that I found a blog over the weekend with a number of interesting posts, so I thought I’d pass it on: http://www.bobbydurrettdba.com/

There’s a really cute example (complete with test case) of an optimizer bug (possibly only in 11.1)  in the December archive: http://www.bobbydurrettdba.com/2012/12/04/index-causes-poor-performance-in-query-that-doesnt-use-it/

September 11, 2012

FBI Delete

Filed under: Bugs,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 5:56 pm BST Sep 11,2012

A recent post on Oracle-l complained about an oddity when deleting through a function-based index.

I have a function based index but the CBO is not using it. The DML that I expect to have a plan with index range scan is doing a FTS. Its a simple DML that deletes 1000 rows at a time in a loop and is based on the column on which the FBI is created.

Although execution plans are mentioned, we don’t get to see the statement or the plan – and it’s always possible that there will be some clue in the (full) plan that tells us something about the code that the OP has forgotten to mention. However, function-based indexes have a little history of not doing quite what you expect, so I thought I’d take a quick look at the problem, starting with the simplest possible step – do function-based indexes and “normal” b-tree indexes behave differently on a delete. Here’s the data set I created for my test:
(more…)

April 23, 2012

NVL2()

Filed under: Function based indexes,Indexing,Oracle,Performance — Jonathan Lewis @ 5:43 pm BST Apr 23,2012

There are many little bits and pieces lurking in the Oracle code set that would be very useful if only you had had time to notice them. Here’s one that seems to be virtually unknown, yet does a wonderful job of eliminating calls to decode().

The nvl2() function takes three parameters, returning the third if the first is null and returning the second if the first is not null. This is  convenient for all sorts of example where you might otherwise use an expression involving  case or decode(), but most particularly it’s a nice little option if you want to create a function-based index that indexes only those rows where a column is null.
(more…)

December 30, 2011

FBI Bug

Filed under: Bugs,Execution plans,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 5:47 pm BST Dec 30,2011

Here’s a funny little optimizer bug – though one that seems to have been fixed by at least 10.2.0.3. It showed up earlier on today in a thread on the OTN database forum. We’ll start (in 9.2.0.8) with a little table and two indexes – one normal, the other descending.
(more…)

June 30, 2011

Virtual bug

Filed under: Bugs,Function based indexes,Indexing,Oracle,Parallel Execution,Statistics,Troubleshooting — Jonathan Lewis @ 5:37 pm BST Jun 30,2011

I’ve said in the past that one of the best new features, in my view, in 11g was the appearance of proper virtual columns; and I’ve also been very keen on the new “approximate NDV” that makes it viable to collect stats with the “auto_sample_size”.

Who’d have guessed that if you put them both together, then ran a parallel stats collection it would break :(

The bug number Karen quotes (10013177.8) doesn’t (appear to) mention extended stats – but since virtual columns, function-based indexes, and extended stats share a number of implementation details I’d guess that they might be affected as well.

December 16, 2010

FBI oddities

Filed under: Function based indexes,Indexing — Jonathan Lewis @ 6:15 pm BST Dec 16,2010

Function-based indexes are wonderful things – but they don’t always work exactly as expected. Here’s an example of one such anomaly.

Imagine you have some type of “orders” table where most orders are in a “finished with” state, and you have a requirement to access the small number of orders in the “new” state. Here’s a sample data set to emulate this type of data requirement (created in 11.1.0.6, 1MB uniform extents, freelist management and 8KB blocks).


create table t1 (
	state		varchar2(10),
	n1		number,
	v1		varchar2(10),
	padding	varchar2(100)
);

insert into t1
select
	decode(mod(rownum,100),0,'OPEN','CLOSED'),
	rownum,
	lpad(rownum,10,0),
	rpad('x',100,'x')
from
	all_objects
where
	rownum <= 5000
;

(more…)

October 15, 2010

Good Nulls

Filed under: CBO,Function based indexes,Indexing,NULL,Tuning — Jonathan Lewis @ 6:17 pm BST Oct 15,2010

I’ve often been heard to warn people of the accidents that can happen when they forget about the traps that appear when you start allowing columns to be NULL – but sometimes NULLs are good, especially when it helps Oracle understand where the important (e.g. not null) data might be.

An interesting example of this came up on OTN a few months ago where someone was testing the effects of changing a YES/NO column into a YES/NULL column (which is a nice idea because it allows you to create a very small index on the YESes, and avoid creating a histogram to tell the optimizer that the number of YESes is small).

They were a little puzzled, though, about why their tests showed Oracle using an index to find data in the YES/NO case, but not using the index in the YES/NULL case. I supplied a short explanation on the thread, and was planning to post a description on the blog, but someone on the thread supplied a link to AskTom where Tom Kyte had already answered the question, so I’m just going to leave you with a link to his explanation.

Next Page »

The Rubric Theme Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,454 other followers