Oracle Scratchpad

September 9, 2014

Quiz Night

Filed under: Execution plans,Indexing,Oracle — Jonathan Lewis @ 6:46 pm BST Sep 9,2014

I have a table with several indexes on it, and I have two versions of a query that I might run against that table. Examine them carefully, then come up with some plausible reason why it’s possible (with no intervening DDL, DML, stats collection, parameter fiddling etc., etc., etc.) for the second form of the query to be inherently more efficient than the first.


select
        bit_1, id, small_vc, rowid
from
        bit_tab
where
        bit_1 between 1 and 3
;

prompt  ===========
prompt  Split query
prompt  ===========

select
        bit_1, id, small_vc, rowid
from
        bit_tab
where
        bit_1 = 1
or      bit_1 > 1 and bit_1 <= 3
;

Update / Answers

I avoided giving any details about the data and indexes in this example as I wanted to allow free rein to readers’ imagination  – and I haven’t been disappointed with the resulting suggestions. The general principles of allowing more options to the optimizer, effects of partitioning, and effects of skew are all worth considering when the optimizer CAN’T use an execution path that you think makes sense.  (Note: I didn’t say make it clear in my original question, but I wasn’t looking for cases where you could get a better path by hinting (or profiling) I was after cases where Oracle literally could not do what you wanted.)

The specific strategy I was thinking of when I posed the question was based on a follow-up to some experiments I had done with the cluster_by_rowid() hint. and (there was a little hint in the “several indexes” and more particularly the column name “bit_1″) I was looking at a data warehouse table with a number of bitmap indexes. So here’s the execution plan for the first version of the query  when there’s a simple bitmap index on bit_1.


------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |   600 | 18000 |    96 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BIT_TAB |   600 | 18000 |    96 |
|   2 |   BITMAP CONVERSION TO ROWIDS|         |       |       |       |
|*  3 |    BITMAP INDEX RANGE SCAN   | BT1     |       |       |       |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("BIT_1">=1 AND "BIT_1"<=3)

And here’s the plan for the second query:


------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |   560 | 16800 |    91 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BIT_TAB |   560 | 16800 |    91 |
|   2 |   BITMAP CONVERSION TO ROWIDS|         |       |       |       |
|   3 |    BITMAP OR                 |         |       |       |       |
|*  4 |     BITMAP INDEX SINGLE VALUE| BT1     |       |       |       |
|   5 |     BITMAP MERGE             |         |       |       |       |
|*  6 |      BITMAP INDEX RANGE SCAN | BT1     |       |       |       |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("BIT_1"=1)
   6 - access("BIT_1">1 AND "BIT_1"<=3)

Clearly the second plan is more complex than the first – moreover the added complexity had resulted in the optimizer getting a different cardinality estimate – but, with my data set, there’s a potential efficiency gain. Notice how lines 5 and 6 show a bitmap range scan followed by a bitmap merge: to do the merge Oracle has to “superimpose” the bitmaps for the different key values in the range scan to produce a single bitmap that it can then OR with the bitmap for bit_1 = 1 (“bitmap merge” is effectively the same as “bitmap or” except all the bitmaps come from the same index). The result of this is that when we convert to rowids the rowids are in table order. You can see the consequences in the ordering of the result set or, more importantly for my demo, in the autotrace statistics:


For the original query:
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        604  consistent gets
          0  physical reads
          0  redo size
      27153  bytes sent via SQL*Net to client
        777  bytes received via SQL*Net from client
         25  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        600  rows processed


For the modified query
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        218  consistent gets
          0  physical reads
          0  redo size
      26714  bytes sent via SQL*Net to client
        777  bytes received via SQL*Net from client
         25  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        600  rows processed

Note, particularly, the change in the number of consistent gets. Each table block I visited held two or three rows that I needed, in the first query I visit the data in order of (bit_1, rowid) and get each table block 3 time; in the second case I visit the data in order of rowid and only get each table block once (with a “buffer is pinned count” for subsequent rows from the same block).

Here’s the starting output from each query, I’ve added the rowid to the original select statements so that you can see the block ordering:


Original query
     BIT_1         ID SMALL_VC   ROWID
---------- ---------- ---------- ------------------
         1          2 2          AAAmeCAAFAAAAEBAAB
         1         12 12         AAAmeCAAFAAAAECAAB
         1         22 22         AAAmeCAAFAAAAEDAAB
         1         32 32         AAAmeCAAFAAAAEEAAB
         1         42 42         AAAmeCAAFAAAAEFAAB
         1         52 52         AAAmeCAAFAAAAEGAAB

Modified query
     BIT_1         ID SMALL_VC   ROWID
---------- ---------- ---------- ------------------
         1          2 2          AAAmeCAAFAAAAEBAAB
         2          3 3          AAAmeCAAFAAAAEBAAC
         3          4 4          AAAmeCAAFAAAAEBAAD
         1         12 12         AAAmeCAAFAAAAECAAB
         2         13 13         AAAmeCAAFAAAAECAAC
         3         14 14         AAAmeCAAFAAAAECAAD
         1         22 22         AAAmeCAAFAAAAEDAAB
         2         23 23         AAAmeCAAFAAAAEDAAC
         3         24 24         AAAmeCAAFAAAAEDAAD

By rewriting the query I’ve managed to force a “cluster by rowid” on the data access. Of course, the simpler solution would be to add the /*+ cluster_by_rowid() */ hint to the original query – but it doesn’t work for bitmap indexes, and when I found that it worked for B-tree indexes the next test I did was to try a single bitmap index, which resulted in my writing this note.

Footnote: I don’t really expect Oracle Corp. to modify their code to make the hint work with bitmaps, after all it’s only relevant in the special case of using a bitmap index with a range scan and no subsequent bitmap AND/OR/MINUS operations where it would be needed – and you’re not really expected to use a single bitmap index to access a table, we engineer bitmaps to take advantage of combinations.

August 21, 2014

Quiz night

Filed under: CBO,Indexing,NULL,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 6:05 pm BST Aug 21,2014

Here’s a script to create a table, with index, and collect stats on it. Once I’ve collected stats I’ve checked the execution plan to discover that a hint has been ignored (for a well-known reason):

create table t2
as
select
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from
        all_objects
where
        rownum <= 3000
;

create index t2_i1 on t2(n1);

begin
        dbms_stats.gather_table_stats(
                user,
                't2',
                method_opt => 'for all columns size 1'
        );
end;
/

explain plan for
select  /*+ index(t2) */
        n1
from    t2
where   n2 = 45
;

select * from table(dbms_xplan.display);

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    15 |   120 |    15 |
|*  1 |  TABLE ACCESS FULL| T2   |    15 |   120 |    15 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N2"=45)

Of course we don’t expect the optimizer to use the index because we didn’t declare n1 to be not null, so there may be rows in the table which do not appear in the index. The only option the optimizer has for getting the right answer is to use a full tablescan. So the question is this – how come Oracle will obey the hint in the following SQL statement:


explain plan for
select
        /*+
                leading (t2 t1)
                index(t2) index(t1)
                use_nl(t1)
        */
        t2.n1, t1.n2
from
        t2      t2,
        t2      t1
where
        t2.n2 = 45
and     t2.n1 = t1.n1
;

select * from table(dbms_xplan.display);

-------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |   225 |  3600 |  3248 |
|   1 |  NESTED LOOPS                         |       |   225 |  3600 |  3248 |
|   2 |   NESTED LOOPS                        |       |   225 |  3600 |  3248 |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |   120 |  3008 |
|   4 |     INDEX FULL SCAN                   | T2_I1 |  3000 |       |     8 |
|*  5 |    INDEX RANGE SCAN                   | T2_I1 |    15 |       |     1 |
|   6 |   TABLE ACCESS BY INDEX ROWID         | T2    |    15 |   120 |    16 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

I ran this on 11.2.0.4, but it does the same on earlier versions.

Update:

This was clearly too easy – posted at 18:04, answered correctly at 18:21. At some point in it’s evolution the optimizer acquired a rule that allowed it to infer unwritten “is not null” predicates from the join predicate.

 

 

 

June 19, 2014

Delete Costs

Filed under: Bugs,CBO,Execution plans,Hints,Indexing,Oracle,Performance — Jonathan Lewis @ 6:18 pm BST Jun 19,2014

One of the quirky little anomalies of the optimizer is that it’s not allowed to select rows from a table after doing an index fast full scan (index_ffs) even if it is obviously the most efficient (or, perhaps, least inefficient) strategy. For example:


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	mod(rownum,100)		n1,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;

create index t1_i1 on t1(id, n1);
alter table t1 modify id not null;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

explain plan for
select /*+ index_ffs(t1) */ max(padding) from t1 where n1 = 0;

select * from table(dbms_xplan.display(null,null,'outline -note'));

In this case we can see that there are going to be 1,000 rows where n1 = 0 spread evenly across the whole table so a full tablescan is likely to be the most efficient strategy for the query, but we can tell the optimizer to do an index fast full scan with the hint that I’ve shown, and if the hint is legal (which means there has to be at least one column in it declared as not null) the optimizer should obey it. So here’s the plan my hinted query produced:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |   104 |   207   (4)| 00:00:02 |
|   1 |  SORT AGGREGATE    |      |     1 |   104 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  1000 |   101K|   207   (4)| 00:00:02 |
---------------------------------------------------------------------------

We’d have to examine the 10053 trace file to be certain, but it seems the optimizer won’t consider doing an index fast full scan followed by a trip to the table for a select statement (in passing, Oracle would have obeyed the skip scan – index_ss() – hint). It’s a little surprising then that the optimizer will obey the hint for a delete:


explain plan for
delete /*+ index_ffs(t1) cluster_by_rowid(t1) */ from t1 where n1 = 0;

select * from table(dbms_xplan.display(null,null,'outline -note'));

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |       |  1000 |  8000 |    38  (11)| 00:00:01 |
|   1 |  DELETE               | T1    |       |       |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000 |  8000 |    38  (11)| 00:00:01 |
-------------------------------------------------------------------------------

You might note three things from this plan. First, the optimizer can consider a fast full scan followed by a table visit (so why can’t we do that for a select); secondly that the cost of the delete statement is only 38 whereas the cost of the full tablescan in the earlier query was much larger at 207 – surprisingly Oracle had to be hinted to consider this fast full scan path, despite the fact that the cost was cheaper than the cost of the tablescan path it would have taken if I hadn’t included the hint; finally you might note the cluster_by_rowid() hint in the SQL – there’s no matching “Sort cluster by rowid” operation in the plan, even though this plan came from 11.2.0.4 where the mechanism and hint are available.

The most interesting of the three points is this: there is a bug recorded for the second one (17908541: CBO DOES NOT CONSIDER INDEX_FFS) reported as fixed in 12.2 – I wonder if this means that an index fast full scan followed by table access by rowid will also be considered for select statements in 12.2.

Of course, there is a trap – and something to be tested when the version (or patch) becomes available. Why is the cost of the delete so low (only 38, the cost of the index fast full scan) when the number of rows to be deleted is 1,000 and they’re spread evenly through the table ? It’s because the cost of a delete is actually calculated as the cost of the query: “select the rowids of the rows I want to delete but don’t worry about the cost of going to the rows to delete them (or the cost of updating the indexes that will have to be maintained, but that’s a bit irrelevant to the choice anyway)”.

So when Oracle does do a delete following an index fast full scan in 12.2, will it be doing it because it’s the right thing to do, or because it’s the wrong thing ?

To be continued … (after the next release/patch).

 

June 17, 2014

Cluster Nulls

Filed under: clusters,Indexing,Infrastructure,NULL,Oracle — Jonathan Lewis @ 7:39 am BST Jun 17,2014

Yesterday’s posting was a reminder that bitmap indexes, unlike B-tree indexes in Oracle,  do store entries where every column in the index is null. The same is true for cluster indexes – which are implemented as basic B-tree indexes. Here’s a test case I wrote to demonstrate the point.

drop table tc1;
drop cluster c including tables;

purge recyclebin;

create cluster c(val number);
create index c_idx on cluster c;
create table tc1 (val number, n1 number, padding varchar2(100)) cluster c(val);

insert into tc1
select
	decode(rownum,1,to_number(null),rownum), rownum, rpad('x',100)
from
	all_objects
where
	rownum <= 100
;

insert into tc1 select * from tc1;
insert into tc1 select * from tc1;
insert into tc1 select * from tc1;
insert into tc1 select * from tc1;
insert into tc1 select * from tc1;
commit;

analyze cluster c compute statistics;
execute dbms_stats.gather_table_stats(user,'tc1');

set autotrace traceonly explain

select * from tc1 where val = 2;
select * from tc1 where val is null;

set autotrace off

Here are the two execution plans from the above queries:


------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |    32 |  3424 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS CLUSTER| TC1   |    32 |  3424 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN  | C_IDX |     1 |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("VAL"=2)

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    32 |  3424 |    14   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TC1  |    32 |  3424 |    14   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("VAL" IS NULL)

Spot the problem: the second query doesn’t use the index. So, despite the fact that I said that fully null index entries are stored in cluster indexes you might (as the first obvious question) wonder whether or not I was right. So here’s a piece of the symbolic dump of the index.


kdxconro 100                -- Ed: 100 entries (rows) in the leaf block

row#0[8012] flag: -----, lock: 0, data:(8):  02 00 02 0b 00 00 01 00
col 0; len 2; (2):  c1 03

row#1[7999] flag: -----, lock: 0, data:(8):  02 00 02 0c 00 00 01 00
col 0; len 2; (2):  c1 04

...

row#98[6738] flag: -----, lock: 2, data:(8):  02 00 02 6d 00 00 01 00
col 0; len 2; (2):  c2 02

row#99[8025] flag: -----, lock: 0, data:(8):  02 00 02 0a 00 00 01 00
col 0; NULL

The NULL entry is right there – sorted as the high value – just as it is in a bitmap index. But the optimizer won’t use it, even if you hint it.

Of course, Oracle uses it internally when inserting rows (how else, otherwise, would it rapidly find which the heap block needed for the next NULL insertion) – but it won’ use it to retrieve the data, it always uses a full table (cluster) scan. That’s really a little annoying if you’ve made the mistake of allowing nulls into your cluster.

There is a workaround, of course – in this case (depending on version) you could create a virtual column with index or a function-based index that (for example) uses the nvl2() function to convert nulls to a know value and all non-null entries to null; this would give you an index on just the original nulls which would be as small as possible and also have a very good clustering_factor. You’d have to change the code from “column is not null” to a predicate that matched the index, though. For example:


create index tc1_null on tc1(nvl2(val,null,0));

execute dbms_stats.gather_table_stats(user,'tc1', method_opt=>'for all hidden columns size 1');

select * from tc1 where nvl2(val,null,0) = 0;

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |    32 |  3456 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TC1      |    32 |  3456 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | TC1_NULL |    32 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(NVL2("VAL",NULL,0)=0)

I’ve included the compress keyword (which implicitly means “all columns” for a non-unique index) in the definition since you only need 4 repetitions of the single-byte entry that is the internal representation of zero before you start to win on the storage trade-off – but since it’s likely to be a small index anyway that’s not particularly important. (Reminder: although Oracle collects index stats on index creation, you still need to collect column stats on the hidden column underlying the function-based columns on the index to give the optimizer full information.)

June 16, 2014

Bitmap Nulls

Filed under: bitmaps,Indexing,Infrastructure,NULL,Oracle — Jonathan Lewis @ 9:08 am BST Jun 16,2014

It’s fairly well known that in Oracle B-tree indexes on heap tables don’t hold entries where all the indexed columns are all null, but that bitmap indexes will hold such entries and execution plans can for predicates like “column is null” can use bitmap indexes. Here’s a little test case to demonstrate the point (I ran this last on 12.1.0.1):


create table t1 (val number, n1 number, padding varchar2(100));

insert into t1
select
	decode(rownum,1,to_number(null),rownum), rownum, rpad('x',100)
from
	all_objects
where
	rownum <= 1000
;

insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;

commit;

create bitmap index t1_b1 on t1(val);

execute dbms_stats.gather_table_stats(user,'t1');

set autotrace traceonly explain

select * from t1 where val is null;

set autotrace off

The execution plan for this query, in the system I happened to be using, looked like this:


Execution Plan
----------------------------------------------------------
Plan hash value: 1201576309

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |    32 |  3488 |     8   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    32 |  3488 |     8   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |       |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE        | T1_B1 |       |       |            |          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("VAL" IS NULL)

Note that the predicate section shows us that we used the “column is null” predicate as an access predicate into the index.

Of course, this is a silly little example – I’ve only published it to make a point and to act as a reference case if you ever need to support a claim. Normally we don’t expect a single bitmap index to be a useful way to access a table, we tend to use combinations of bitmap indexes to combine a number of predicates so that we can minimise the trips to a table as efficiently as possible. (And we certainly DON’T create a bitmap index on an OLTP system because it lets us access NULLs by index — OLTP and bitmaps don’t get on well together.)

If you do a symbolic block dump, by the way, you’ll find that the NULL is represented by the special “length byte” of 0xFF with no following data.

May 7, 2014

Quiz Night

Filed under: Indexing,Oracle — Jonathan Lewis @ 2:18 pm BST May 7,2014

Okay – so it’s not night time in my home time-zone, but I’m in Singapore at the moment so it’s night time.

A very simple little quiz – so I’ve disabled comments for the moment and will re-enable them tomorrow morning to allow more people to have a chance to see the question without the solution.

Explain the anomaly displayed in the following “cut-n-paste” from a session running SQL*Plus on 11.2.0.4:

SQL> create unique index t1_i1 on t1(v1 desc);
create unique index t1_i1 on t1(v1 desc)
                                *
ERROR at line 1:
ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found

SQL> create unique index t1_i1 on t1(v1);

Index created.

Answer

Well it didn’t take long for an answer and several bits of related infomration to show up – as Martin pointed out, all I have to do is insert NULL into the table twice.

To create an entry in a descending index, Oracle takes the 1’s-complement of each column and appends an 0xFF byte to each column – except in the case of a null column where the null is replaced with a 0x00. (And, as Sayan points out, funny things happen if you have a varchar2() column which has already reached the 4,000 byte limit)

The point of the 1’s-complement is that if you walk through the stored values in ascending order you’re walking through the original values in descending – provided you have the 0xFF on the end of each non-null entry.

 

April 25, 2014

Modify PK – 2

Filed under: Indexing,Oracle — Jonathan Lewis @ 5:48 pm BST Apr 25,2014

In an earlier posting I described how we can play games with primary key indexes in 12c because you can create multiple indexes on a table for the same (ordered) column list provided they have some differences in attributes and only one of them is visible at a time. But how, if you’re not on 12c, can you change a primary key index from unique to non-unique (or vice versa, as this question on OTN wants) without any down-time ?

Of course you can’t “change” the uniqueness of an index – that attribute embedded in the way that Oracle stores rowids for the index – but you can create a new index for the constraint; and you can’t avoid the little bit of “not quite down”-time that it takes to start and finish an online rebuild. But how do you get around the limitation of Oracle error: ORA-01408: such column list already indexed

Easy – though it requires you to do some work you would prefer to avoid – just work through an intermediate step using an index with an extra column. Here’s an example where I start with a non-unique index supporting the PK and end up with a unique index.


create table t1 as select * from all_objects where rownum <= 10000;

create index t1_pk on t1(object_id);

alter table t1 add constraint t1_pk primary key(object_id);

create index t1_i1 on t1(object_id, 0) online;

alter table t1 modify primary key using index t1_i1;

drop index t1_pk;

create unique index t1_pk on t1(object_id) online;

alter table t1 modify primary key using index t1_pk;

drop index t1_i1;

April 18, 2014

Bitmap loading

Filed under: bitmaps,Indexing,Oracle — Jonathan Lewis @ 12:43 pm BST Apr 18,2014

Everyone “knows” that bitmap indexes are a disaster (compared to B-tree indexes) when it comes to DML. But at an event I spoke at recently someone made the point that they had observed that their data loading operations were faster when the table being loaded had bitmap indexes on it than when it had the equivalent B-tree indexes in place.

There’s a good reason why this can be the case.  No prizes for working out what it is – and I’ll supply an answer in a couple of days time.  (Hint – it may also be the reason why Oracle doesn’t use bitmap indexes to avoid the “foreign key locking” problem).

Answer

As Martin (comment 3) points out, there’s a lot of interesting information in the statistics once you start doing the experiment. So here’s some demonstration code, first we create a table with one of two possible indexes:


create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	mod(rownum,1000)	btree_col,
	mod(rownum,1000)	bitmap_col,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create        index t1_btree on t1(btree_col) nologging;
-- create bitmap index t1_bitmap on t1(bitmap_col) nologging;

You’ll note that the two columns I’m going to build indexes on hold the same data in the same order – and it’s an order with maximum scatter because of the mod() function I’ve used to create it. It’s also very repetitive data, having 1000 distinct values over 1,000,0000 rows. With the data and (one of) the indexes in place I’m going to insert another 10,000 rows:

execute snap_my_stats.start_snap

insert /* append */ into t1
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	1e6 + rownum		id,
	mod(rownum,1000)	btree_col,
	mod(rownum,1000)	bitmap_col,
	rpad('x',100)		padding
from
	generator
;

execute snap_my_stats.end_snap

You’ll note that I’ve got an incomplete append hint in the code – I’ve tested the mechanism about eight different ways, and left the append in as a convenience, but the results I want to talk about (first) are with the hint disabled so that the insert is a standard insert. The snap_my_stats calls are my standard mechanism to capture deltas of my session statistics (v$mystat) – one day I’ll probably get around to using Tanel’s snapper routine everywhere – and here are some of the key results produced in the two tests:


11.2.0.4 with btree
===================
Name                                                                     Value
----                                                                     -----
session logical reads                                                   31,403
DB time                                                                     64
db block gets                                                           31,195
consistent gets                                                            208
db block changes                                                        21,511
redo entries                                                            10,873
redo size                                                            3,591,820
undo change vector size                                                897,608
sorts (memory)                                                               2
sorts (rows)                                                                 1

11.2.0.4 with bitmap
====================
Name                                                                     Value
----                                                                     -----
session logical reads                                                   13,204
DB time                                                                     42
db block gets                                                            8,001
consistent gets                                                          5,203
db block changes                                                         5,911
redo entries                                                             2,880
redo size                                                            4,955,896
undo change vector size                                              3,269,932
sorts (memory)                                                               3
sorts (rows)                                                            10,001

As Martin has pointed out, there are a number of statistics that show large differences between the B-tree and bitmap approaches, but the one he didn’t mention was the key: sorts (rows). What is this telling us, and why could it matter so much ? If the B-tree index exists when the insert takes place Oracle locates the correct place for the new index entry as each row is inserted which is why you end up with so many redo entries, block gets and block changes; if the bitmap index exists, Oracle postpones index maintenance until the table insert is complete, but accumulates the keys and rowids as it goes then sorts them to optimize the rowid to bitmap conversion and walks the index in order updating each modified key just once.

The performance consequences of the two different strategies depends on the number of indexes affected, the number of rows modified, the typical number of rows per key value, and the ordering of the new data as it arrives; but it’s possible that the most significant impact could come from ordering.  As each row arrives, the relevant B-tree indexes are modified – but if you’re unlucky, or have too many indexes on the table, then each index maintenance operation could result in a random disk I/O to read the necessary block (how many times have you seen complaints like: “we’re only inserting 2M rows but it’s taking 45 minutes and we’re always waiting on db file sequential reads”). If Oracle sorts the index entries before doing the updates it minimises the random I/O because it need only update each index leaf block once and doesn’t run the risk of re-reading many leaf blocks many times for a big insert.

Further Observations

The delayed maintenance for bitmap indexes (probably) explains why they aren’t used to avoid the foreign key locking problem.  On a large insert, the table data will be arriving, the b-tree indexes will be maintained in real time, but a new child row of some parent won’t appear in the bitmap index until the entire insert is complete – so another session could delete the parent of a row that exists, is not yet committed, but is not yet visible. Try working out a generic strategy to deal with that type of problem.

It’s worth noting, of course, that when you add the /*+ append */ hint to the insert then Oracle uses exactly the same optimization strategy for B-trees as it does for bitmaps – i.e. postpone the index maintenance, remember all the keys and rowids, then sort and bulk insert them.  And when you’ve remembered that, you may also remember that the hint is (has to be) ignored if there are any enabled foreign key constraints on the table. The argument for why the hint has to be ignored and why bitmap indexes don’t avoid the locking problem is (probably) the same argument.

You may also recall, by the way, that when you have B-tree indexes on a table you can choose the optimal update or delete strategy by selecting a tablescan or index range scan as the execution path.  If you update or delete through an index range scan the same “delayed maintenance” trick is used to optimize the index updates … except for any indexes being used to support foreign key constraints, and they are maintained row by row.

In passing, while checking the results for this note I re-ran some tests that I had originally done in 2006 and added one more test that I hadn’t considered at the time; as a result I can also point out that index will see delayed maintenance if you drive the update or delete with an index() hint, but not if you drive it with an index_desc() hint.

 

April 2, 2014

Easy – Oops.

Filed under: Bugs,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 7:47 pm BST Apr 2,2014

A question came up on OTN today asking for suggestions on how to enforce uniqueness on a pair of columns only when the second column was not null. There’s an easy and obvious solution – but I decided to clone the OP’s example and check that I’d typed my definition up before posting it; and the result came as a bit of a surprise. Here’s a demo script (not using the OP’s table):


create table t1  
(  
	col1	int not null,
	col2	varchar2(1)
);  

create unique index t1_i1 on t1( 
--	case col2 when null then cast(null as int) else col1 end,
--	case when col2 is null then cast(null as int) else col1 end,
	case when col2 is not null then col1 end,
	col2
)
;

insert into t1 values(1,null);
insert into t1 values(1,null);
insert into t1 values(1,'x');
insert into t1 values(1,'y');
insert into t1 values(1,'y');

commit;

column ind1_is   format a5
column ind1_when format 9999

set null N/A

select
	case when col2 is null then cast (null as int) else col1 end	ind1_is,
	case col2 when null then cast (null as int)  else col1 end	ind1_when
from 
	t1
;

The strategy is simple, you create a unique function-based index with two columns; the first column of the index id defined to show the first column of the table if the second column of the table is not null, the second column of the index is simply the second column of the table. So if the second column of the table is null, both columns in the index are null and there is no entry in the index; but if the second column of the table is not null then the index copies both columns from the table and a uniqueness test applies.

Based on the requirement and definition you would expect the first 4 of my insert statements to succeed and the last one to fail. The index will then have two entries, corresponding to my 3rd and 4th insertions.

I’ve actually shown three ways to use the case statement to produce the first column of the index. The last version is the cleanest, but the first option is the one I first thought of – it’s virtually a literal translation the original requirement. The trouble is, with my first definition the index acquired an entry it should not have got, and the second insert raised a “duplicate key” error; the error didn’t appear when I switched the syntax of the case statement to the second version.

That’s why the closing query of the demo is there – when you run it the two values reported should be the same as each other for all four rows in the table – but they’re not. This is what I got on 11.2.0.4:


IND1_IS IND1_WHEN
------- ---------
N/A             1
N/A             1
      1         1
      1         1


I’m glad I did a quick test before suggesting my original answer.

Anyone who has other versions of Oracle available is welcome to repeat the test and report back which versions they finding working correctly (or not).

Update

It’s not a bug (see note 2 below from Jason Bucata), it’s expected behaviour.

 

Tweaking

Filed under: Function based indexes,Indexing,Oracle — Jonathan Lewis @ 6:24 pm BST Apr 2,2014

The following question came up on OTN recently:

Which one gives better performance? Could please explain.

1) nvl( my_column, ‘N’) <> ‘Y’

2) nvl( my_column, ‘N’) = ‘N’

It’s a question that can lead to a good 20 minute discussion – if you were in some sort of development environment and had a fairly free hand to do whatever you wanted.

The most direct answer is that you could expect the performance to be the same whichever option you chose – but the results might be different, of course, unless you had a constraint on my_column that ensured that it would hold only null, ‘Y’, or ‘N’.  (Reminder:  the constraint check (my_column in (‘Y’,’N’) will allow nulls in the column).

On the other hand, if you create a function-based index on nvl(my_column,’N’) then the second option would give the optimizer the opportunity to use the index – which could make it the more efficient option if a sufficiently small quantity of the data from a large table matched the predicate. Of course in this case you would need a histogram on the hidden column definition supporting the index so that the optimizer could detect the data skew.

But if a function-based index is a step in the right direction it’s worth thinking about how to pick the best function-based index.


create index t1_i1 on t1(case my_column when 'Y' then null else 'N' end);

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1');

select * from t1 where case my_column when 'Y' then null else 'N' end = 'N';

Provided you can change the queries that use the original predicate, you can create a function-based index that is the smallest possible index to assist your query, and you won’t need a histogram to allow the optimizer to get the correct cardinality since there will be just one distinct value in the index (and the underlying column).

It’s possible to go a little further – what if the typical query that started this discussion was really something like:


select  *
from    t1
where   nvl(my_column, 'N') = 'N'
and     other_column = {some value}

If the nvl() predicate always returned a small fraction of the data we might engineer an index to handle the requirement of the entire where clause:


create index t1_i2 on t1(case my_column when 'Y' then null else other_column end);
execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all hidden columns size 254');

In this case you might need a histogram on the hidden column definition supporting the index, which is why I’ve changed my method_opt. We’ve constructed an index that is the smallest possible index that will satisfy the query as precisely as possible while giving the optimizer the best chance of producing accurate cardinality estimates. [Update: until the appearance of Kirill's comment (#1) I forgot to point out that once again you have to change the original SQL from using the nvl() expression to using a case expression that matches the index.]

Footnote

It would be easy to fall into the trap of playing around with this type of tweaking every time you hit a performance problem. Bear in mind, though, that even indexes which are as small and accurately targeted as possible can introduce various types of overhead (contention, cost of gathering stats, instability of histogram collection); so always try to think through the potential costs as well as the benefits of such an approach – is it really worth the effort.

It’s also worth noting that things don’t always work the way you expect.

Finally, of course, you might choose to create “proper” virtual columns in 11g so that you can refer to them by name, rather than handling them indirectly as you have to when they are “undocumented” hidden columns generated by function-based indexes.

 

March 24, 2014

Min/Max

Filed under: Function based indexes,Indexing,Oracle — Jonathan Lewis @ 11:11 pm BST Mar 24,2014

One of my most-repeated observations about trouble-shooting Oracle is that things break when you start combining features. Here’s an example that demonstrates the point.

It’s possible to create “descending” indexes – or indexes with descending columns, as I prefer to call them, and there’s a special “min/max range scan” optimizer operation for a particular kind of index usage – demonstrated in the following code fragment (running under 11.2.0.4, and reporting the rowsource execution statistics):


create table t1(
	a number not null,
	b number not null,
	c number not null,
	padding varchar2(100)
);

insert into t1
select
	mod(object_id +   1245,1001),
	mod(object_id +   4545,1111),
	mod(object_id + 774545,  13),
	rpad('x',100,'x')
from
	all_objects
where
	rownum<=10000
;

commit;

create index t1_i1 on t1(b, c, a);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

alter session set statistics_level = all;

select
	max(a)
from	t1
where	b=1
and	c=1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE              |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|   2 |   FIRST ROW                  |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| T1_I1 |      1 |      1 |      1 |00:00:00.01 |       2 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("B"=1 AND "C"=1)

Note how the optimizer is aware that it can find a path aiming for one specific index entry (FIRST ROW), using the (min/max) option on the index.

So what happens when we change the index:


drop index t1_i1;
create index t1_i1 on t1(b, c, a desc);

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

select
	max(a)
from	t1
where	b=1
and	c=1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

-------------------------------------------------------------------------------------
| Id  | Operation         | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |      1 |        |      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE   |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN| T1_I1 |      1 |      1 |      1 |00:00:00.01 |       2 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("B"=1 AND "C"=1)

We’ve changed the index so that the final column is descending and although the optimizer is smart enough to determine that the query can still be satisfied without visiting the table, it can no longer use the min/max optimization, instead it does a range scan the section of the index matching the where clause, using the normal aggregate operation to find max(a).

In this tiny example the difference in the work load is barely perceptible – but there will be cases where the change in plan will make a difference in performance. As ever, when taking advantage of a feature that looks useful you have to try to imagine all the possible cases for the feature that might appear in your application and test them to see whether they introduce an unexpected (and possibly unacceptable) overhead.

Footnote:

There is a workaround in this case – not that I would suggest using it in a production system. If you remember that descending columns are implemented through a function-based index using the sys_op_descend() function, you can write code like this:

select
	utl_raw.cast_to_number(hextoraw(sys_op_undescend(MIN(sys_op_descend(a)))))	a
from
	t1
where
	b = 1
and	c = 1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE              |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|   2 |   FIRST ROW                  |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| T1_I1 |      1 |      1 |      1 |00:00:00.01 |       2 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("B"=1 AND "C"=1)

These results came from an instance of 11.2.0.4, but the limitation is still present in 12.1.0.1

March 13, 2014

Shrink Space

Filed under: Bugs,deadlocks,fragmentation,Index Rebuilds,Indexing,Locks,Oracle — Jonathan Lewis @ 7:08 am BST Mar 13,2014

Here’s a lovely effect looking at v$lock (on 11.2.0.4)

select
        sid, type, id1, id2, lmode, request, ctime, block
from
        V$lock
where   sid in (
                select  sid
                from    V$session
                where   username = 'TEST_USER'
        )
order by
        sid, type desc
;

       SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------- -- ---------- ---------- ---------- ---------- ---------- ----------
       145 TX     589833       7518          6          0        489          0
           TM      92275          0          2          6        489          1
           SK          7   25165955          6          0        489          0
           AE        100          0          4          0        582          0

       148 TX     524308       7383          6          0        490          0
           TM      92275          0          2          6        490          1
           SK          7   25173379          6          0        490          0
           AE        100          0          4          0        998          0

You’ll notice I’ve got two sessions holding a TM lock on the same table (object 92275) in mode 2 (sub-share) and waiting for an exclusive lock on the same table. Both sessions are recording the fact that they are blocking something else. You’ll have trust me when I say there are no other user sessions on the system at this point, and none of the background sessions is doing anything with that table.

The clue to what’s happening is the SK lock – it’s the “segment shrink” lock. I had two sessions start an “alter index I_n shrink space” (two different indexes on the same table) at the same time. The problem is that “shrink space” without the “compact” tries to drop the highwater mark on the index’ allocated space after completing the shrink phase – and there’s a defect in the internal code that tries to get the required exclusive lock on the underlying table: it doesn’t seem to allow for all the possible ways you can fail to get the lock. If you look at v$session_wait_history for either of these sessions, you’ll see something like the following:


  SID  SEQ#     EVENT# EVENT                            P1         P2         P3  WAIT_TIME WAIT_TIME_MICRO TIME_SINCE_LAST_WAIT_MICRO
----- ----- ---------- ------------------------ ---------- ---------- ---------- ---------- --------------- --------------------------
  145     1        798 Wait for shrink lock              0          0          0         10           99954                      70137
          2        235 enq: TM - contention     1414332422      92275          0        600         6002022                        101
          3        798 Wait for shrink lock              0          0          0         10          100723                      69335
          4        235 enq: TM - contention     1414332422      92275          0        600         6001589                         81
          5        798 Wait for shrink lock              0          0          0         10          100500                      69542
          6        235 enq: TM - contention     1414332422      92275          0        600         6002352                         86
          7        798 Wait for shrink lock              0          0          0         10          100618                      69145
          8        235 enq: TM - contention     1414332422      92275          0        600         6001545                        144
          9        798 Wait for shrink lock              0          0          0         10          100996                      69445
         10        235 enq: TM - contention     1414332422      92275          0        600         6002744                        310

The attempt to acquire the TM enqueue (lock) times out every three seconds – and I think the session then releases and re-acquires the SK lock before trying to re-acquire the TM lock – and it’s never going to get it.

The alert log, by the way, looked like this:


Wed Mar 12 12:53:27 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
Wed Mar 12 12:53:30 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
Wed Mar 12 12:53:40 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
Wed Mar 12 12:53:43 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.

I’ve said it before, and I keep repeating it when people say “Oracle resolves deadlocks automatically”: Oracle does NOT resolve deadlocks automatically – one of the sessions will rollback its last DML statement to clear the deadlock, but the other session will (almost invariably) still be waiting. It’s up to the application to do something sensible to resolve the deadlock after it receives the ORA-00060 error.

Don’t ask about the trace files – but they had both reached 400MB by the time I finished this note.

Strategy Note

If you are going to shrink objects, it’s probably best to do it in two steps: “shrink space compact, followed by “shrink space”.
If you’re going to try to use multiple sessions to shrink several indexes as quickly as possible, make sure there’s no way that two sessions can try to “shrink space” on the indexes on the same table at the same time.

Footnote:

This article was prompted by the recent appearance of bug 18388128 on MoS.

March 2, 2014

Auto Sample Size

Filed under: Function based indexes,Indexing,Infrastructure,IOT,LOBs,Oracle,Statistics — Jonathan Lewis @ 6:38 pm BST Mar 2,2014

In the past I have enthused mightily about the benefits of the approximate NDV mechanism and the benefit of using auto_sample_size to collect statistics in 11g; however, as so often happens with Oracle features, there’s a down-side or boundary condition, or edge case. I’ve already picked this up once as an addendum to an earlier blog note on virtual stats, which linked to an article on OTN describing how the time taken to collect stats on a table increased dramatically after the addition of an index – where the index had this definition:


create bitmap index i_s_rmp_eval_csc_msg_actions on
    s_rmp_evaluation_csc_message (
        decode(instr(xml_message_text,' '),0,0,1)
    )
;

As you might guess from the column name, this is an index based on an XML column, which is stored as a CLOB.

In a similar vein, I showed you a few days ago an old example I had of indexing a CLOB column with a call to dbms_lob.getlength(). Both index examples suffer from the same problem – to support the index Oracle creates a hidden (virtual) column on the table that can be used to hold statistics about the values of the function; actual calculated values for the function call are stored in the index but not on the table itself – but it’s important that the optimizer has the statistics about the non-existent column values.

So what happens when Oracle collects table statistics – if you’ve enable the approximate NDV feature Oracle does a 100% sample, which means it has to call the function for every single row in the table. You will appreciate that the decode(instr()) function on the LOB column is going to read every single LOB in turn from the table – it’s not surprising that the time taken to calculate stats on the table jumped from a few minutes to a couple of hours. What did surprise me was that my call to dbms_lob.getlength() also seemed to read every lob in my example rather than reading the “LOB Locator” data that’s stored in the row – one day I’ll take a look into why that happened.

Looking at these examples it’s probably safe to conclude that if you really need to index some very small piece of “flag” information from a LOB it’s probably best to store it as a real column on the table – perhaps populated through a trigger so you don’t have to trust every single piece of front-end code to keep it up to date. (It would be quite nice if Oracle gave us the option for a “derived” column – i.e. one that could be defined in the same sort of way as a virtual column, with the difference that it should be stored in the table.)

So virtual columns based on LOBs can create a performance problem for the approximate NDV mechanism;  but the story doesn’t stop there because there’s another “less commonly used” feature of Oracle that introduces a different threat – with no workaround – it’s the index organized table (IOT). Here’s a basic example:

create table iot1 (
        id1	number(7,0),
	id2	number(7,0),
	v1	varchar2(10),
	v2	varchar2(10),
	padding	varchar2(500),
        constraint iot1_pk primary key(id1, id2)
)
organization index
including id2
overflow
;

insert into iot1
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
        mod(rownum,20)                  id1,
        trunc(rownum,100)               id2,
        to_char(mod(rownum,20))         v1,
        to_char(trunc(rownum,100))      v2,
        rpad('x',500,'x')               padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;

commit;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever';

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'IOT1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

alter session set events '10046 trace name context off';

You’ll notice I’ve created the table then inserted the data – if I did a “create table as select” Oracle would have sorted the data before inserting it, and that would have helped to hide the problem I’m trying to demonstrate. As it is my overflow segment is very badly ordered relative to the “top” (i.e. index) segment – in fact I can see after I’ve collected stats on the table that the clustering_factor on the index is 100,000 – an exact match for the rows in the table.

Running 11.2.0.4, with a 1MB uniform extent, freelist management, and 8KB block size the index segment held 279 leaf blocks, the overflow segment (reported in view user_tables as SYS_IOT_OVER_81594) held 7,144 data blocks.

So what interesting things do we find in a 10046 trace file after gathering stats – here are the key details from the tkprof results:

SQL ID: 7ak95sy9m1s4f Plan Hash: 1508788224

select /*+  full(t)    no_parallel(t) no_parallel_index(t) dbms_stats
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  no_substrb_pad  */to_char(count("ID1")),to_char(substrb(dump(min("ID1"),16,
  0,32),1,120)),to_char(substrb(dump(max("ID1"),16,0,32),1,120)),
  to_char(count("ID2")),to_char(substrb(dump(min("ID2"),16,0,32),1,120)),
  to_char(substrb(dump(max("ID2"),16,0,32),1,120)),to_char(count("V1")),
  to_char(substrb(dump(min("V1"),16,0,32),1,120)),
  to_char(substrb(dump(max("V1"),16,0,32),1,120)),to_char(count("V2")),
  to_char(substrb(dump(min("V2"),16,0,32),1,120)),
  to_char(substrb(dump(max("V2"),16,0,32),1,120)),to_char(count("PADDING")),
  to_char(substrb(dump(min("PADDING"),16,0,32),1,120)),
  to_char(substrb(dump(max("PADDING"),16,0,32),1,120))
from
 "TEST_USER"."IOT1" t  /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,
  NIL,NIL*/

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.37       0.37       7423     107705          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.37       0.37       7423     107705          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=107705 pr=7423 pw=0 time=377008 us)
    100000     100000     100000   APPROXIMATE NDV AGGREGATE (cr=107705 pr=7423 pw=0 time=426437 us cost=10 size=23944 card=82)
    100000     100000     100000    INDEX FAST FULL SCAN IOT1_PK (cr=107705 pr=7423 pw=0 time=298380 us cost=10 size=23944 card=82)(object id 85913)

********************************************************************************

SQL ID: 1ca2ug8s3mm5z Plan Hash: 2571749554

select /*+  no_parallel_index(t, "IOT1_PK")  dbms_stats cursor_sharing_exact
  use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad
  no_expand index(t,"IOT1_PK") */ count(*) as nrw,count(distinct
  sys_op_lbid(85913,'L',t.rowid)) as nlb,null as ndk,
  sys_op_countchg(sys_op_lbid(85913,'O',"V1"),1) as clf
from
 "TEST_USER"."IOT1" t where "ID1" is not null or "ID2" is not null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.16       0.16          0     100280          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.16       0.16          0     100280          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT GROUP BY (cr=100280 pr=0 pw=0 time=162739 us)
    100000     100000     100000   INDEX FULL SCAN IOT1_PK (cr=100280 pr=0 pw=0 time=164597 us cost=6 size=5900000 card=100000)(object id 85913)

The first query collects table and column stats, and we can see that the approximate NDV method has been used because of the trailing text: /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/. In this statement the hint /*+ full(t) */ has been interpreted to mean an index fast full scan, which is what we see in the execution plan. Although there are only 279 blocks in the index and 7,144 blocks in the overflow we’ve done a little over 100,000 buffer visits because for every index entry in the IOT top we’ve done a “fetch by rowid” into the overflow segment (the session stats records these as “table fetch continued row”). Luckily I had a small table so all those visits were buffer gets; on a very large table it’s quite possible that a significant fraction of those buffer gets will turn into single block physical reads.

Not only have we done one buffer visit per row to allow us to calculate the approximate NDV for the table columns, we’ve done the same all over again so that we can calculate the clustering_factor of the index. This is a little surprising since the “rowid” for an item in the overflow section is stored in the index segment but (as you can see in the second query in the tkprof output) Oracle has used column v1 (the first in the overflow segment) in the call to the sys_op_countchg() function where the equivalent call for an ordinary index would use t.rowid so, presumably, the code HAS to access the overflow segment. The really strange thing about this is that the same SQL statement has a call to sys_op_lbid() which uses the (not supposed to exist in IOTs) rowid – so it looks as if it ought to be possible for sys_op_countchg() to do the same.

So – big warning on upgrading to 11g: if you’ve got IOTs with overflows and you switch to auto_sample_size and enable approximate NDV then the time taken to gather stats on those IOTs may (depending to a large extent on the data clustering) take much longer than it used to.

February 25, 2014

FBI Skip Scan

Filed under: Bugs,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 6:45 pm BST Feb 25,2014

A recent posting on the OTN database forum highlighted a bug (or defect, or limitation) in the way that the optimizer handles index skip scans with “function-based” indexes – it doesn’t do them. The defect has probably been around for a long time and demonstrates a common problem with testing Oracle – it’s very easy for errors in the slightly unusual cases to be missed; it also demonstrates a general principle that it can take some time for a (small) new feature to be applied consistently across the board.

The index definitions in the original posting included expressions like substr(nls_lower(colX), 1, 25), and it’s possible for all sorts of unexpected effects to appear when your code starts running into NLS  settings, so I’ve created a much simpler example. Here’s my table definition, with three index definitions:

(more…)

February 21, 2014

Indexing LOBs

Filed under: Function based indexes,Indexing,Infrastructure,LOBs,Oracle — Jonathan Lewis @ 6:42 pm BST Feb 21,2014

Many years ago, possibly when most sites were still using Oracle 8i, a possible solution to a particular customer problem was to create a function-based index on a CLOB column using the dbms_lob.getlength() function call. I can’t find the notes explaining why this was necessary (I usually have some sort of clue – such as the client name – in the script, but in this case all I had was a comment that “the manuals say you can’t do this, but it works provided you wrap the dbms_lob call inside a deterministic function”).

I never worked out why the dbms_lob.getlength() function wasn’t declared as deterministic – especially since it came complete with a most restrictive restricts_references pragma – so I had just assumed there was probably some good reason based on strange side effects when national language charactersets came into play. But here’s a little detail I noticed recently about the dbms_lob.getlength() function: it became deterministic in 11g, so if the client decided to implement my suggestion (which included the usual sorts of warnings) it’s now legal !

Footnote – the length() function has been deterministic and usable with LOBs for a long time, certainly since late 9i, but in 8i length(lob_col) will produce Oracle error “ORA-00932: inconsistent datatypes”

Next Page »

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,267 other followers