Well, it surprised me!
I’ve said for a very long time that in principle, ignoring bugs and restrictions, the optimizer will always choose the lowest cost option during its search for an execution path. It turns out that this isn’t true. In a comment attached to a note I had written about a possible bug relating to function-based indexes I was told that there are cases where the optimizer follows a rule that allows it to ignore the lowest cost path if it is derived from a range-based predicate involving unpeekable bind variables.
The trouble is, any statement made about “bind variables” may also apply in any circumstances where the optimizer sees: “unknown value”. Here’s a simplified example that I find a little worrying (running on 11.1):
create table t1
as
with generator as (
select --+ materialize
rownum id
from dual
connect by
rownum <= 10000
)
select
rownum id,
mod(rownum-1,20) scattered,
trunc(rownum / 500) clustered,
lpad(mod(rownum,2),10) ind_pad,
lpad(rownum,10,'0') small_vc,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 40000
;
create index t1_equi on t1(scattered, ind_pad, clustered);
create index t1_range on t1(clustered, scattered);
-- call dbms_stats to compute table stats with no histograms
set autotrace traceonly explain
select
small_vc
from
t1
where
scattered = 10
and clustered between (select 40 from dual)
and (select 41 from dual)
;
select
/*+ index(t1 t1_range) */
small_vc
from
t1
where
scattered = 10
and clustered between (select 40 from dual)
and (select 41 from dual)
;
The SQL with its “select constant from dual” may look a little artificial – but it represents a strategy that is used quite commonly; and it’s just one of several code patterns that can kick the optimizer into using the arithmetic for “unknown value at compile time” (another would the case where you use sys_context(), possibly in fine-grained access control – a.k.a. row-level security or virtual private database).
I’ve got two possible indexes on the table that could be used to assist this query – and one of them is significantly larger than the other because it has an extra column (ind_pad) in the middle which does not appear in the where clause. which index is the optimizer going to use ?
-- Default path
-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 85 | 17 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 5 | 85 | 13 |
|* 2 | INDEX RANGE SCAN | T1_EQUI | 9 | | 9 |
| 3 | FAST DUAL | | 1 | | 2 |
| 4 | FAST DUAL | | 1 | | 2 |
-----------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("SCATTERED"=10 AND "CLUSTERED">= (SELECT 40 FROM
"SYS"."DUAL" "DUAL") AND "CLUSTERED"<= (SELECT 41 FROM "SYS"."DUAL"
"DUAL"))
filter("CLUSTERED">= (SELECT 40 FROM "SYS"."DUAL" "DUAL") AND
"CLUSTERED"<= (SELECT 41 FROM "SYS"."DUAL" "DUAL"))
-- Hinted to user t1_range index
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 85 | 10 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 5 | 85 | 6 |
|* 2 | INDEX RANGE SCAN | T1_RANGE | 9 | | 2 |
| 3 | FAST DUAL | | 1 | | 2 |
| 4 | FAST DUAL | | 1 | | 2 |
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CLUSTERED">= (SELECT 40 FROM "SYS"."DUAL" "DUAL") AND
"SCATTERED"=10 AND "CLUSTERED"<= (SELECT 41 FROM "SYS"."DUAL" "DUAL"))
filter("SCATTERED"=10)
The optimizer has selected the path with the higher cost – and it’s hard-coded to do this in response to the nature of the predicates and the available choice of indexes. We have two indexes, with predicate usage as follows:
t1_equi (scattered, ind_pad, clustered) with predicates that are: (equality, omitted, range)
t1_range (clustered, scattered) with predicates that are: (range, equality)
The important point is that we have a predicate on column clustered which is range-based with values that are unknown at optimisation time (even though the human eye can see that the values are going to be 40 and 41). Because of this uncertainty the optimizer is coded to bypass the index that starts with this suspect column and use the (higher cost) index where there is a known starting predicate. In effect the optimizer seems to be saying: “in the worst case scenario t1_range might end up doing an index full scan but t1_equi will only have to do a partial range scan”.
I’ve still got a lot of examples I want to work through to see how far ranging this rule is and if it ever gets ignored (what happens, for example, to joins with range-based predicates), but I hope that this preliminary note acts a useful clue when you next see Oracle ignoring the lowest cost path.


I’ve csme across this once in a complex statement that I didn’t have time to figure out what odd CBO “quirk” I was dealing with- Excellent examples and clear explanations, thank you, Jonathan!
Comment by Kellyn Pedersen — September 8, 2010 @ 2:20 am UTC Sep 8,2010 |
Jonathan,
I am afraid revelations like these are making me “atheist” about Oracle
On a serious note, thanks for sharing this.
Comment by Narendra — September 8, 2010 @ 10:38 am UTC Sep 8,2010 |
Hi Jonathan,
I tested it on 11.2 and it seems to be fixed in 11.2.
See my test (I changed the name of t1 to t1xx)
SQL> startup force ORACLE instance started. Total System Global Area 1068937216 bytes Fixed Size 2220200 bytes Variable Size 977276760 bytes Database Buffers 83886080 bytes Redo Buffers 5554176 bytes Database mounted. Database opened. SQL> SQL> connect scott/regit Connected. SQL> set line 200 SQL> set pagesize 0 SQL> alter session set optimizer_use_sql_plan_baselines = false 2 ; Session altered. SQL> select small_vc from t1xx where scattered = 10 and clustered between (select 40 from dual) and (select 41 from dual) 2 3 4 5 6 7 8 ; 0000020011 0000020031 0000020051 …….. 0000020071 50 rows selected. SQL> select * from table(dbms_xplan.display_cursor) 2 ; SQL_ID bsvcwujxd02p6, child number 0 ------------------------------------- select small_vc from t1xx where scattered = 10 and clustered between (select 40 from dual) and (select 41 from dual) Plan hash value: 1241395774 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 17 (100)| | | 1 | TABLE ACCESS BY INDEX ROWID| T1XX | 5 | 140 | 13 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | T1_EQUI | 9 | | 9 (0)| 00:00:01 | | 3 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | | 4 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("SCATTERED"=10 AND "CLUSTERED">= AND "CLUSTERED"= AND "CLUSTERED" select /*+ index(t1 t1_range) */ small_vc from t1xx where scattered = 10 and clustered between (select 40 from dual) and (select 41 from dual) 2 3 4 5 6 7 8 9 ; 0000020011 0000020031 ….. 0000020951 0000020971 0000020991 50 rows selected. SQL> select * from table(dbms_xplan.display_cursor) ; SQL_ID da0446kv1h1v8, child number 0 ------------------------------------- select /*+ index(t1 t1_range) */ small_vc from t1xx where scattered = 10 and clustered between (select 40 from dual) and (select 41 from dual) Plan hash value: 1241395774 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 17 (100)| | | 1 | TABLE ACCESS BY INDEX ROWID| T1XX | 5 | 140 | 13 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | T1_EQUI | 9 | | 9 (0)| 00:00:01 | | 3 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | | 4 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("SCATTERED"=10 AND "CLUSTERED">= AND "CLUSTERED"= AND "CLUSTERED"<=)) 24 rows selected.Regards Hans-Peter
Comment by Hans-Peter — September 9, 2010 @ 11:57 am UTC Sep 9,2010 |
Dear Hans-Peter,
the problem reproduces even on 11.2.0.1.2. You have accidentally not given the right table name/alias in the hint for the second example. This is why the optimizer did not honour your hint:
should be :
/*+ index(t1xx t1_range) */
instead of:
/*+ index(t1 t1_range) */
Regards,
Martin
Comment by Martin Decker — October 1, 2010 @ 6:56 am UTC Oct 1,2010 |
Good that you noticed it Martin.
Thanks
Comment by Hans-Peter — October 1, 2010 @ 5:27 pm UTC Oct 1,2010 |
[...] Update (31st Oct 2010): I did a little extra work on this issue a few weeks later ago (see comments and pingback below), but forgot to link forward to the new article. My original conclusions were wrong; for more details see: http://jonathanlewis.wordpress.com/2010/09/07/cbo-surprise-3/ [...]
Pingback by FBI Bug « Oracle Scratchpad — October 31, 2010 @ 8:31 am UTC Oct 31,2010 |
[...] this version of Oracle this resulted in the predicates being treated as “guesses on an index range scan” – which resulted in the optimizer ignoring the appropriate index (until hinted) even though [...]
Pingback by FBI Bug « Oracle Scratchpad — December 30, 2011 @ 5:48 pm UTC Dec 30,2011 |