Oracle Scratchpad

June 14, 2013

Hints again

Filed under: CBO,Hints,Ignoring Hints,Oracle — Jonathan Lewis @ 6:17 pm UTC Jun 14,2013

A recent posting on OTN came up with a potentially interesting problem – it started roughly like this:

I have two queries like this:

select * from emp where dept_id=10 and emp_id=15;
select * from emp where dept_id=10 and emp_id=16;

When I run them separately I get the execution plan I want, but when I run a union of the two the plans change.

This, of course, is extremely unlikely – even if we assume that the two queries are more complex than the text shown. On the other hand you might, after a little thought, come up with the idea that perhaps the optimizer had done something really clever like join factorization (moving a join that’s common to the two parts of the UNION from inside to outside the UNION), or maybe there’s some really new trick the optimizer had played because a UNION ultimately requires a SORT UNIQUE, and the optimizer had chosen a different path that returned the data from each part of the UNION in sorted order to decrease the cost of that final sort.

In fact it turned out to be a lot simpler than that. The query looked more like this:


select
	/*+
		index(@qb_view_a t1)
		index(@qb_view_b t1)
	*/
	*
from
	t2, qb_view
where
	t2.n1 = 10
and	qb_view.n2  = t2.n2
union
select
	/*+
		index(@qb_view_a t1)
		index(@qb_view_b t1)
	*/
	*
from
	t2, qb_view
where
	t2.n1 = 12                    -- the real code referenced an alternative column here.
and	qb_view.n2  = t2.n2
;

Of particular note is the fact that it’s a join, the join involves a view (guessing from the names in the FROM clause) and there are hints that reference query block names for query blocks that don’t exist – but perhaps are present inside the view. So what does the view look like.


create or replace view qb_view
as
select /*+ qb_name(qb_view_a) */ * from t1
union all
select /*+ qb_name(qb_view_b) */ * from t1
;

It’s a union all view – and the two query blocks named from the outside query are the two halves of the inner union.

Here’s an important thought – it’s quite easy to get Oracle to do what you want in a simple query (at least in the short term) by sticking in a few hints – especially if you create and reference query block names; but when you start compounding queries by combining bits of code that currently do what you want, you may find that Oracle introduces extra query blocks during transformation, and perhaps some of the query blocks you’ve referenced originally cease to exist, so the hints no longer apply.

Let’s look at the execution plan – including the ALIAS and OUTLINE sections – for the final query, and compare it with the execution plan for just one of the two pieces; starting with the single piece first:

explain plan for
select
	/*+
		index(@qb_view_a t1)
		index(@qb_view_b t1)
	*/
	*
from
	t2, qb_view
where
	t2.n1 = 10
and	qb_view.n2  = t2.n2
;

select * from table(dbms_xplan.display(null,null,'basic +outline +alias'));

--------------------------------------------------
| Id  | Operation                      | Name    |
--------------------------------------------------
|   0 | SELECT STATEMENT               |         |
|   1 |  HASH JOIN                     |         |
|   2 |   TABLE ACCESS FULL            | T2      |
|   3 |   VIEW                         | QB_VIEW |
|   4 |    UNION-ALL                   |         |
|   5 |     TABLE ACCESS BY INDEX ROWID| T1      |
|   6 |      INDEX FULL SCAN           | T1_I1   |
|   7 |     TABLE ACCESS BY INDEX ROWID| T1      |
|   8 |      INDEX FULL SCAN           | T1_I1   |
--------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1     / T2@SEL$1
   3 - SET$1     / QB_VIEW@SEL$1
   4 - SET$1
   5 - QB_VIEW_A / T1@QB_VIEW_A
   6 - QB_VIEW_A / T1@QB_VIEW_A
   7 - QB_VIEW_B / T1@QB_VIEW_B
   8 - QB_VIEW_B / T1@QB_VIEW_B

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      INDEX(@"QB_VIEW_A" "T1"@"QB_VIEW_A" ("T1"."N1"))
      INDEX(@"QB_VIEW_B" "T1"@"QB_VIEW_B" ("T1"."N1"))
      USE_HASH(@"SEL$1" "QB_VIEW"@"SEL$1")
      LEADING(@"SEL$1" "T2"@"SEL$1" "QB_VIEW"@"SEL$1")
      NO_ACCESS(@"SEL$1" "QB_VIEW"@"SEL$1")
      FULL(@"SEL$1" "T2"@"SEL$1")
      OUTLINE(@"QB_VIEW_B")
      OUTLINE(@"QB_VIEW_A")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SET$1")
      OUTLINE_LEAF(@"QB_VIEW_B")
      OUTLINE_LEAF(@"QB_VIEW_A")
      ALL_ROWS
      DB_VERSION('11.2.0.3')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

The plan shows us that we have used an index to access table t1 (with an index full scan) in both halves of the QB_VIEW’s union all; and the alias section shows us that we have a table t1 in a query block name qb_view_a, and the outline section shows that we have a hint applied in that query block that directs the optimizer to use an index on that table: INDEX(@”QB_VIEW_A” “T1″@”QB_VIEW_A” (“T1″.”N1″)); and we can see the same strategy appearing for a table t1 in query block qb_view_b. By the way, I checked the plan without the hints, and the optimizer chose to do full tablescans on t1 – so the hints were actually having an effect.

So what happens when we check the plan for the UNION of the two variants of the query:


------------------------------------------
| Id  | Operation              | Name    |
------------------------------------------
|   0 | SELECT STATEMENT       |         |
|   1 |  SORT UNIQUE           |         |
|   2 |   UNION-ALL            |         |
|   3 |    HASH JOIN           |         |
|   4 |     TABLE ACCESS FULL  | T2      |
|   5 |     VIEW               | QB_VIEW |
|   6 |      UNION-ALL         |         |
|   7 |       TABLE ACCESS FULL| T1      |
|   8 |       TABLE ACCESS FULL| T1      |
|   9 |    HASH JOIN           |         |
|  10 |     TABLE ACCESS FULL  | T2      |
|  11 |     VIEW               | QB_VIEW |
|  12 |      UNION-ALL         |         |
|  13 |       TABLE ACCESS FULL| T1      |
|  14 |       TABLE ACCESS FULL| T1      |
------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SET$1
   3 - SEL$1
   4 - SEL$1 / T2@SEL$1
   5 - SET$2 / QB_VIEW@SEL$1
   6 - SET$2
   7 - SEL$2 / T1@SEL$2
   8 - SEL$3 / T1@SEL$3
   9 - SEL$4
  10 - SEL$4 / T2@SEL$4
  11 - SET$3 / QB_VIEW@SEL$4
  12 - SET$3
  13 - SEL$5 / T1@SEL$5
  14 - SEL$6 / T1@SEL$6

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      FULL(@"SEL$5" "T1"@"SEL$5")
      FULL(@"SEL$6" "T1"@"SEL$6")
      FULL(@"SEL$2" "T1"@"SEL$2")
      FULL(@"SEL$3" "T1"@"SEL$3")
      USE_HASH(@"SEL$1" "QB_VIEW"@"SEL$1")
      LEADING(@"SEL$1" "T2"@"SEL$1" "QB_VIEW"@"SEL$1")
      NO_ACCESS(@"SEL$1" "QB_VIEW"@"SEL$1")
      FULL(@"SEL$1" "T2"@"SEL$1")
      USE_HASH(@"SEL$4" "QB_VIEW"@"SEL$4")
      LEADING(@"SEL$4" "T2"@"SEL$4" "QB_VIEW"@"SEL$4")
      NO_ACCESS(@"SEL$4" "QB_VIEW"@"SEL$4")
      FULL(@"SEL$4" "T2"@"SEL$4")
      OUTLINE_LEAF(@"SET$1")
      OUTLINE_LEAF(@"SEL$4")
      OUTLINE_LEAF(@"SET$3")
      OUTLINE_LEAF(@"SEL$6")
      OUTLINE_LEAF(@"SEL$5")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SET$2")
      OUTLINE_LEAF(@"SEL$3")
      OUTLINE_LEAF(@"SEL$2")
      ALL_ROWS
      DB_VERSION('11.2.0.3')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

The plan is completely different. We’ve lost the index full scans and we’ve reverted to the tablescans that we would have got from an unhinted query. When you look at the outline you can see why – the query blocks qb_view_a and qb_view_b have disappeared – so the hints are no longer valid. As you can see we now have four occurrences of table t1, but as the alias section shows they come from query blocks sel$2, sel$3, sel$5 and sel$6).

Is this a bug ? I don’t think so. When the optimizer produces the outline information (which can be stored as an SQL Baseline in 11g) it’s producing a set of hints that will be applied at the outermost query block, with hints that point, as necessary, to inner query blocks; this means you can’t use the same query block name twice in a query or the optimizer wouldn’t be able to identify which query block a hint was supposed to apply to. So Oracle has eliminated duplicate query block names and replaced them with the standard internally generated ones – the user’s hints no longer apply.

Footnote: Checking the clock – it took me about 15 minutes to create a simplified model based on the information available on OTN: it’s taken me 75 minutes to describe what I did and what I learned as a result. With a little practice you can get very good at creating models that help you to identify and solve problems very quickly.

June 12, 2013

Not In Nasty

Filed under: Bugs,CBO,Oracle — Jonathan Lewis @ 5:31 pm UTC Jun 12,2013

Actually it’s probably not the NOT IN that’s nasty, it’s the thing you get if you don’t use NOT IN that’s more likely to be nasty. Just another odd little quirk of the optimizer, which I’ll demonstrate with a simple example (running under 11.2.0.3 in this case):

create table t1
as
select	*
from	all_objects
where	rownum <= 10000
;

update t1 set
	object_type = null
where
	object_type = 'TABLE'
;

commit;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
;

select
	sample_size, num_nulls, num_distinct, histogram
from
	user_tab_columns
where
	table_name = 'T1'
and	column_name = 'OBJECT_TYPE'
;

This code is going to give you 10,000 rows but the number of distinct values for object_type and the number of nulls will depend on the version and options installed; however you should get about 15 distinct values for object_type and about 1,000 nulls in that column.

If you want to run a query to count the rows where the object_type is not one of: ‘INDEX’,'SYNONYM’, or ‘VIEW’, there are two obvious ways of writing it, so let’s put them into a single query with a UNION ALL, and see what Oracle predicts as the cardinality (cut-n-pasted from an SQL*Plus session):

SQL> set autotrace traceonly explain
SQL> select
  2     count(*)
  3  from       t1
  4  where      object_type != 'INDEX'
  5  and        object_type != 'SYNONYM'
  6  and        object_type != 'VIEW'
  7  union all
  8  select
  9     count(*)
 10  from       t1
 11  where      object_type not in ('INDEX', 'SYNONYM', 'VIEW')
 12  ;

Execution Plan
----------------------------------------------------------
Plan hash value: 575959041

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     2 |    12 |    38  (53)| 00:00:01 |
|   1 |  UNION-ALL          |      |       |       |            |          |
|   2 |   SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  3 |    TABLE ACCESS FULL| T1   |  5903 | 35418 |    19   (6)| 00:00:01 |
|   4 |   SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  5 |    TABLE ACCESS FULL| T1   |  7308 | 43848 |    19   (6)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'SYNONYM' AND "OBJECT_TYPE"<>'VIEW')
5 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'SYNONYM' AND "OBJECT_TYPE"<>'VIEW')

Look at operations 3 and 5: the way you write the query makes a significant difference to the cardinality predicted by the optimizer – despite the fact that the NOT IN predicate is transformed internally so that the predicate sections of the two parts of the UNION ALL are exactly the same.

If you’re wondering, the NOT IN option is the one that does the rational arithmetic – it has calculated the number of rows where object_type is not null, then allowed for filtering out 3/15ths of those rows because the query rejects 3 rows out of the 15 distinct values reported by the statistics. The list of inequalities has applied some arithmetic that tries to allow for the nulls three times – the more values you reject the worse the estimate gets.

It’s quite likely that you won’t notice the effect in many cases – but if you run queries against a column with a large percentage of nulls, then the differences can be very large and knock-on effects of this error could be dramatic.

June 9, 2013

10053

Filed under: CBO,Oracle,Troubleshooting — Jonathan Lewis @ 9:24 am UTC Jun 9,2013

I thought I’d try to spend some of today catching up on old comments – first the easier ones, then the outstanding questions on Oracle Core.
The very first one I looked at was about pushing predicates, and the specific comment prompted me to jot down this little note about the 10053 trace file (the CBO trace).

In “real-life” I don’t often look at 10053 trace files because they tend to be very long and messy and usually you can see all you need from an execution plan – perhaps through SQL Monitoring, or possible with the rowsource execution stats enabled. However there are questions that can sometimes be resolved very quickly by a simple text search of a trace file; questions of the form: “why isn’t the optimizer using feature X”, for example: “why didn’t the optimizer use predicate push for this view?” If you have a question of this type, then search the trace file for the word “bypass” – you may find comments like the following:

        JPPD:     JPPD bypassed: Outer query references right side of outer join
        JPPD:     JPPD bypassed: View has a single or group set function.
        JPPD:     JPPD bypassed: OLD_PUSH_PRED hint specified
        JPPD:     JPPD bypassed: User-defined operator
        JPPD:     JPPD bypassed: Push-down not enabled
        JPPD:     JPPD bypassed: View not on right-side of outer join
        JPPD:     JPPD bypassed: View contains a group by.
        JPPD:     JPPD bypassed: View contains a window function.
        JPPD:     JPPD bypassed: View contains a MODEL clause.
        JPPD:     JPPD bypassed: View contains a DISTINCT.
        JPPD:     JPPD bypassed: View contains a rownum reference.
        JPPD:     JPPD bypassed: START WITH query block.
        JPPD:     JPPD bypassed: View is a set query block.
        JPPD:     JPPD bypassed: Negative hint found
        JPPD:     JPPD bypassed: Outline does not contain hint
        JPPD:     JPPD bypassed: Base table missing statistics
        JPPD:     JPPD bypassed: Remote table
        JPPD:     JPPD bypassed: Security violation
        JPPD:     JPPD bypassed: Possible security violation.
        JPPD bypassed: Table level NO_PUSH_PRED hint.
        JPPD bypassed: Table-level hint.
        JPPD bypassed: Query block NO_PUSH_PRED hint.
        JPPD bypassed: View semijoined to table
        JPPD bypassed: Contained view has no table in fro list.
        JPPD bypassed: View contains a START WITH query block.
        JPPD bypassed: Does not contain a view.
        JPPD bypassed: Found branches of different types

According to a note that I’ve got with this list, by the way, I used “strings -a” on the Oracle executable for 10.2.0.1 to generate it. It’s probably about time I did the same for newer versions of Oracle.

Footnote:

My plans for addressing comments have been disrupted somewhat. Just as I published this note, an email holding two 10053 trace files arrived. (The author had asked before sending them, and I was sufficiently curious that I had agreed to take a quick look). So I’ve spent most of the last hour doing what I’ve just said I hardly ever to – looking at 10053 trace files.

The question was “why does this query run serially if I have a particular scalar subquery in the select list, but run parallel if I replace it with a function”.  The immediate answer, after I’d seen the query and thought about it for a bit, was: “because the manuals (10.2 – the relevant version) say that you don’t parallelize if you have scalar subqueries in the select list”; but this changed the question to: “why is it just this one scalar subquery that causes serialization when the other two scalar subqueries don’t”. Of the three scalar subqueries, only one of them cause the query to serialize.

The answer to that question is a little more subtle – and I’ll blog about it when I can find time to model the scenario.

June 7, 2013

Same Plan

Filed under: CBO,Execution plans,Oracle,Tuning — Jonathan Lewis @ 5:11 pm UTC Jun 7,2013

An interesting little problem appeared on the Oracle-L mailing list earlier on this week – a query ran fairly quickly when statistics hadn’t been collected on the tables, but then ran rather slowly after stats collection even though the plan hadn’t changed, and the tkprof results were there to prove the point. Here are the two outputs (edited slightly for width – the original showed three sets of row stats, the 1st, avg and max, but since the query had only been run once the three columns showed the same results in each case):


 Rows (max)  Row Source Operation
 ----------  ---------------------------------------------------
          0  UPDATE  CXT_FAKT_PROVISIONSBUCHUNG (cr=2039813 pr=3010 pw=0 time=47745718 us)
      15456   TABLE ACCESS FULL CXT_FAKT_PROVISIONSBUCHUNG (cr=1328 pr=1325 pw=0 time=40734 us cost=370 size=880992 card=15456)
      11225   VIEW  CXV_HAUPT_VU_SPARTE (cr=2038477 pr=1684 pw=0 time=47297497 us cost=10 size=4293 card=1)
      11225    SORT UNIQUE (cr=2038477 pr=1684 pw=0 time=47284436 us cost=10 size=824 card=1)
     126457     VIEW  (cr=2038167 pr=1669 pw=0 time=27753402 us cost=9 size=824 card=1)
     126457      WINDOW SORT (cr=2038167 pr=1669 pw=0 time=27667835 us cost=9 size=853 card=1)
     126457       WINDOW SORT (cr=2038167 pr=1669 pw=0 time=26884699 us cost=9 size=853 card=1)
     126457        NESTED LOOPS  (cr=2038167 pr=1669 pw=0 time=26342292 us)
     126457         NESTED LOOPS  (cr=1995241 pr=1615 pw=0 time=26192173 us cost=7 size=853 card=1)
     141581          NESTED LOOPS  (cr=642683 pr=1066 pw=0 time=3039331 us cost=5 size=499 card=1)
      11225           TABLE ACCESS BY INDEX ROWID POP_INFO (cr=22473 pr=32 pw=0 time=109767 us cost=2 size=16 card=1)
      11225            INDEX UNIQUE SCAN PK_POP_INFO (cr=11248 pr=4 pw=0 time=51796 us cost=1 size=0 card=1)(object id 1790009)
     141581           TABLE ACCESS BY INDEX ROWID TMP_VU_SPARTE (cr=620210 pr=1034 pw=0 time=2889850 us cost=3 size=483 card=1)
    1732978            INDEX RANGE SCAN IDX_TMP_VU_SPARTE (cr=140952 pr=204 pw=0 time=2094982 us cost=2 size=0 card=1)(object id 1795724)
     126457          INDEX RANGE SCAN IDX_TMP_VU_SPARTE (cr=1352558 pr=549 pw=0 time=23078816 us cost=1 size=0 card=1)(object id 1795724)
     126457         TABLE ACCESS BY INDEX ROWID TMP_VU_SPARTE (cr=42926 pr=54 pw=0 time=94791 us cost=2 size=354 card=1)

 Rows (max)  Row Source Operation
 ----------  ---------------------------------------------------
          0  UPDATE  CXT_FAKT_PROVISIONSBUCHUNG (cr=89894995 pr=1701 pw=0 time=318766975 us)
      15456   TABLE ACCESS FULL CXT_FAKT_PROVISIONSBUCHUNG (cr=1328 pr=1031 pw=0 time=46975 us cost=370 size=880992 card=15456)
      11225   VIEW  CXV_HAUPT_VU_SPARTE (cr=89893656 pr=670 pw=0 time=1553653734 us cost=11 size=4293 card=1)
      11225    SORT UNIQUE (cr=89893656 pr=670 pw=0 time=1553640071 us cost=11 size=419 card=1)
     126457     VIEW  (cr=89893656 pr=670 pw=0 time=1533733864 us cost=10 size=419 card=1)
     126457      WINDOW SORT (cr=89893656 pr=670 pw=0 time=1533646166 us cost=10 size=155 card=1)
     126457       WINDOW SORT (cr=89893656 pr=670 pw=0 time=1532847028 us cost=10 size=155 card=1)
     126457        NESTED LOOPS  (cr=89893656 pr=670 pw=0 time=1532238656 us)
    3501658         NESTED LOOPS  (cr=89167767 pr=665 pw=0 time=1529652013 us cost=8 size=155 card=1)
    5300707          NESTED LOOPS  (cr=1339312 pr=480 pw=0 time=9657987 us cost=5 size=81 card=1)
      11225           TABLE ACCESS BY INDEX ROWID POP_INFO (cr=22473 pr=32 pw=0 time=119070 us cost=2 size=16 card=1)
      11225            INDEX UNIQUE SCAN PK_POP_INFO (cr=11248 pr=3 pw=0 time=54707 us cost=1 size=0 card=1)(object id 1790009)
    5300707           TABLE ACCESS BY INDEX ROWID TMP_VU_SPARTE (cr=1316839 pr=448 pw=0 time=8359987 us cost=3 size=65 card=1)
    5300707            INDEX RANGE SCAN IDX_TMP_VU_SPARTE (cr=140971 pr=87 pw=0 time=3603882 us cost=2 size=0 card=1)(object id 1795724)
    3501658          INDEX RANGE SCAN IDX_TMP_VU_SPARTE (cr=87828455 pr=185 pw=0 time=1518016475 us cost=2 size=0 card=1)(object id 1795724)
     126457         TABLE ACCESS BY INDEX ROWID TMP_VU_SPARTE (cr=725889 pr=5 pw=0 time=1829196 us cost=3 size=74 card=1)

As you can see, the first run took 48 seconds (time=47,745,718 us in the first line) while the second execution took 319 seconds (time=318766975 us). If you check the execution plans carefully they appear to be the same plan and, in fact, the trace file showed that the two plans had the same plan hash value. Clearly, though, they do vastly different amounts of work – the most eye-catching detail, perhaps, is the way the bad plan blows the row count up to 5 million before collapsing it back to 126,000). What do you have to do to get the change in performance (and it’s a totally reproducible change) – create or drop stats: if you have stats on the tables you get a slow execution, if you delete the stats you get the fast execution.

So what’s the problem ? Look carefully at the plan(s) – they’re not actually the same plan, but you can’t see the difference you can only see clues that they must be different. Notice in the last four lines that you access the same table (TMP_VU_SPARTE) twice using the same index (IDX_TMP_VU_SPARTE) – when you collect stats you access the two tables in the opposite order, and it makes a difference to the work you do.

To demonstrate the point I’ve created a simplified model of the problem, based on some extra information supplied in the mail thread. The model requires a correlated update, based on a view which joins a table to itself, and range-based predicates. Here’s the data generation:


execute dbms_random.seed(0)

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	trunc(dbms_random.value(0,1e4)) contract,
	sysdate - 300 + trunc(dbms_random.value(0,300))	date_from,
	sysdate - 300 + trunc(dbms_random.value(0,300))	date_to,
	sysdate - 300 + trunc(dbms_random.value(0,300))	created,
	sysdate - 300 + trunc(dbms_random.value(0,300))	replaced
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;

create index t1_i1 on t1(contract, date_from, date_to);

create or replace view v1
as
select
	t1a.*
from
	t1	t1a,
	t1	t1b
where
	t1b.contract    = t1a.contract
and	t1b.date_from  <= t1a.date_from 
and	t1b.date_to    >  t1a.date_from
and	t1b.created    <  t1a.replaced 
and	t1b.replaced   >  t1a.created
;

rem
rem	We are going to update t2 from v1 using
rem	equality on all columns in the index
rem

create table t2
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	trunc(dbms_random.value(0,1e4)) 		contract,
	sysdate - 300 + trunc(dbms_random.value(0,300))	date_from,
	sysdate - 300 + trunc(dbms_random.value(0,300))	date_to,
	sysdate - 300 + trunc(dbms_random.value(0,300))	replaced
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5 ; 

begin 	dbms_stats.gather_table_stats(
 		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 =>'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 =>'for all columns size 1'
	);

end;
/

I haven’t crafted my data particularly carefully and from the point of view of realism the content is a little bizarre (I’ve got “to” dates earlier than “from” dates, for example) – but all I’m interested in is getting the right sort of shape to demonstrate a point about the plan, I’m not trying to model the actual variation in activity.

I’ve created a view with a self-join that starts with equality on first column of the index I’ve created, but uses range-based predicates on the other columns in the table – that’s probably quite important as far as the real-world performance was concerned – partly because of the nature of the data (interesting skews when comparing “valid to/from dates”) and partly because Oracle is going to have to use its 5% guess for join selectivities for range based selectivities.

And now the update – two versions with their execution plans; starting with the plan where the correlated subquery visits t1 aliased as t1a first:

explain plan for
update t2 set
	replaced = (
		select
			/*+ leading(v1.t1a v1.t1b) */
			max(replaced)
		from	v1
		where
			v1.contract = t2.contract
		and	v1.date_from = t2.date_from
		and	v1.date_to = t2.date_to
	)
;

select * from table(dbms_xplan.display(null,null,'alias'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
Plan hash value: 2328412027

-----------------------------------------------------------------------------------------
| Id  | Operation                       | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                |       |   100K|  2734K|   500K (20)| 00:41:41 |
|   1 |  UPDATE                         | T2    |       |       |            |          |
|   2 |   TABLE ACCESS FULL             | T2    |   100K|  2734K|    64   (8)| 00:00:01 |
|   3 |   SORT AGGREGATE                |       |     1 |    72 |            |          |
|   4 |    NESTED LOOPS                 |       |       |       |            |          |
|   5 |     NESTED LOOPS                |       |     1 |    72 |     4   (0)| 00:00:01 |
|   6 |      TABLE ACCESS BY INDEX ROWID| T1    |     1 |    36 |     2   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN          | T1_I1 |     1 |       |     1   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN           | T1_I1 |     1 |       |     1   (0)| 00:00:01 |
|*  9 |     TABLE ACCESS BY INDEX ROWID | T1    |     1 |    36 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - UPD$1
   2 - UPD$1        / T2@UPD$1
   3 - SEL$F5BB74E1
   6 - SEL$F5BB74E1 / T1A@SEL$2
   7 - SEL$F5BB74E1 / T1A@SEL$2
   8 - SEL$F5BB74E1 / T1B@SEL$2
   9 - SEL$F5BB74E1 / T1B@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("T1A"."CONTRACT"=:B1 AND "T1A"."DATE_FROM"=:B2 AND
              "T1A"."DATE_TO"=:B3)
   8 - access("T1B"."CONTRACT"=:B1 AND "T1B"."DATE_TO">:B2 AND
              "T1B"."DATE_FROM"<=:B3)
       filter("T1B"."DATE_TO">:B1 AND "T1B"."CONTRACT"="T1A"."CONTRACT" AND
              "T1B"."DATE_FROM"<="T1A"."DATE_FROM" AND "T1B"."DATE_TO">"T1A"."DATE_FROM")
   9 - filter("T1B"."CREATED"<"T1A"."REPLACED" AND 
              "T1B"."REPLACED">"T1A"."CREATED")

Notice that there’s nothing in the body of the plan that tells you which copy of t1 is visited first – you can’t tell unless you look carefully at the predicate information, or unless you’ve requested the alias section that lets you see very easily that the t1 at line 6 (the first one accessed) is aliased as t1a and the t1 at line 9 is aliased as t1b. While you’re at it, check the plan hash value from the top of the plan output.

Now the plan when we hint the two tables into the reverse order:

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
Plan hash value: 2328412027

-----------------------------------------------------------------------------------------
| Id  | Operation                       | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                |       |   100K|  2734K|   600K (17)| 00:50:01 |
|   1 |  UPDATE                         | T2    |       |       |            |          |
|   2 |   TABLE ACCESS FULL             | T2    |   100K|  2734K|    64   (8)| 00:00:01 |
|   3 |   SORT AGGREGATE                |       |     1 |    72 |            |          |
|   4 |    NESTED LOOPS                 |       |       |       |            |          |
|   5 |     NESTED LOOPS                |       |     1 |    72 |     5   (0)| 00:00:01 |
|   6 |      TABLE ACCESS BY INDEX ROWID| T1    |     1 |    36 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN          | T1_I1 |     1 |       |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN           | T1_I1 |     1 |       |     1   (0)| 00:00:01 |
|*  9 |     TABLE ACCESS BY INDEX ROWID | T1    |     1 |    36 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - UPD$1
   2 - UPD$1        / T2@UPD$1
   3 - SEL$F5BB74E1
   6 - SEL$F5BB74E1 / T1B@SEL$2
   7 - SEL$F5BB74E1 / T1B@SEL$2
   8 - SEL$F5BB74E1 / T1A@SEL$2
   9 - SEL$F5BB74E1 / T1A@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("T1B"."CONTRACT"=:B1 AND "T1B"."DATE_TO">:B2 AND
              "T1B"."DATE_FROM"<=:B3) 
       filter("T1B"."DATE_TO">:B1)
   8 - access("T1B"."CONTRACT"="T1A"."CONTRACT" AND "T1A"."DATE_FROM"=:B1 AND
              "T1A"."DATE_TO"=:B2)
       filter("T1A"."CONTRACT"=:B1 AND "T1B"."DATE_FROM"<="T1A"."DATE_FROM" AND 
              "T1B"."DATE_TO">"T1A"."DATE_FROM")
   9 - filter("T1B"."CREATED"<"T1A"."REPLACED" AND 
              "T1B"."REPLACED">"T1A"."CREATED")

The plan hash value is the same – but if you look at the alias section you can see that the t1 we access first is the one with alias t1b.
Now compare the predicate sections, just for line 7 (the initial index access line):

First Plan
   7 - access("T1A"."CONTRACT"=:B1 AND "T1A"."DATE_FROM"=:B2 AND "T1A"."DATE_TO"=:B3)

Second Plan
   7 - access("T1B"."CONTRACT"=:B1 AND "T1B"."DATE_TO">:B2 AND "T1B"."DATE_FROM"<=:B3) 
       filter("T1B"."DATE_TO">:B1)

Although my model data is random garbage, it’s easy to imagine that with a large data set the selectivities of these two predicates could be dramatically different, and there is clearly some “real-world” meaning to the date_to/date_from columns for a given row that the optimizer is unlikely to recognise when looking at the individual column stats for the table. It’s not surprising that a plan with no stats (which results in dynamic sampling) could find a better plan than a table with stats that leaves the optimizer using its default “call it 5% and hope for the best” strategy for range-based joins.

Conclusion

When you reference a table more than once in an execution plan, make sure you look very carefully at the predicate section – or even call for the alias section – so that you know exactly which copy of the table appears at which point in the plan.

Footnote

Although the results don’t mean much for my example, here’s my output from tracing my queries – rather than report the whole query in each case, I’ve just given the hint to show the table order:


/*+ leading(v1.t1a v1.t1b) */

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      6.01       0.00          0     200722     204119      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      6.01       0.00          0     200722     204119      100000

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE  T2 (cr=200773 pr=0 pw=0 time=0 us)
    100000     100000     100000   TABLE ACCESS FULL T2 (cr=459 pr=0 pw=0 time=0 us cost=64 size=2800000 card=100000)
    100000     100000     100000   SORT AGGREGATE (cr=200248 pr=0 pw=0 time=0 us)
         0          0          0    NESTED LOOPS  (cr=200248 pr=0 pw=0 time=0 us)
         0          0          0     NESTED LOOPS  (cr=200248 pr=0 pw=0 time=0 us cost=4 size=72 card=1)
         0          0          0      TABLE ACCESS BY INDEX ROWID T1 (cr=200248 pr=0 pw=0 time=0 us cost=2 size=36 card=1)
         0          0          0       INDEX RANGE SCAN T1_I1 (cr=200248 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 81082)
         0          0          0      INDEX RANGE SCAN T1_I1 (cr=0 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 81082)
         0          0          0     TABLE ACCESS BY INDEX ROWID T1 (cr=0 pr=0 pw=0 time=0 us cost=2 size=36 card=1)

/*+ leading(v1.t1b v1.t1a) */

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      6.09       0.00          0     613372     200000      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      6.09       0.00          0     613372     200000      100000

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE  T2 (cr=613372 pr=0 pw=0 time=1 us)
    100000     100000     100000   TABLE ACCESS FULL T2 (cr=459 pr=0 pw=0 time=0 us cost=64 size=2800000 card=100000)
    100000     100000     100000   SORT AGGREGATE (cr=612913 pr=0 pw=0 time=1 us)
         0          0          0    NESTED LOOPS  (cr=612913 pr=0 pw=0 time=1 us)
         0          0          0     NESTED LOOPS  (cr=612913 pr=0 pw=0 time=1 us cost=5 size=72 card=1)
    166078     166078     166078      TABLE ACCESS BY INDEX ROWID T1 (cr=368051 pr=0 pw=0 time=0 us cost=3 size=36 card=1)
    166078     166078     166078       INDEX RANGE SCAN T1_I1 (cr=202108 pr=0 pw=0 time=0 us cost=2 size=0 card=1)(object id 81082)
         0          0          0      INDEX RANGE SCAN T1_I1 (cr=244862 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 81082)
         0          0          0     TABLE ACCESS BY INDEX ROWID T1 (cr=0 pr=0 pw=0 time=0 us cost=2 size=36 card=1)

There isn’t a lot of difference in the run-time (which is mostly due to the tablescan) – but there’s an obvious difference in the number of buffer visits and the amount of data found for the first t1 access.
Just like the OP my plan varied with stats – though in my case I got the better plan when I had stats, and the worse plan when I deleted stats and the optimizer used dynamic sampling.

May 28, 2013

How to hint

Filed under: CBO,Hints,Oracle — Jonathan Lewis @ 5:25 pm UTC May 28,2013

Here’s a live example demonstrating a point I’ve often made – you have to be very detailed in your hinting or Oracle will find a way to obey your hints and do the wrong thing.  A recent posting on the OTN database forum gave use the following query and execution plan:

SELECT
	ERO.DVC_EVT_ID,
	E.DVC_EVT_DTTM
FROM D1_DVC_EVT E,
     D1_DVC_EVT_REL_OBJ ERO
WHERE
	ERO.MAINT_OBJ_CD = 'D1-DEVICE'
AND	ERO.PK_VALUE1 = :H1
AND	ERO.DVC_EVT_ID = E.DVC_EVT_ID
AND	E.DVC_EVT_TYPE_CD IN (
		'END-GSMLOWLEVEL-EXCP-SEV-1',
		'STR-GSMLOWLEVEL-EXCP-SEV-1'
	)
ORDER BY
	E.DVC_EVT_DTTM DESC

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name       | Starts | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |Reads  |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |            |      1 |  3196 (100)|       |       |    134 |00:00:13.85 |    3195 |  2136 |
|   1 |  SORT ORDER BY                       |            |      1 |  3196   (1)|       |       |    134 |00:00:13.85 |    3195 |  2136 |
|   2 |   NESTED LOOPS                       |            |      1 |            |       |       |    134 |00:00:13.85 |    3195 |  2136 |
|   3 |    NESTED LOOPS                      |            |      1 |  3195   (1)|       |       |   1059 |00:00:07.77 |    2138 |  1197 |
|*  4 |     INDEX RANGE SCAN                 | TEST1      |      1 |    30   (0)|       |       |   1059 |00:00:00.07 |      11 |    11 |
|   5 |     PARTITION RANGE ITERATOR         |            |   1059 |     1   (0)|   KEY |   KEY |   1059 |00:00:07.69 |    2127 |  1186 |
|*  6 |      INDEX UNIQUE SCAN               | D1T400P0   |   1059 |     1   (0)|   KEY |   KEY |   1059 |00:00:07.67 |    2127 |  1186 |
|*  7 |    TABLE ACCESS BY GLOBAL INDEX ROWID| D1_DVC_EVT |   1059 |     2   (0)| ROWID | ROWID |    134 |00:00:06.08 |    1057 |   939 |
-----------------------------------------------------------------------------------------------------------------------------------------

You’ll notice that something close to half the time spent came from the table access in line 7 (This is 11g, and we have a plan which shows the “double nested loop” of an index access followed by a table access – for each rowid returned in line 3 (totalling 7.77 seconds) we access the table through the nested loop driven by line 2 which totals 13.85 seconds).

After a little chat, the suggestion arose to introduce an index that avoided the table access – it’s doing a fairly large amount of random I/O, and we might be able to run the query roughly twice as fast if we didn’t visit it. So the DBA set up a suitable test index (called test2) on the D1_DVC_EVT table, and found that the optimizer didn’t use it (perhaps because the index was larger then the alternative, perhaps because the clustering_factor was much bigger) – so he added a hint to the code: /*+ index (e test2) */ which made Oracle use the index to produce the following plan:

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Starts | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |      1 | 98415 (100)|       |       |    134 |00:04:11.82 |     100K|  96848 |
|   1 |  SORT ORDER BY              |       |      1 | 98415   (1)|       |       |    134 |00:04:11.82 |     100K|  96848 |
|*  2 |   HASH JOIN                 |       |      1 | 98414   (1)|       |       |    134 |00:04:11.82 |     100K|  96848 |
|*  3 |    INDEX RANGE SCAN         | TEST1 |      1 |    30   (0)|       |       |   1059 |00:00:00.01 |      11 |      0 |
|   4 |    PARTITION RANGE ALL      |       |      1 | 98249   (1)|1048575|     1 |   7566K|00:03:34.58 |     100K|  96848 |
|   5 |     PARTITION RANGE SUBQUERY|       |    287 | 98249   (1)|KEY(SQ)|KEY(SQ)|   7566K|00:03:10.87 |     100K|  96848 |
|*  6 |      INDEX FULL SCAN        | TEST2 |   2296 | 98249   (1)|1048575|     1 |   7566K|00:02:45.47 |   97412 |  96848 |
----------------------------------------------------------------------------------------------------------------------------

Unfortunately, although Oracle obeyed the hint – it had to, since it was legal and in-context – it didn’t take the path the DBA expected.

When you hint, you have to make it impossible for Oracle find any path you don’t want, and that can take a lot of hints. In this case the DBA simply wanted to use the same nested loop path that he’d originally seen, but using the new index instead. To get the path safely he needed at least 4 hints: one to specify the join order, one to specify the join method, and one for each table to specify the access method. In this case:

/*+
        leading(ero e)
        use_nl(ero e)
        index(ero test1)
        index(e test2)
*/

Once you’ve hinted some SQL and got it working the safe thing to do, in 11g, is to check the outline section of the actual execution plan to see if you’ve missed any important hints and then, if you can’t change the production code, attach the SQL Baseline from your hinted code to the SQL text from the original. (See – for example: http://jonathanlewis.wordpress.com/2011/01/12/fake-baselines/ )

It’s hard to create a full set of hints by hand – and I often see hinted SQL in production systems where the plan that appears happens to be the right one but it’s not the only plan that could be derived from the hints. So my 11g mantra for hinting is this: if you can hint it, baseline it”.

 

May 23, 2013

Dynamic Sampling – 2

Filed under: CBO,Hints,Oracle,Troubleshooting — Jonathan Lewis @ 12:46 pm UTC May 23,2013

I’ve written about dynamic sampling in the past, but here’s a little wrinkle that’s easy to miss. How do you get the optimizer to work out the correct cardinality for a query like (the table creation statement follows the query):

select	count(*)
from	t1
where	n1 = n2
;

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	mod(rownum, 1000)	n1,
	mod(rownum, 1000)	n2
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6 ; 

If you’re running 11g and can changed the code there are a couple of easy options – adding a virtual column, or applying extended stats and then modifying the SQL accordingly would be appropriate.

 -- Virtual Column 
alter table t1 add (
 	n3	generated always as ( case n1 when n2 then 1 end) virtual 
) 
; 

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns n3 size 1')

-- Extended Stats

begin
	dbms_output.put_line(
		dbms_stats.create_extended_stats(
			ownname		=> user,
			tabname		=> 'T1',
			extension	=> '(case n1  when n2 then 1 else null end)'
		)
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		block_sample 	 => true,
		method_opt 	 => 'for columns (case n1  when n2 then 1 else null end) size 1'
	);
end;
/

select	count(*)
from	t1
where	(case n1 when n2 then 1 else null end)= 1
;

If you can’t change the SQL statement, there’s always the option for bypassing the problem by fixing a suitable execution plan with an SQL Baseline, of course. Alternatively, if you can think of the right hint you could create an “SQL Patch” for the statement – but what hint might be appropriate ? I’ll answer that question in a minute.

Here’s another option, though: get Oracle to use dynamic sampling. (You probably guessed that from the title of the post.) So which level would you use to make this work ? Left to its own devices, Oracle would calculate the selectivity of the predicate n1 = n2 as the smaller of the two separate predicates “n1 = unknown” and “n2 = unknown”. So you might hope that level 3 (Oracle is “guessing”) or level 4 (more than one predicate on a single table) might be appropriate. It’s the latter that works. If you execute “alter session set optimizer_dynamic_sampling=4;” before executing this query, Oracle will sample the table before optimising.

The method works, but can you apply it ? Possibly not, if you’re not allowed to inject any extra SQL anywhere – after all, you probably don’t want to set the parameter at the system level (spfile or init.ora) because it may affect lots of other queries – introducing more work because of the sample, and then risking unexpected changes in execution plans. Setting the parameter for a session is often no better. And this brings me back to the SQL Patch approach – if you don’t want to create a baseline for the query then perhaps a patch with the hint /*+ opt_param(‘optimizer_dynamic_sampling’ 4) */ will do the trick. Don’t forget all the doubling of single quotes that you’ll need, though (this is the code fragment I used):

begin
	sys.dbms_sqldiag_internal.i_create_patch(
		sql_text	=>
'
select
	count(*)
from	t1
where	n1 = n2
',
		hint_text	=> 'opt_param(''optimizer_dynamic_sampling'' 4)'

	);
end;
/

For more analysis and commentary on the SQL Patch mechanism, you might like to read Dominic Brooks’ mini-series:

May 9, 2013

Clustering_factor

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 8:14 am UTC May 9,2013

Cost Based Oracle – Fundamentals (November 2005)

But the most interesting function for our purposes is sys_op_countchg(). Judging from its name, this function is probably counting changes, and the first input parameter is the block ID portion (object_id, relative file number, and block number) of the table’s rowid, so the function is clearly matching our notional description of how the clustering_factor is calculated. But what is that 1 we see as the second parameter?

When I first understood how the clustering_factor was defined, I soon realized that its biggest flaw was that Oracle wasn’t remembering recent history as it walked the index; it only remembered the previous table block so that it could check whether the latest row was in the same table block as last time or in a new table block. So when I saw this function, my first guess (or hope) was that the second parameter was a method of telling Oracle to remember a list of previous block visits as it walked the index.

And finally, Oracle Corp. had implemented an official interface to the second parameter of sys_op_countchg() – provided you install the right patch – through a new table (or schema, or database) preference type available to the dbms_stats.set_table_prefs() procedure.

I’ve been meaning to write this post for two or three months, ever since Sean Molloy sent me an email about short blog note from Martin Decker describing Bug 13262857  Enh: provide some control over DBMS_STATS index clustering factor computation. Unfortunately I’ve not yet had time to investigate the patch, but I don’t think I need to any more because Richard Foote has written it up in his latest blog post.

Read Richard’s post – it’s important.

Update 10th May

Richard’s post has, unsurprisingly, produced a buzz of excitement in his reader – and started up the discussion of how best to use this capability; so here’s another quote from the book (p.111 – available in the download of chapter 5):

So using Oracle’s own function for calculating the clustering_factor, but substituting the freelists value for the table, may be a valid method for correcting some errors in the clustering_factor for indexes on strongly sequenced data. (The same strategy applies if you use multiple freelist groups—but multiply freelists by freelist groups to set the second parameter.)

Can a similar strategy be used to find a modified clustering_factor in other circumstances? I think the answer is a cautious “yes” for tables that are in ASSM tablespaces. Remember that Oracle currently allocates and formats 16 new blocks at a time when using automatic segment space management (even when the extent sizes are very large, apparently). This means that new data will be roughly scattered across groups of 16 blocks, rather than being tightly packed.

Calling Oracle’s sys_op_countchg() function with a parameter of 16 could be enough to produce a reasonable clustering_factor where Oracle currently produces a meaningless one. The value 16 should, however, be used as an upper bound. If your real degree of concurrency is typically less than 16, then your actual degree of concurrency would probably be more appropriate.

Whatever you do when experimenting with this function—don’t simply apply it across the board to all indexes, or even all indexes on a particular table. There will probably be just a handful of critical indexes where it is a good way of telling Oracle a little more of the truth about your system—in other cases you will simply be confusing the issue.

Note particularly the comments about how the best value depends on the data in the indexed columns, the table configuration, and the degree of concurrency - you don’t necessarily want to use the same value for every index on a given table. That’s a shame, since Oracle has defined the interface as a TABLE preference, so if you set it then you get the same for every index. Despite this, if you’re prepared to put in a little control work, it does mean that you can use an official Oracle mechanism to play the game I was suggesting in the book – for each “special” index, set the preference, collect the stats, then clear the preference.

March 31, 2013

Index Selectivity

Filed under: CBO,Oracle,Troubleshooting — Jonathan Lewis @ 6:42 pm UTC Mar 31,2013

Here’s a summary of a recent posting on OTN:

I have two indexes (REFNO, REFTYPESEQNO) and (REFNO,TMSTAMP,REFTYPESEQNO). When I run the following query the optimizer uses the second index rather than the first index – which is an exact match for the predicates, unless I hint it otherwise:

select *
from   RefTable
where  RefTypeSeqNo = :1
and    RefNo = :2;

Default plan:
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     3 |   126 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| REFTABLE      |     3 |   126 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | REFTABLE_CX03 |     3 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("REFNO"=TO_NUMBER(:2) AND "REFTYPESEQNO"=TO_NUMBER(:1))
       filter("REFTYPESEQNO"=TO_NUMBER(:1))

Hinted plan:
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     3 |   126 |    15   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| REFTABLE      |     3 |   126 |    15   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | REFTABLE_CX02 |    14 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("REFNO"=TO_NUMBER(:2) AND "REFTYPESEQNO"=TO_NUMBER(:1))

This is actually an example of a feature of the optimizer that I described a few years ago. The original note described a change as you moved from 10.1 to 10.2 and on to 11.1 – but once you’ve seen the basic issue there are a number of variations on how it might appear. In this case the OP seems to be using 10gR2, where the distinct_keys value from an index is used to calculate the cost and row estimate for the index and for the table access cost when that specific indexed access path is being considered.

So with the hint to use the accurate index we see an index cardinality estimate of 14 rows with a table cardinality of 3 despite the fact that the plan shows no extra predicates applied at the table; the cost of accessing the table is also clearly related to the cardinality estimate on the index line.

In the default plan when the wrong index is used, the optimizer doesn’t pay any attention to the distinct_keys from the other index, and simply uses the standard “product of column selectivities”.

11g introduces two changes – when calculating the table cardinality the distinct_keys value for the index is carried forward (so the plan with the high index cardinality but low table cardinality would report the same cardinality for both operations), and the distinct_keys from the first index would be used when doing the calculations for the second index – which would increase the cost of using the wrong index.

There’s really very little you can do to find a strategic fix for this type of problem in 10g – obviously you could add hints whenever Oracle used the wrong index, but that’s not reallya desirable approach, and it is possible to adjust column statistics in such a way that the calculations the optimizer uses give better approximations, but that’s not always very easy to do well. Ultimately you just have to be very careful about your choice of indexes – and when you think that two indexes show a significant overlap in columns consider the possibility that one carefully defined index may be able to do the job of both.

 

 

January 3, 2013

Skip Scan 2

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 6:42 pm UTC Jan 3,2013

Here’s a question that is NOT a trick question, it’s demonstrating an example of optimizer behaviour that might come as a surprise.
I have an index (addr_id0050, effective_date), the first column is numeric, the second is a date. Here’s a query with an execution plan that uses that index:

define m_date='30-Jan-2013'

select
	small_vc
from	t1
where
	addr_id0050 between 24 and 26
and	effective_date = to_date('&m_date', 'dd-mon-yyyy')
;

------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |    18 |   396 |    23 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |    18 |   396 |    23 |
|*  2 |   INDEX SKIP SCAN           | T1_I0050 |    18 |       |     5 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30'
              00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID050"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

So here’s the question – given that my where clause includes a predicate on the first column of the index that would allow an index range scan to take place, wouldn’t you expect Oracle to do a range scan, and how does a skip scan work in this case ?

To push the point a little further I have another column, also numeric, in the same table which appears in a similar index (addr_id2500, effective_date). Here’s the equivalent query with its execution plan.

select
	small_vc
from	t1
where
	addr_id2500 between 24 and 26
and	effective_date = to_date('&m_date', 'dd-mon-yyyy')
;

------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    23 |     5 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |    23 |     5 |
|*  2 |   INDEX RANGE SCAN          | T1_I2500 |     1 |       |     4 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID2500">=24 AND "EFFECTIVE_DATE"=TO_DATE('
              2013-01-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

In this case, with the same starting predicate (but referencing the alternative column), the optimizer produces the index range that we might expect.

The index skip scan isn’t just for cases where the first column of an index is missing from the list of predicates. The basic principle is that the optimizer has the option to do a small number of tiny range scans on an index when the alternative is to do a tablescan or a very large index range scan.

As you might guess from my column names, addr_id0050 holds 50 distinct values, so the range 24 to 26 actually accounts for 6% of the total volume of the index. On the other hand, there are over 800 distinct values for effective_date. There is even an index on (effective_date) but that’s going to pick up lots of rows that are widely scattered throughout the table then throw away 94% of them so Oracle has decided it’s too expensive to use.

So the optimizer has worked out that it can probe for (24, 30th Jan), (25, 30th Jan), and (26, 30th Jan) as the most efficient access path. Of course, it doesn’t know that it’s probing for exactly those values but statistically it assumes that there are only a few possible values that will show up if it probes the index on the first column – in effect using an “inlist iterator” on the first column without knowing in advance what it’s going to find in the list.

When the optimizer sees the query using addr_id2500 (which holds 2,500 distinct values) the arithmetic faviours the option we are familiar with. A simple range scan based on values between 24 and 26 is going to range through roughly l/800th of the index – which is a tiny number of leaf blocks (just one or two, in my case) so the optimizer decides to do that and check every index entry on the way to see if the effective_date holds a suitable value.

If you want to repeat the experiment, I was using 11.2.0.3 with 1MB uniform extents and freelist management. The SQL to create the table and indexes is as follows:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,2500)				addr_id2500,
	mod(rownum,50)					addr_id0050,
	trunc(sysdate) + trunc(mod(rownum,2501)/3)	effective_date,
	lpad(rownum,10,'0')				small_vc,
	rpad('x',100)					padding
from
	generator	v1,
	generator	v2
where
	rownum <= 250000
;

create index t1_i1 on t1(effective_date);
create index t1_i2500 on t1(addr_id2500, effective_date);
create index t1_i0050 on t1(addr_id0050, effective_date);

-- collect stats, no histograms, 11g auto sample size

I had disabled CPU costing to get repeatable results – depending on parameter settings whether you have system stats enabled or not then you may need to tweak the code to change the relative numbers of distinct values in the numeric columns before you see the switch between range scan and skip scan.

October 23, 2012

Skip Scan

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 5:55 pm UTC Oct 23,2012

A recent question on OTN asked how you could model a case where Oracle had the choice between a “perfect” index for a range scan and an index that could be used for an index skip scan and choose the latter path even though it was clearly (to the human eye) the less sensible choice. There have been a number of wierd and wonderful anomalies with the index skip scan and bad choice over the years, and this particular case is just one of many oddities I have seen in the past – so I didn’t think it would be hard to model one (in fact, I thought I already had at least two examples somewhere in my library – but I couldn’t find them).

Take a data set with two columns, call them id1 and id2, and create indexes on (id1), and (id2, id1). Generate the id1 column as a wide range of cyclic values, generate the id2 set with a small number of repetitive values so that a large number of physically adjacent rows hold the same value. The clustering_factor on the (id1) index will be very large, the clustering_factor on the (id2, id1) index will be relatively small because it will be controlled largely by the repetitive id2 value. Here’s the data set:
(more…)

August 19, 2012

Compression Units – 5

Filed under: CBO,Exadata,Indexing,Oracle — Jonathan Lewis @ 6:02 pm UTC Aug 19,2012

The Enkitec Extreme Exadata Expo (E4) event is over, but I still have plenty to say about the technology. The event was a great success, with plenty of interesting speakers and presentations. As I said in a previous note, I was particularly keen to hear  Frits Hoogland’s comments  on Exadata and OLTP, Richard Foote on Indexes, and Maria Colgan’s comments on how Oracle is making changes to the optimizer to understand Exadata a little better.

All three presentations were interesting – but Maria’s was possiby the most important (and entertaining). In particular she told us about two patches for 11.2.0.3, one current and one that is yet to be released (unfortunately I forgot to take  note of the patch numbers – ed: but they’ve been supplied by readers’ comment below).
(more…)

July 23, 2012

Compression Units – 2

Filed under: CBO,Exadata,Indexing,Oracle — Jonathan Lewis @ 4:41 pm UTC Jul 23,2012

When I arrived in Edinburgh for the UKOUG Scotland conference a little while ago Thomas Presslie, one of the organisers and chairman of the committee, asked me if I’d sign up on the “unconference” timetable to give a ten-minute talk on something. So I decided to use Hybrid Columnar Compression to make a general point about choosing and testing features. For those of you who missed this excellent conference, here’s a brief note of what I said.
(more…)

July 16, 2012

ANSI Outer 2

Filed under: Execution plans,Oracle,CBO,ANSI Standard — Jonathan Lewis @ 4:55 pm UTC Jul 16,2012

A comment on a recent post of mine pointed me to a question on the OTN SQL and PL/SQL Forum where someone had presented a well-written test case of an odd pattern of behaviour in ANSI SQL. I made a couple of brief comments on the thread, but thought it worth highlighting here as well. The scripts to create the required tables (plus a few extras) are all available on OTN. If you create only the four tables needed and all their indexes you will need about 1.3GB of space.

The core of the problem is this: there is a three table join which does a hash join involving an index fast full scan on a particular index; when you add a fourth table to the join this fast full scan turns into a full tablescan for no obvious reason. Here are the queries, with the plans that I got when running 10.2.0.3. (My final plan is slightly different from the plan shown on OTN – I have a right outer hash join to the last table where the OP had a nested loop outer – but the difference is not significant). The queries, with their execution plans, are below- the three table join first:
(more…)

July 10, 2012

Random Plans

Filed under: CBO,humour,Oracle — Jonathan Lewis @ 5:32 pm UTC Jul 10,2012

Have you ever felt that the optimizer was persecuting you by picking plans at random ? Perhaps you’re not paranoid, perhaps it’s part of Oracle Corp’s. master plan to take over the world. If you look closely at the list of hidden parameters you’ll find that some details of this cunning plan have leaked. In 10.1.0.2 Oracle created a new parameter _optimizer_random_plan with the description “optimizer seed value for random plans”. Who knows what terrible effects we may see when the default value of this parameter changes.

June 8, 2012

Unique Fail

Filed under: Bugs,CBO,Execution plans,Oracle,Statistics — Jonathan Lewis @ 5:54 pm UTC Jun 8,2012

As in – how come a unique (or primary key) index is predicted to return more than one row using a unique scan. Here’s and example (running on 10.2.0.3 – but the same type of thing happens on newer versions):
(more…)

Next Page »

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 1,437 other followers