Oracle Scratchpad

July 24, 2013

Linear Decay

Filed under: CBO,Oracle,Statistics — Jonathan Lewis @ 6:11 pm BST Jul 24,2013

I’ve mentioned “linear decay” in several posts when explaining a problem that someone has seen with an execution path – but I’ve recently realised that I don’t have a post describing what it is and how it works – although it’s in Cost Based Oracle – Fundamentals, of course, if you want some detail – so here’s a brief introduction (based on simple stats with no histograms).

When you have a predicate that involves values outside the known high or low values, Oracle bases its estimate for equality predicates on “column = unknown value” and then scales this down by comparing the known range of values for that column with the distance outside the range that your supplied value falls. So, for example, if your low/high values for a column are 1 and 100 respectively with 100 distinct value and Oracle thinks that there are 10 rows per distinct value the cardinality estimate for the predicate “column = 180″ will be calculate (approximately) by the following algorithm:

  • Known range 99 : (high – low) — call it 100 to keep the arithmetic simple
  • Cardinality estimate for values in range 10 (as stated above)
  • Distance outside range 80 : (180 – 100)
  • Percentage distance outside range 80%, so scale cardinality down by 80%
  • Cardinality estimate = 2

There are many other factors to handling values outside the range – one of the most awkward is that the estimate of cardinality for an out-of-range inequality predicate (e.g. column > 101) is defined to be the same as “column = unknown value in range” and that leads to some counter-intuitive results.

Just for reference and play, here’s a little script to create some data (100 distinct values, dates and integers) 16 rows per distinct value, with some queries demonstrating the point. First we generate the data:

create table t1 as
select
	trunc(sysdate) + rownum	d1,
	rownum			n1
from    all_objects
where   rownum <= 100
;

insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;

commit;

execute dbms_stats.gather_table_stats(user,'t1');

column d_80 new_value m_80
column d_40 new_value m_40
column d40  new_value m40
column d80  new_value m80

select
	min(d1), max(d1), max(d1) - min(d1) + 1 d_range,
	min(n1), max(n1), max(n1) - min(n1) + 1 n_range,
	to_char(max(d1) - 80,'dd-mon-yyyy')	d_80,
	to_char(max(d1) - 40,'dd-mon-yyyy')	d_40,
	to_char(max(d1) + 40,'dd-mon-yyyy')	d40,
	to_char(max(d1) + 80,'dd-mon-yyyy')	d80
from
	t1
;

Then we run some sample queries – note that the numeric ranges and the date ranges match in size and distance above the known high value.

prompt	===========
prompt	d1 in range
prompt	===========

select * from t1 where d1 = to_date('&m_80');
select * from t1 where d1 between to_date('&m_80') and to_date('&m_40');

prompt	===========
prompt	n1 in range
prompt	===========

select * from t1 where n1 = 40;
select * from t1 where n1 between 40 and 80;

prompt	================
prompt	d1 out of range
prompt	================

select * from t1 where d1 = to_date('&m40');
select * from t1 where d1 = to_date('&m80');
select * from t1 where d1 between to_date('&m40') and to_date('&m80');

prompt	================
prompt	n1 out of range
prompt	================

select * from t1 where n1 = 140;
select * from t1 where n1 = 180;
select * from t1 where n1 between 140 and 180;

And now the execution plans – of which we’re only really interested in the cardinality estimates:


===========
d1 in range
===========
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    16 |   176 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    16 |   176 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1"=TO_DATE(' 2013-01-24 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   678 |  7458 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   678 |  7458 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1"<=TO_DATE(' 2013-03-05 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "D1">=TO_DATE(' 2013-01-24 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

===========
n1 in range
===========
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    16 |   176 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    16 |   176 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=40)

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   678 |  7458 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   678 |  7458 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1">=40 AND "N1"<=80)

Note that when “in-range”: equality gives a cardinality of 16 (as expected); while a range of 40 (from a total range of 99), gives a cardinality of 678 : (1600 * 40 / 99)


================
d1 out of range
================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |   110 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    10 |   110 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1"=TO_DATE(' 2013-05-24 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     3 |    33 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     3 |    33 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1"=TO_DATE(' 2013-07-03 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    16 |   176 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    16 |   176 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1">=TO_DATE(' 2013-05-24 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "D1"<=TO_DATE(' 2013-07-03 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

================
n1 out of range
================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |   110 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    10 |   110 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=140)

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     3 |    33 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     3 |    33 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=180)

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    16 |   176 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    16 |   176 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1">=140 AND "N1"<=180)

In this second set of results we see that for equality: when we are 40 out of range, the cardinality is the base cardinality * (total range – distance outside ) / (total range). For the value 140 this gives: 16 * (99 – 40)/ 99 = 9.53, for the value 180 this gives: 16 * (99 – 80) / 99 = 3.07.

For the out-of-range range-based predicate, the cardinality is simply the base cardinality of an in-range “column = unknown constant”, viz: 16.

You might then want to work out (or simply test) the cardinalities for: n1 = 110, n1 in (110, 140, 180), and n1 between 110 and 180. It should give you an interesting insight into one of the problems of statistics going stale on columns that are sequence or time-based.

2 Comments »

  1. […] #DATABASE #ORACLE #CBO by Jonathan Lewis : Linear Decay […]

    Pingback by #DATABASE #ORACLE #TUNING by Jonathan Lewis : Linear Decay | Database Scene — July 26, 2013 @ 12:27 pm BST Jul 26,2013 | Reply

  2. […] my nvl() predicate was using a value that I knew couldn’t be in the table and was so far outside the known range of values that its impact on the optimizer;s cardinality calculation was negligible. For a value […]

    Pingback by NVL() | Oracle Scratchpad — August 13, 2013 @ 7:14 am BST Aug 13,2013 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 4,014 other followers