Linear Decay

July 24, 2013

Linear Decay

Filed under: CBO,Oracle,Statistics — Jonathan Lewis @ 6:11 pm BST Jul 24,2013

I’ve mentioned “linear decay” in several posts when explaining a problem that someone has seen with an execution path – but I’ve recently realised that I don’t have a post describing what it is and how it works – although it’s in Cost Based Oracle – Fundamentals, of course, if you want some detail – so here’s a brief introduction (based on simple stats with no histograms).

When you have a predicate that involves values outside the known high or low values, Oracle bases its estimate for equality predicates on “column = unknown value” and then scales this down by comparing the known range of values for that column with the distance outside the range that your supplied value falls. So, for example, if your low/high values for a column are 1 and 100 respectively with 100 distinct value and Oracle thinks that there are 10 rows per distinct value the cardinality estimate for the predicate “column = 180” will be calculated (approximately) by the following algorithm:

Known range 99 : (high – low) — but I’ll call it 100 to keep the arithmetic simple
Cardinality estimate for values in range 10 (as stated above)
Distance outside range 80 : (180 – 100)
Percentage distance outside range 80%, so scale cardinality down by 80%
Cardinality estimate = 2

There are many other factors to handling values outside the range – one of the most awkward is that the estimate of cardinality for an out-of-range inequality predicate (e.g. “column > 101”) is defined to be the same as “column = unknown value in range” and that leads to some counter-intuitive results.

Just for reference and play, here’s a little script to create some data (100 distinct values, dates and integers) 16 rows per distinct value, with some queries demonstrating the point. First we generate the data:

rem
rem     Script:         linear_decay.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2013
rem

create table t1 as
select
        trunc(sysdate) + rownum d1,
        rownum                  n1
from    all_objects
where   rownum <= 100 --> comment to avoid WordPress format issue
;

insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;

commit;

execute dbms_stats.gather_table_stats(user,'t1');

column d_80 new_value m_80
column d_40 new_value m_40
column d40  new_value m40
column d80  new_value m80

select
        min(d1), max(d1), max(d1) - min(d1) + 1 d_range,
        min(n1), max(n1), max(n1) - min(n1) + 1 n_range,
        to_char(max(d1) - 80,'dd-mon-yyyy')     d_80,
        to_char(max(d1) - 40,'dd-mon-yyyy')     d_40,
        to_char(max(d1) + 40,'dd-mon-yyyy')     d40,
        to_char(max(d1) + 80,'dd-mon-yyyy')     d80
from
        t1
;

Then we run some sample queries – note that the numeric ranges and the date ranges match in size and distance above the known high value.

prompt	===========
prompt	d1 in range
prompt	===========

select * from t1 where d1 = to_date('&m_80');
select * from t1 where d1 between to_date('&m_80') and to_date('&m_40');

prompt	===========
prompt	n1 in range
prompt	===========

select * from t1 where n1 = 40;
select * from t1 where n1 between 40 and 80;

prompt	================
prompt	d1 out of range
prompt	================

select * from t1 where d1 = to_date('&m40');
select * from t1 where d1 = to_date('&m80');
select * from t1 where d1 between to_date('&m40') and to_date('&m80');

prompt	================
prompt	n1 out of range
prompt	================

select * from t1 where n1 = 140;
select * from t1 where n1 = 180;
select * from t1 where n1 between 140 and 180;

And now the execution plans – of which we’re only really interested in the cardinality estimates:


===========
d1 in range
===========
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    16 |   176 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    16 |   176 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1"=TO_DATE(' 2013-01-24 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   678 |  7458 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   678 |  7458 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1"<=TO_DATE(' 2013-03-05 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "D1">=TO_DATE(' 2013-01-24 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

===========
n1 in range
===========
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    16 |   176 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    16 |   176 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=40)

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   678 |  7458 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   678 |  7458 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1">=40 AND "N1"<=80)

Note that when “in-range”: equality gives a cardinality of 16 (as expected); while a range of 40 (from a total range of 99), gives a cardinality of 678 : (1600 * 40 / 99)


================
d1 out of range
================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |   110 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    10 |   110 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1"=TO_DATE(' 2013-05-24 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     3 |    33 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     3 |    33 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1"=TO_DATE(' 2013-07-03 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    16 |   176 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    16 |   176 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1">=TO_DATE(' 2013-05-24 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "D1"<=TO_DATE(' 2013-07-03 00:00:00',   --> edit to avoid WordPress format issue 
              'syyyy-mm-dd hh24:mi:ss'))

================
n1 out of range
================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |   110 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    10 |   110 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=140)

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     3 |    33 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     3 |    33 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=180)

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    16 |   176 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    16 |   176 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1">=140 AND "N1"<=180)   --> edit to avoid WordPress format issue

In this second set of results we see that for equality: when we are 40 out of range, the cardinality is the base cardinality * (total range – distance outside ) / (total range). For the value 140 this gives: 16 * (99 – 40)/ 99 = 9.53, for the value 180 this gives: 16 * (99 – 80) / 99 = 3.07.

For the out-of-range range-based predicate, the cardinality is simply the base cardinality of an in-range “column = unknown constant”, viz: 16.

You might then want to work out (or simply test) the cardinalities for: n1 = 110, n1 in (110, 140, 180), and n1 between 110 and 180. It should give you an interesting insight into one of the problems of statistics going stale on columns that are sequence or time-based.

Comments (5)

5 Comments »

[…] #DATABASE #ORACLE #CBO by Jonathan Lewis : Linear Decay […]

Pingback by #DATABASE #ORACLE #TUNING by Jonathan Lewis : Linear Decay | Database Scene — July 26, 2013 @ 12:27 pm BST Jul 26,2013 | Reply
[…] my nvl() predicate was using a value that I knew couldn’t be in the table and was so far outside the known range of values that its impact on the optimizer;s cardinality calculation was negligible. For a value […]

Pingback by NVL() | Oracle Scratchpad — August 13, 2013 @ 7:14 am BST Aug 13,2013 | Reply
[…] IN-list is NOT consistent with the result for the original mod_200 >= 300. The optimizer uses a “linear decay” strategy for handling predicates that go out of range, but not in a consistent way. It seems that, […]

Pingback by Quantum Data | Oracle Scratchpad — November 20, 2014 @ 11:30 am GMT Nov 20,2014 | Reply
[…] that is above the known high value or below the known low value for a column, the optimizer uses a “linear decay” method for cardinality calculations that is based on how far out of the range the value is (as a […]

Pingback by Upgrades | Oracle Scratchpad — December 10, 2015 @ 8:42 am GMT Dec 10,2015 | Reply
[…] this case, applying the normal “linear decay” strategy to the extended stats seems much more sensible – if one (or both) of the predicates […]

Pingback by Extended Stats | Oracle Scratchpad — August 24, 2020 @ 3:05 pm BST Aug 24,2020 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome. Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Oracle Scratchpad

July 24, 2013

Linear Decay

5 Comments »

Comments and related questions are welcome. Cancel reply

Click on the motto below

Everything Changes Eventually

Search this blog

Categories

Special Links

Recent Posts

Recent Comments

Popular articles

Popular References

Archives

Blogroll

Posts by RSS

Oracle Scratchpad

July 24, 2013

Linear Decay

Share this:

Related

5 Comments »

Comments and related questions are welcome. Cancel reply

Click on the motto below

Everything Changes Eventually

Search this blog

Categories

Special Links

Recent Posts

Recent Comments

Popular articles

Popular References

Archives

Blogroll

Posts by RSS