Oracle Scratchpad

June 17, 2016

Cardinality trick

Filed under: CBO,Oracle — Jonathan Lewis @ 1:02 pm GMT Jun 17,2016

In the absence of a virtual column or function-based index, the optimizer uses a basic selectivity guess of 1% for a predicate of the form: “function(column) = constant”; but there is (at least) one special case where it gets clever; simple type conversion:


create table t1 nologging
as
select  cast(
                case
                        when mod(rownum,1000) = 0 then 0
                        when mod(rownum,100)  = 0 then 1
                                                  else 9
                end as varchar2(1)
        ) v1
from
        all_objects
where   rownum <= 50000
;

execute dbms_stats.gather_table_stats(user,'t1')

set autotrace on explain

select count(*) from t1 where v1 = 9;
select count(*) from t1 where sign(v1) = 1;

set autotrace off

If you think about the table creation script you’ll agree that there are 49,500 rows where v1 = ‘9’ so the first query could (in theory) produce an estimated cardinality of 49,500. However I’ve got a datatype error in the predicate and I haven’t created a histogram – and that’s not very helpful in two different ways. In general Oracle will use a guessed selectivity of 1% after applying a function to a column with equality, which would make it report an estimated cardinality of 500 for my sample query, but in this case Oracle uses the number of distinct values for the column (i.e. 3) to infer a value for the number of distinct values for the funciton and uses that in the first query:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     2 |    25   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     2 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 16667 | 33334 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(TO_NUMBER("V1")=9)

On the other hand, while the optimizer “knows” that the number of distinct values for the varchar2 will match the number of distinct numerical equivalents (not that that’s actually true), it has no idea how many of the varchar2 values will equate to negative, zero, or positive values, so the 1% selectivity re-appears for the second query:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     2 |    25   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     2 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   500 |  1000 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SIGN(TO_NUMBER("V1"))=1)

It shouldn’t surprise you to see that you would also get 500 as the estimated cardinality if the predicate were to read “sign(v1) = 2” — a value that the sign() function can’t take. The optimizer is using a generic rule, it doesn’t know the specifics of the function you’re using.

Footnote:

If you’re wondering when the number of distinct character values doesn’t match the number of distinct numeric values (and assuming all the character values are valid for conversion to numeric) just remember that the same number can be represented in different ways, for example you might change the original cast() that I used the generate the data to:

        cast(
                case
                        when mod(rownum,1000) = 0 then '0'
                        when mod(rownum, 100) = 0 then '1'
                        when mod(rownum,   2) = 0 then '9'
                                                  else '09'
                end as varchar2(2)
        ) v1

Now we have 4 distinct character values (so the optimizer’s estimate would drop to 15,000) but only 3 distinct numeric equivalents.

This, by the way, is why the optimizer transforms a predicate like “character_column = {numeric value}” into “to_number(character_column) = {numeric value}”, rather than converting it to “character_column = to_char({numeric value})”. A character string can only represent one numeric value while a numeric value can be displayed as an infinite number of different character strings (assuming the availability of the appropriate number of typing monkeys).

 

1 Comment »

  1. Nice :-)
    I have been looking for such special cases ever since I discovered (by accident, of course) that the optimizer knows that the cardinality of “column+0=constant” is the same as the cardinality of “column=constant” (http://twitter.com/DBoriented/status/645891276726890496).
    But the optimizer sticks with the 1% guess for similar cases, such as “column-0=constant”, “column*1=constant”, or “column+:bind_var=constant” (where :bind_var=0 in the first hard parse with bind peeking enabled).
    Jonathan, do you know of other such special cases?
    Thanks,
    Oren.

    Comment by Oren Nakdimon (@DBoriented) — June 17, 2016 @ 3:48 pm GMT Jun 17,2016 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.