In the absence of a virtual column or function-based index, the optimizer uses a basic selectivity guess of 1% for a predicate of the form: *“function(column) = constant”*; but there is (at least) one special case where it gets clever; simple type conversion:

create table t1 nologging as select cast( case when mod(rownum,1000) = 0 then 0 when mod(rownum,100) = 0 then 1 else 9 end as varchar2(1) ) v1 from all_objects where rownum <= 50000 ; execute dbms_stats.gather_table_stats(user,'t1') set autotrace on explain select count(*) from t1 where v1 = 9; select count(*) from t1 where sign(v1) = 1; set autotrace off

If you think about the table creation script you’ll agree that there are 49,500 rows where * v1 = ‘9’* so the first query could (in theory) produce an estimated cardinality of 49,500. However I’ve got a datatype error in the predicate and I haven’t created a histogram – and that’s not very helpful in two different ways. In general Oracle will use a guessed selectivity of 1% after applying a function to a column with equality, which would make it report an estimated cardinality of 500 for my sample query, but in this case Oracle uses the number of distinct values for the column (i.e. 3) to infer a value for the number of distinct values for the funciton and uses that in the first query:

--------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2 | 25 (4)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 2 | | | |* 2 | TABLE ACCESS FULL| T1 | 16667 | 33334 | 25 (4)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(TO_NUMBER("V1")=9)

On the other hand, while the optimizer “knows” that the number of distinct values for the varchar2 will match the number of distinct numerical equivalents (not that that’s actually true), it has no idea how many of the varchar2 values will equate to negative, zero, or positive values, so the 1% selectivity re-appears for the second query:

--------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2 | 25 (4)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 2 | | | |* 2 | TABLE ACCESS FULL| T1 | 500 | 1000 | 25 (4)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(SIGN(TO_NUMBER("V1"))=1)

It shouldn’t surprise you to see that you would also get 500 as the estimated cardinality if the predicate were to read *“sign(v1) = 2”* — a value that the * sign()* function can’t take. The optimizer is using a generic rule, it doesn’t know the specifics of the function you’re using.

### Footnote:

If you’re wondering when the number of distinct character values doesn’t match the number of distinct numeric values (and assuming all the character values are valid for conversion to numeric) just remember that the same number can be represented in different ways, for example you might change the original * cast()* that I used the generate the data to:

cast( case when mod(rownum,1000) = 0 then '0' when mod(rownum, 100) = 0 then '1' when mod(rownum, 2) = 0 then '9' else '09' end as varchar2(2) ) v1

Now we have 4 distinct character values (so the optimizer’s estimate would drop to 15,000) but only 3 distinct numeric equivalents.

This, by the way, is why the optimizer transforms a predicate like *“character_column = {numeric value}”* into *“to_number(character_column) = {numeric value}”*, rather than converting it to *“character_column = to_char({numeric value})”*. A character string can only represent one numeric value while a numeric value can be displayed as an infinite number of different character strings (assuming the availability of the appropriate number of typing monkeys).