Oracle Scratchpad

October 22, 2018

Column Groups

Filed under: Bugs,CBO,Column groups,Indexing,Oracle,Statistics — Jonathan Lewis @ 5:36 pm BST Oct 22,2018

Sometimes a good thing becomes a bad thing when you hit some sort of special case – today’s post is an example of this that came up on the Oracle-L listserver a couple of years ago with a question about what the optimizer was doing. I’ll set the scene by creating some data to reproduce the problem:

rem     Script:         distinct_key_prob.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2016
rem     Purpose:
rem     Last tested
rem     (Live SQL, with some edits)

drop table t1 purge;

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from    dual
        connect by
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        cast(mod(rownum-1,10) as number(8,0))   non_null,
        cast(null as number(8,0))               null_col,
        cast(lpad(rownum,10) as varchar2(10))   small_vc,
        cast(rpad('x',100) as varchar2(100))    padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid wordpress format issue
;

create index t1_i1 on t1(null_col, non_null);

begin
--      Uncomment the next three lines to create the column group
--      instead of (or as well as) the index.
--
--      dbms_output.put_line(
--              dbms_stats.create_extended_stats(user,'t1','(non_null, null_col)')
--      );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/
So I have a table with 1,000,000 rows; one of its columns is always null and another has a very small number of distinct values and is never null (though it hasn’t been declared as not null). I’ve created an index that starts with the “always null” column (in a production system we’d really be looking at a column that was “almost always” null and have a few special rows where the column was not null, so an index like this can make sense).

I’ve also got a few lines, commented out, to create extended stats on the column group (non_null, null_col) because any anomaly relating to the handling of the number of distinct keys in a multi-column index may also be relevant to column groups. I can run two variations of this code, one with the index, one without the index but with the column group, and see the same cardinality issue appearing in both cases.

So let’s execute a couple of queries – after setting up a couple of bind variables – and pull their execution plans from memory:

variable b_null    number
variable b_nonnull number

exec :b_null    := 5
exec :b_nonnull := 5

set serveroutput off

prompt  ===================
prompt  Query null_col only
prompt  ===================

select  count(small_vc)
from    t1
where
        null_col = :b_null
;

select * from table(dbms_xplan.display_cursor(null,null,'-plan_hash'));

prompt  =========================
prompt  Query (null_col,non_null)
prompt  =========================

select  count(small_vc)
from    t1
where
        null_col = :b_null
and     non_null = :b_nonnull
;

select * from table(dbms_xplan.display_cursor(null,null,'-plan_hash'));

The optimizer has statistics that tell it that null_col is always null so its estimate of rows where null_col = 5 should be zero (which will be rounded up to 1); and we have an index starting with null_col so we might expect the optimizer to use an index range scan on that index for these queries. Here are the plans that actually appeared:

SQL_ID  danj9r6rq3c7g, child number 0
select count(small_vc) from t1 where  null_col = :b_null

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |       |     1 |    24 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |    24 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
   3 - access("NULL_COL"=:B_NULL)

SQL_ID  d8kbtq594bsp0, child number 0
select count(small_vc) from t1 where  null_col = :b_null and non_null =
:b_nonnull

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |  2189 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    27 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   100K|  2636K|  2189   (4)| 00:00:11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
   2 - filter(("NULL_COL"=:B_NULL AND "NON_NULL"=:B_NONNULL))

Take a careful look at what we’ve got: the second query has to access exactly the same table rows as those identified by the first query and then apply a second predicate which may discard some of those rows – but the optimizer has changed the access path from a low-cost index-driven access to a high-cost tablescan. This is clearly idiotic – there has to be a flaw in the optimizer logic in this situation.

The defect revolves around a slight inconsistency in the handling of column groups – whether they are explicitly created, or simply inferred by reference to user_indexes.distinct_keys. The anomaly is most easily seen by explicitly creating the column group, gathering stats, and reporting from user_tab_cols.

select
        column_name, sample_size, num_distinct, num_nulls, density, histogram, data_default
from
        user_tab_cols
where
        table_name = upper('T1')
order by
        column_id
;

COLUMN_NAME                            Sample     Distinct  NUM_NULLS    DENSITY HISTOGRAM       DATA_DEFAULT
-------------------------------- ------------ ------------ ---------- ---------- --------------- --------------------------------------------
NON_NULL                            1,000,000           10          0         .1 NONE
NULL_COL                                                 0    1000000          0 NONE
SMALL_VC                            1,000,000      995,008          0 .000001005 NONE
PADDING                             1,000,000            1          0          1 NONE
SYS_STULC#01EE$DE1QB7UY1K4$PBI      1,000,000           10          0         .1 NONE            SYS_OP_COMBINED_HASH("NON_NULL","NULL_COL")

As you can see, the optimizer can note that “null_col” is always null so the arithmetic for “null_col = :bind1” is going to produce a very small cardinality estimate; on the other hand when the optimizer sees “null_col = :bind1 and non_null = :bind2” it’s going to transform this into the single predicate “SYS_STULC#01EE$DE1QB7UY1K4$PBI = sys_op_combined_hash(null_col, non_null)”, and the statistics say there are 10 distinct values for this (virtual) column with no nulls – hence the huge cardinality estimate and full tablescan.
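The arithmetic is easy to check by hand. Here’s a minimal sketch (plain Python, mimicking the textbook single-column selectivity formula – the function name is mine, not an Oracle internal) showing how the two estimates fall out of the statistics reported above:

```python
def single_column_estimate(num_rows, num_distinct, num_nulls):
    # "col = :bind" -- only non-null rows can match, and they are assumed
    # to be spread evenly across the distinct values; estimates round up to 1.
    non_null_rows = num_rows - num_nulls
    if num_distinct == 0:
        return 1                      # column is entirely null: round up to 1
    return max(1, round(non_null_rows / num_distinct))

num_rows = 1_000_000

# null_col = :b_null  -- num_distinct = 0, num_nulls = 1,000,000
print(single_column_estimate(num_rows, 0, 1_000_000))    # 1 row

# column-group predicate -- the hash column has 10 distinct values and
# (crucially) zero nulls, so the null information has been lost
print(single_column_estimate(num_rows, 10, 0))           # 100,000 rows
```

The second call reproduces the 100K estimate in the tablescan plan: once the two predicates collapse into the virtual hash column, the million nulls simply aren’t visible to the arithmetic.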

The “slight inconsistency” in handling that I mentioned above is that if you used a predicate like “null_col is null and non_null = :bind2” the optimizer would not use the column group because of the “is null” condition – even though it’s exactly the case where the column group statistics would be appropriate. (In the example I’ve constructed the optimizer’s estimate from ignoring the column group would actually be correct – and identical to the estimate it would get from using the column group – because the column is null for every single row.)


Column groups can give you some very bad estimates, and counter-intuitive behaviour, if any of the columns in the group has a significant percentage of nulls; this happens because the column group makes the optimizer lose sight of the number of nulls in the underlying data set.



  1. […] The optimizer’s inconsistency: Column groups lose information about the frequency of nulls in the underlying data. […]

    Pingback by Column Group Catalog | Oracle Scratchpad — October 22, 2018 @ 5:40 pm BST Oct 22,2018 | Reply

  2. […] on its own. (The same problem appears with column groups – which is what I commented on in my previous post on this […]

    Pingback by Extreme Nulls | Oracle Scratchpad — December 14, 2018 @ 7:01 pm GMT Dec 14,2018 | Reply

  3. Hi Jonathan – is there a way to fix the cardinality estimation for this query from your test case: select count(small_vc) from t1 where null_col = :b_null and non_null = :b_nonnull ?

    Comment by Bhavani P Dhulipalla — August 9, 2019 @ 3:27 am BST Aug 9,2019 | Reply

  4. […] column group sample size matches the table sample size because column group hash values are never null, even if an underlying column is […]

    Pingback by Strange Estimates. | Oracle Scratchpad — October 30, 2019 @ 1:10 pm GMT Oct 30,2019 | Reply

  5. […] colX is null is not an equality predicate, and column group stats will not apply but there can be unexpected side effects even for cases where you don’t use this “is null” […]

    Pingback by Optimizer Tip | Oracle Scratchpad — September 20, 2021 @ 9:04 am BST Sep 20,2021 | Reply


Comments and related questions are welcome.
