Oracle Scratchpad

November 6, 2023

Swap_Join_Inputs

Filed under: Column groups,Hints,Joins,Oracle,Statistics — Jonathan Lewis @ 2:47 pm GMT Nov 6,2023

This is just a short note (I hope) prompted by a conversation on the Oracle-L list server. A query from a 3rd party application was doing a very large full tablescan with hash join when it should have been doing a high precision index driven nested loop join, and the poor choice of plan was due an optimizer defect when handling column groups (fixed in 23c) when one or more of the columns involved is always, or nearly always, null.

As a follow-up the owner of the problem asked what hints should go into an SQL Patch to make the optimizer use the nested loop. There’s a simple guideline that usually works for this type of “wrong join” problem: report the “Outline Data” from the current execution plan; find the relevant join hint(s) (in this case a use_hash() hint and a full() hint), change those join hint(s) (e.g. use_nl(), index()), and write the entire edited outline data into an SQL Patch watching out for a recently fixed defect in the SQL patch code.

There are, however, various refinements that add complexity to this strategy, as you can appreciate from a note I wrote some years ago about how to hint a hash join properly. This note is an example of handling one such refinement.

The query was a little complex, and the optimizer had unnested a subquery that consisted of a union all of 4 branches, and one of those branches had contributed a very large cardinality estimate to the total for the view, so the optimizer had chosen a hash join between the unnested subquery and a large table. Stripped to a bare minimum that part of the plan looked like this:

HASH JOIN
      VIEW                   VW_NSO_1
      TABLE ACCESS FULL      PO_LINE_LOCATIONS_ALL        

A quick scan of the Outline Data found the join hint (converted to lower case, with quotes removed): use_hash(@sel$ac90cd92 vw_nso_1@sel$ac90cd92), and an over-hasty response might be to convert the use_hash to a use_nl and leave it at that – except for three possible warnings:

  1. we wanted to see a nested loop into po_line_locations_all, so the suggested use_nl() hint would be looping into the wrong “table”
  2. the plan shows the view vw_nso_1 as the build table, while the reported hint is about vw_nso_1 being the second table in the join order
  3. there are further references to vw_nso_1 (and to po_line_locations_all) in the outline data

Here’s the complete set of original hints that might have been relevant to this particular part of the plan:

full(@sel$ac90cd92 po_line_locations_all@sel$2)
no_access(@sel$ac90cd92 vw_nso_1@sel$ac90cd92)
leading(@sel$ac90cd92 po_line_locations_all@sel$2 vw_nso_1@sel$ac90cd92)
use_hash(@sel$ac90cd92 vw_nso_1@sel$ac90cd92)
swap_join_inputs(@sel$ac90cd92 vw_nso_1@sel$ac90cd92)

index_rs_asc(@sel$b584fdd1 po_line_locations_all@sel$16 (......))
batch_table_access_by_rowid(@sel$b584fdd1 po_line_locations_all@sel$16)

index_rs_asc(@sel$5ed1c707 po_line_locations_all@sel$13 (......))
batch_table_access_by_rowid(@sel$5ed1c707 po_line_locations_all@sel$13)

index_rs_asc(@sel$2f35337b po_line_locations_all@sel$10 (......))
batch_table_access_by_rowid(@sel$2f35337b po_line_locations_all@sel$10)

index_rs_asc(@sel$1b7d9ae9 po_line_locations_all@sel$7 (......))

This is where knowing about the “Query Block / Object Alias” section the execution plans is important – I’ve split the list into several parts based on the query block (@sel$xxxxxxxxxx) they are aimed at, and it’s only the first 5 we need to worry about.

Conveniently this part of the plan is a self-contained query block (@sel$ac90cd92) and we can see why we have an apparent contradiction between vw_nso_1 being the second table in the join order while being the build table: it’s second because of the leading() hint which dictates the join order, but it becomes the build table, hence appearing to be the first table in the join order, because of the swap_join_inputs() hint.

What we want is a join order where vw_nso_1 really is the first table in the join order, followed by a nested loop join into po_line_locations_all, using an index (not the full tablescan that the current hints dictate). It would probably be a good idea to get rid of the redundant no_swap_join_inputs() hints at the same time because that hint applies only to hash joins. So I think we need to replace the 5 hints above with the following 4 hints:

no_access(@sel$ac90cd92 vw_nso_1@sel$ac90cd92)
leading(@sel$ac90cd92 vw_nso_1@sel$ac90cd92 po_line_locations_all@sel$2)
use_nl(@sel$ac90cd92 po_line_locations_all@sel$2)
index(@sel$ac90cd92 po_line_locations_all@sel$2(line_location_id))

The index hint references column line_location_id because that’s the access predicate used in original hash join and I’m assuming that there is an index that starts with that column. It’s always a bit hit and miss with hinting and it might have been sufficient (as a first attempt) to use the index hint without trying to reference a specific index, and there might be good reasons for adding more columns to the list, or simple naming the index rather than describing it.

It’s quite likely that if this change in the hints is sufficient the resulting Outline Data would look a little different anyway; in particular the index() hint that I’ve suggested might get expanded to index_rs_asc(), and there might be a batch_table_access_by_rowid() added. Basically you do test runs until you get the result you want and then use the resulting Outline Data for the patch (although, occasionally, you still find that the Outline Data doesn’t reproduce the plan that it came from).

Frivolous Footnote

There were 75 hints totalling 3,191 bytes in the original Outline Data. If the text gets too long and messy for you to cope with when you create the patch you can probably remove all the double quotes, all the table names from the fully qualified column names in indexes, all the outline() and outline_leaf() hints, all the opt_param() hints that reflect system level parameter changes and the begin/end_outline_data hints and the ignore_optim_embedded_hints hint. You could also change long index descriptions to index_names and, if you’re like me, change it all to lower case anyway because I hate reading in capitals – and if you do change it all to lower case you have to remove the quotes. When I did all this to the original outline data the result was 1,350 bytes for 30 hints.

October 21, 2020

Column Groups

Filed under: Column groups,extended stats,Oracle,Statistics — Jonathan Lewis @ 12:14 pm BST Oct 21,2020

Here’s an odd little detail about the statistics of column groups. At first glance it’s counter-intuitive but it’s actually an “obvious” (once you’ve thought about it for a bit) consequence of the approximate_ndv() algorithm for gathering stats.

I’ll present it as a question:

I have a table with two columns: flag and v1. Although the column are not declared as non-null neither holds any nulls. If there are 26 distinct values for flag, and 1,000,000 distinct values for v1, what’s the smallest number of distinct values I could see if I create the column group (flag, v1) ?

The question is a little ambiguous – there’s the number of distinct values that the column (group) holds and the number that a fresh gather of statistics reports it as holding. Here are the stats from a test run of a simple script that creates, populates and gathers stats on my table:

select  column_name, num_distinct 
from    user_tab_cols 
where   table_name = 'T1'
/

COLUMN_NAME                      NUM_DISTINCT
-------------------------------- ------------
FLAG                                       26
ID                                    1000000
V1                                     999040
SYS_STUQ#TO6BT1REX3P1BKO0ULVR9         989120

There are actually 1,000,000 distinct values for v1 (it’s a varchar2() representation of the id column), but the approximate_ndv() mechanism can have an error of (I believe) up to roughly 1.3%, so Oracle’s estimate here is a little bit off. (It’s interesting to note, though, that the same mechanism managed to produce exactly the right answer for the real numeric.)

The column group (represented by the internal column called SYS_STUQ#TO6BT1REX3P1BKO0ULVR9) must hold (at least) 1,000,000 distinct values – but the error in this case is a little larger than the error in v1, with the effect that the number of combinations appears to be less than the number of distinct values for v1!

There’s not much difference in this case between actual and estimate but the test demonstrates the potential for a significant difference between the estimate and the arithmetic that Oracle would do if the column group didn’t exist. Nominally the optimizer would assume there were 26 million distinct values (though in this case I had only created 1M rows in the table and the optimizer would use the number of rows as a sanity check of that 26M).

So, although the difference between actual and estimate is small, we have to ask the question – are there any cases where the optimizer will ignore the column group stats because of a sanity check that “proves” the estimate is “wrong” – after all it must be wrong if the num_distinct of the column group is less than the num_distinct of one of the components. Then again maybe there’s a sanity check that allows for small variations and ignores the column group only if the estimate is “wrong enough”.

I mention this only because an odd optimizer estimate has shown up recently on the Oracle-L mailing list, and the only significant difference I can see (at present) is that a bad plan appears for a partition when this column group anomaly shows up in the stats and a good plan appears when the column group anomaly isn’t present.

Footnote:

If you want to recreate the results above, here’s the model I’ve used (tested on 19.3.0.0 and 11.2.0.4):

rem
rem     Script:         column_group_stats_5.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             11.2.0.4
rem 

execute dbms_random.seed(0)

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        chr(65 + mod(rownum,26))        flag,
        rownum                          id,
        lpad(rownum,10,'0')             v1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6   -- > comment to avoid WordPress format issue
order by
        dbms_random.value
/


select  column_name, num_distinct 
from    user_tab_cols 
where   table_name = 'T1'
/

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns(v1, flag) size 1'
        );
end;
/
 
select  column_name, num_distinct 
from    user_tab_cols 
where   table_name = 'T1'
/

Footnote 2:

As an interesting little statistical quirk, if I defined the column group as (flag, v1) rather than (v1, flag) the estimate for the column group num_distinct was 1,000,000.

October 22, 2018

Column Groups

Filed under: Bugs,CBO,Column groups,Indexing,Oracle,Statistics — Jonathan Lewis @ 5:36 pm BST Oct 22,2018

Jump to 2023 update

Sometimes a good thing becomes at bad thing when you hit some sort of special case – today’s post is an example of this that came up on the Oracle-L listserver a couple of years ago with a question about what the optimizer was doing. I’ll set the scene by creating some data to reproduce the problem:

rem
rem     Script:         distinct_key_prob.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2016
rem     Purpose:
rem
rem     Last tested
rem             23.3.0.0  (Part fixed)
rem             19.11.0.0
rem             18.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

drop table t1 purge;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > commment to avoid wordpress format issue
)
select
        cast(mod(rownum-1,10) as number(8,0))   non_null,
        cast(null as number(8,0))               null_col,
        cast(lpad(rownum,10) as varchar2(10))   small_vc,
        cast(rpad('x',100) as varchar2(100))    padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > commment to avoid wordpress format issue
;

create index t1_i1 on t1(null_col, non_null);

begin

/*
        dbms_output.put_line(
                dbms_stats.create_extended_stats(user,'t1','(non_null, null_col)')
        );
*/

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

So I have a table with 1,000,000 rows; one of its columns is always null and another has a very small number of distinct values and is never null (though it hasn’t been declared as not null). I’ve created an index that starts with the “always null” column (in a production system we’d really be looking at a column that was “almost always” null and have a few special rows where the column was not null, so an index like this can make sense).

I’ve also got a few lines, commented out, to create extended stats on the column group (non_null, null_col) because any anomaly relating to the handling of the number of distinct keys in a multi-column index may also be relevant to column groups. I can run two variations of this code, one with the index, one without the index but with the column group, and see the same cardinality issue appearing in both cases.

So let’s execute a couple of queries – after setting up a couple of bind variables – and pull their execution plans from memory:


variable b_null    number
variable b_nonnull number

exec :b_null    := 5
exec :b_nonnull := 5

set serveroutput off

prompt  ===================
prompt  Query null_col only
prompt  ===================

select  count(small_vc)
from    t1
where
        null_col = :b_null
;

select * from table(dbms_xplan.display_cursor(null,null,'-plan_hash'));

prompt  =========================
prompt  Query (null_col,non_null)
prompt  =========================

select  count(small_vc)
from    t1
where
        null_col = :b_null
and     non_null = :b_nonnull
;

select * from table(dbms_xplan.display_cursor(null,null,'-plan_hash'));

The optimizer has statistics that tell it that null_col is always null so its estimate of rows where null_col = 5 should be zero (which will be rounded up to 1); and we have an index starting with null_col so we might expect the optimizer to use an index range scan on that index for these queries. Here are the plans that actually appeared:


SQL_ID  danj9r6rq3c7g, child number 0
-------------------------------------
select count(small_vc) from t1 where  null_col = :b_null

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |       |     1 |    24 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |    24 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("NULL_COL"=:B_NULL)



SQL_ID  d8kbtq594bsp0, child number 0
-------------------------------------
select count(small_vc) from t1 where  null_col = :b_null and non_null =
:b_nonnull

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |  2189 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    27 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   100K|  2636K|  2189   (4)| 00:00:11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("NULL_COL"=:B_NULL AND "NON_NULL"=:B_NONNULL))

Take a careful look at what we’ve got: the second query has to access exactly the same table rows as those identified by the first query and then apply a second predicate which may discard some of those rows – but the optimizer has changed the access path from a low-cost index-driven access to a high cost tablescan. This is clearly idiotic – there has to be a flaw in the optimizer logic in this situation.

The defect revolves around a slight inconsistency in the handling of columns groups – whether they are explicitly created, or simply inferred by reference to user_indexes.distinct_keys. The anomaly is most easily seen by explicitly creating the column group, gathering stats, and reporting from user_tab_cols.


select
        column_name, sample_size, num_distinct, num_nulls, density, histogram, data_default
from
        user_tab_cols
where
        table_name = upper('T1')
order by
        column_id

;

OLUMN_NAME                            Sample     Distinct  NUM_NULLS    DENSITY HISTOGRAM       DATA_DEFAULT
-------------------------------- ------------ ------------ ---------- ---------- --------------- --------------------------------------------
NON_NULL                            1,000,000           10          0         .1 NONE
NULL_COL                                                 0    1000000          0 NONE
SMALL_VC                            1,000,000      995,008          0 .000001005 NONE
PADDING                             1,000,000            1          0          1 NONE
SYS_STULC#01EE$DE1QB7UY1K4$PBI      1,000,000           10          0         .1 NONE            SYS_OP_COMBINED_HASH("NON_NULL","NULL_COL")

As you can see, the optimizer can note that “null_col” is always null so the arithmetic for “null_col = :bind1” is going to produce a very small cardinality estimate; on the other hand when the optimizer sees “null_col = :bind1 and non_null = :bind2” it’s going to transform this into the single predicate “SYS_STULC#01EE$DE1QB7UY1K4$PBI = sys_op_combined_hash(null_col, non_null)”, and the statistics say there are 10 distinct values for this (virtual) column with no nulls – hence the huge cardinality estimate and full tablescan.

The “slight inconsistency” in handling that I mentioned above is that if you used a predicate like “null_col is null and non_null = :bind2″ the optimizer would not use column group because of the “is null” condition – even though it’s exactly the case where the column group statistics would be appropriate. (In the example I’ve constructed the optimizer’s estimate from ignoring the column group would actually be correct – and identical to the estimate it would get from using the column group – because the column is null for every single row.)

tl;dr

Column groups can give you some very bad estimates, and counter-intuitive behaviour, if any of the columns in the group has a significant percentage of nulls; this happens because the column group makes the optimizer lose sight of the number of nulls in the underlying data set.

Update Nov 2023:

Re-running the test on newer versions of Oracle I found that 19.11 [update: 19.20 – see comment #8] still gets the cardinality estimates wrong, but 23.3 produces the correct estimate. This change shows up in the 23c v$system_fix_control as (almost certainly) fix 27982637:

QKSFM_CARDINALITY_27982637         fix column group not null sel when one of the columns is all nul

The number doesn’t show up as a bug number of MOS, and there’s no indications of patches or backports, but you might as well try asking for a backport if you’re running into the problem.

Warning: note that the importance of the penultimate word “all” in that description. If you have just one row with a non-null value then the cardinality error still shows up (assuming the stats reflect the reality).

September 27, 2018

Column Group Catalogue

Filed under: CBO,Column groups,Indexing,Oracle,Statistics — Jonathan Lewis @ 5:16 pm BST Sep 27,2018

I seem to have written a number of aricles about column groups – the rather special, and most useful, variant on extended stats. To make it as easy as possible to find the right article I’ve decided to produce a little catalogue (catalog) of all the relevant articles, with a little note about the topic each article covers. Some of the articles will link to others in the list, and there are a few items in the list from other blogs. There are also a few items which are the titles of drafts which have been hanging around for the last few years.

  • A gap in the algorithm (Dec 2021): the optimizer doesn’t use the column group where it should for OR’ed predicates
  • Optimizer Tip (Sept 2021): reprint of a note for IOUG 2015 on column groups and table_cached_blocks
  • NDV oddity for groups (Oct 2020): A column group will have at least as many values as each of its components – but gathered stats don’t always report that.
  • A threat from nulls in indexes (Dec 2018): Sometimes you have to stop the optimizer using index.distinct_keys for cardinality estimates
  • An optimizer inconsistency (Oct 2018): Column groups lose information about the frequency of nulls in the underlying data.
  • Column group histograms (Aug 2018): Formal coding method
  • Column group histograms (Jul 2018): Hacking to solve a problem
  • Column groups (Mar 2018): A “cosmetic” change to a query makes increased use of column groups possible.
  • Index out of range (Mar 2017): Effects of “out of range” predicates on indexes and column groups
  • Extended Stats (Dec 2016): A dirty workaround to the limited number of column groups allowed.
  • Adaptive Mayhem (Aug 2016): The anguish of 12.1 and adaptive statistics.
  • Index Sanity (June 2016): A very old demonstration – don’t drop an index just because you’re not using it.
  • Column groups (Apr 2016): A bug
  • Automatic column groups (Dec 2015): A threat due to upgrading to 12.1
  • Upgrades (Dec 2015): A summary of a round-table CBO session that covered column groups and virtual columns
  • Column groups (Dec 2015): A problem with column groups and char() types.
  • Column groups (Nov 2015): A predicate “column is null” will disable the use of any related column groups.
  • Extendend Stats (May 2014): Under the covers with column groups.
  • Extended Stats (Sep 2013): How a typing error could introduce extra column groups
  • Extended Stats (April 2012): The stats on a column group will not be used if a predicate on an underlying column queries values outside the low/high range
  • Index Upgrades (Mar 2012): the optimizer can use distinct_keys from indexes in the same way it uses column group stats
  •  
  • Correlation Strength (draft):
  • Column groups – a bug (draft):
  • Case study: Dropping indexes (draft):
  • col_usage$ (draft)
  •  

Other Authors:

March 8, 2018

Column Groups

Filed under: CBO,Column groups,Oracle,Statistics — Jonathan Lewis @ 6:54 am GMT Mar 8,2018

There’s a question about column groups on the ODC database forum that throws up an important collateral issue. The OP is looking at a query like the following and asking which column groups might help the optimizer get the best plan:

select
        a.*, b.*, c.*
from
        a, b, c
where
        a.id   = b.id
and     a.id1  = b.id1
and     a.id   = c.id
and     b.id2  = c.id2
and     a.id4  = 66
and     b.id7  = 44
and     c.id88 = 88
;

Although this query has fairly obviously being engineered to conceal any meaningful table names and column names I’m going to start by being a bit boring about presentation and meaning and do a cosmetic edit of the query because if I had a from clause reading “a, b, c” it would be because I thought the optimizer should identify that as the best join order, in which case I would also have written the predicate section to display the order in which the predicates would be used:

select
        a.*, b.*, c.*
from
        a, b, c
where
        a.id4  = 66
--
and     b.id   = a.id
and     b.id1  = a.id1
and     b.id7  = 44
--
and     c.id   = a.id
and     c.id2  = b.id2
and     c.id88 = 88
;

Having (to my mind) cosmetically improved the query I’ll now ask the question: “Would it make sense to create column groups on a(id, id1), b(id, id1) and c(id, id2) ?”

I’ve written various articles on cases where column groups have effects (or not): “out of range” predicates, “is null” predicates, “histograms issues”, “statistics at both ends of the join”, and “multi-column indexes vs. column groups” are just some of the key areas – are there any clues in those bullet points?

Assuming there are no reasons to stop a particular column group from working we can look at the join from table A to table B: it’s a two-column join so if there’s some strong correlation between the id and id1 columns of these two tables then creating the two column groups (one at each end of the join) can make a difference to the optimizer’s calculations with the most likely effect that the cardinality estimate on the join will go up and, as a side effect, the join order and join method may change.

If we then consider the join to table C we note that it involves two columns from table C being joined to one column from table A and one from table B so, while we could create a column group on the two columns at the table C end of the join, a column group is simply not possible at the A/B end of the join. This means that one end of the join may have a selectivity that is hugely increased (far fewer combinations) because the column group has quantified the correlation but the selectivity at the other end is simply based on the two separate selectivities from a.id and b.id2, and that’s likely to be smaller than the selectivity of (c.id, c.id2), and the optimizer will choose the smaller join selectivity hence producing a lower cardinality estimate.

This is where the collateral issue appears – and it’s a point which also justified my careful rearrangement of the SQL text: there is an opportunity for transitive closure that the human eye can see but the optimizer is not coded to manipulate. We have two predicates: “b.id = a.id” and “c.id = a.id” but they can only both be true when “c.id = b.id”, so let’s replace “c.id = a.id” with “c.id = b.id” so that the join predicates for table C become:

and     c.id   = b.id
and     c.id2  = b.id2

Both left hand sides reference table C, both right hand sides reference table B – so if we now create a column group on c(id, id2) and a column group on b(id, id2) then we may give Oracle some better information about this join as well. In fact, even if we create NO column groups at all this specific change may be enough to result in a change in the selectivity calculations with a subsequent change in the cardinality estimates and execution plan.

April 20, 2016

Column Groups

Filed under: Column groups,Oracle,Statistics — Jonathan Lewis @ 9:07 am BST Apr 20,2016

Patrick Jolliffe alerted the Oracle-L list to a problem that appears when you combine fixed length character columns (i.e. char() or nchar())  with column group statistics. The underlying cause of the problem is the “blank padding” semantics that Oracle uses by default to compare varchar2 with char, so I’ll start with a little demo of that. First some sample data:


rem     Script:         col_group_char_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2016

execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(chr(trunc(dbms_random.value(1,6))+64) as char(1))  c1,
        cast(chr(trunc(dbms_random.value(1,6))+64) as char(2))  c2,
        cast('X' as varchar2(2))                                v2
from
        generator       v1
where
        rownum <= 5 * 5 * 10
;

insert into t1(c1, c2, v2)
select  'X', 'X', 'X'
from    t1
;

update t1 set v2 = c2;
commit;


The little demos I’m going to report here don’t use all the data in this table – there are several other tests in the script that I won’t be reporting – so I’ll just point out that there are 500 rows in the table, half of them have ‘X’ in all three columns, and half of them have a uniform distribution of the letters ‘A’ to ‘E’ in every column.

  • Column c1 is declared as char(1) – so it will hold the data exactly as it was inserted by the script.
  • Column c2 is declared as char(2) – so even though the script apparently inserts a character string of length 1, this will be padded with a space to two characters before being stored.

Now we can create some stats – in particular a frequency histogram on the c2 column – and check the cardinality estimates for a couple of queries:

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                method_opt       => 'for all columns size 254'
        );
end;
/

set autotrace traceonly explain

prompt  ==================
prompt  c2 without padding
prompt  ==================

select  *
from    t1
where   c2 = 'X'
;

prompt  ================
prompt  c2 with padding
prompt  ================

select  *
from    t1
where   c2 = 'X '
;

set autotrace off

The first query compares c2 with the single character ‘X’, the second compares it with the two-character string ‘X ‘. But since the comparison is with a char(2) column the optimizer pads the first constant with spaces, and both queries end up predicting the same cardinality:


==================
c2 without padding
==================

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   250 |  2000 |    17   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   250 |  2000 |    17   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C2"='X')

================
c2 with padding
================

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   250 |  2000 |    17   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   250 |  2000 |    17   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C2"='X ')


Note that both queries predict the 250 rows where (we know) c2 = ‘X ‘; even though the predicate sections suggest the queries are looking for different data sets. This IS the expected behaviour.

Now let’s make things more complex – we’ll add the predicate “and c1 = ‘X'” to both queries but we’ll create a column group with histogram on (c1, c2) before checking the plans. Again we expect both versions of the new query to predict the same volume of data and (in fact) to produce a perfect prediction because we have so few rows and so few distinct combinations that we should get a perfect frequency histogram:


begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1 for columns (c1, c2) size 254'
        );
end;
/

prompt  ========================
prompt  (c1, c2) without padding
prompt  ========================

select  *
from    t1
where   c1 = 'X' and c2 = 'X'
;

prompt  =====================
prompt  (c1, c2) with padding
prompt  =====================

select  *
from    t1
where   c1 = 'X' and c2 = 'X '
;

And here are the execution plans:

========================
(c1, c2) without padding
========================

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     2 |    16 |    17   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     2 |    16 |    17   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C1"='X' AND "C2"='X')

=====================
(c1, c2) with padding
=====================

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   250 |  2000 |    17   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   250 |  2000 |    17   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C1"='X' AND "C2"='X ')


If we run the query where the literal is padded with spaces to the correct length (2nd query) then the prediction is correct. But if we haven’t padded the literal the prediction is wrong; the estimate is the one the optimizer would have used for “value not found in histogram”.

I think what’s happening is that the optimizer doesn’t “remember” that the literal is being compared with a char() when making the call to sys_op_combined_hash() that it uses for calculating column group stats so it doesn’t pad the column with spaces before calling the function and, as a consequence, the hashed value isn’t the one it should be using.

I’ve run this test on 11.2.0.4 and 12.1.0.2 – the effects are the same on both versions.

Bottom Line:

Be careful about how you use char() data types in your code, and be especially careful if you think you’re going to be creating column group stats involving char() columns – and then remember that 12c may generate column group stats automatically for you. If you use char() columns you will have to ensure that predicates using literal values should have those values padded to the correct number of spaces if you want to have the best possible chance of getting the correct execution plans.

 

December 29, 2015

Column Groups

Filed under: Column groups,Oracle,Statistics,Tuning — Jonathan Lewis @ 1:13 pm GMT Dec 29,2015

I think the “column group” variant of extended stats is a wonderful addition to the Oracle code base, but there’s a very important detail about using the feature that I hadn’t really noticed until a question came up on the OTN database forum recently about a very bad join cardinality estimate.

The point is this: if you have a multi-column equality join and the optimizer needs some help to get a better estimate of join cardinality then column group statistics may help if you create matching stats at both ends of the join. There is a variation on this directive that helps to explain why I hadn’t noticed it before – multi-column indexes (with exactly the correct columns) have the same effect and, most significantly, the combination of  one column group and a matching multi-column index will do the trick.

Here’s some code to demonstrate the effect:

create table t8
as
select
        trunc((rownum-1)/125)   n1,
        trunc((rownum-1)/125)   n2,
        rpad(rownum,180)        v1
from
        all_objects
where
        rownum <= 1000
;

create table t10
as
select
        trunc((rownum-1)/100)   n1,
        trunc((rownum-1)/100)   n2,
        rpad(rownum,180)        v1
from
        all_objects
where
        rownum <= 1000
;
begin
        dbms_stats.gather_table_stats(
                user,
                't8',
                method_opt => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                user,
                't10',
                method_opt => 'for all columns size 1'
        );
end;
/

set autotrace traceonly

select
        t8.v1, t10.v1
from
        t8,t10
where
        t10.n1 = t8.n1
and     t10.n2 = t8.n2
/

set autotrace off

Table t8 has eight distinct values for n1 and n2, and 8 combinations (though the optimizer will assume there are 64 combinations); table t10 has ten distinct values for n1 and n2, and ten combinations (though the optimizer will assume there are 100 combinations). In the absence of any column group stats (or histograms, or indexes) and with no filter predicates on either table, the join cardinality will be “{Cartesian Join cardinality} * {join selectivity}”, and in the absence of any nulls the join selectivity – thanks to the “multi-column sanity check” – will be 1/(greater number of distinct combinations). So we get 1,000,000 / 100 = 10,000.

Here’s the output from autotrace in 11.2.0.4 to prove the point:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |  3652K|    11  (10)| 00:00:01 |
|*  1 |  HASH JOIN         |      | 10000 |  3652K|    11  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T8   |  1000 |   182K|     5   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T10  |  1000 |   182K|     5   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T10"."N1"="T8"."N1" AND "T10"."N2"="T8"."N2")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        835  consistent gets
          0  physical reads
          0  redo size
   19965481  bytes sent via SQL*Net to client
      73849  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed

As you can see, the query actually returns 100,000 rows. The estimate of 10,000 is badly wrong thanks to the correlation between the n1 and n2 columns. So let’s check the effect of creating a column group on t10:


begin
        dbms_stats.gather_table_stats(
                user,
                't10',
                method_opt => 'for all columns size 1 for columns (n1,n2) size 1'
        );
end;
/

At this point you might think that the optimizer’s sanity check might say something like: t8 table: 64 combinations, t10 table column group 10 combinations so use the 64 which is now the greater num_distinct. It doesn’t – maybe it will in some future version, but at present the optimizer code doesn’t seem to recognize this as a possibility. (I won’t bother to reprint the unchanged execution plan.)

But, at this point, I could create an index on t8(n1,n2) and run the query again:


create index t8_i1 on t8(n1, n2);

select
        t8.v1, t10.v1
from
        t8,t10
where
        t10.n1 = t8.n1
and     t10.n2 = t8.n2
/

Index created.


100000 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 216880280

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    35M|    12  (17)| 00:00:01 |
|*  1 |  HASH JOIN         |      |   100K|    35M|    12  (17)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T8   |  1000 |   182K|     5   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T10  |  1000 |   182K|     5   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T10"."N1"="T8"."N1" AND "T10"."N2"="T8"."N2")

Alternatively I could create a column group at the t8 table:



drop index t8_i1;

begin
        dbms_stats.gather_table_stats(
                user,
                't8',
                method_opt => 'for all columns size 1 for columns (n1,n2) size 1'
        );
end;
/

select  
        t8.v1, t10.v1 
from
        t8,t10
where
        t10.n1 = t8.n1
and     t10.n2 = t8.n2
/

Index dropped.


PL/SQL procedure successfully completed.


100000 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 216880280

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|    35M|    12  (17)| 00:00:01 |
|*  1 |  HASH JOIN         |      |   100K|    35M|    12  (17)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T8   |  1000 |   182K|     5   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T10  |  1000 |   182K|     5   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T10"."N1"="T8"."N1" AND "T10"."N2"="T8"."N2")


If you’re wondering why I’ve not picked up this “both ends” detail in the past – it’s because I’ve usually been talking about replacing indexes with column groups and my examples have probably started with indexes at both end of the join before I replaced one index with a column group. (The other examples I’ve given of column groups are typically about single-table access rather than joins.)

 

November 5, 2015

Column Groups

Filed under: CBO,Column groups,Oracle,Statistics — Jonathan Lewis @ 6:48 am GMT Nov 5,2015

I think the “column group” variant of extended stats can be amazingly useful in helping the optimizer to generate good execution plans because of the way they supply better details about cardinality; unfortunately we’ve already seen a few cases (don’t forget to check the updates and comments) where the feature is disabled, and another example of this appeared on OTN very recently.

Modifying the example from OTN to make a more convincing demonstration of the issue, here’s some SQL to prepare a demonstration:

rem
rem     Script:         column_group_null.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2015
rem     Purpose:
rem

create table t1 ( col1 number, col2 number, col3 date);

insert  into t1
select 1 ,1 ,to_date('03-Nov-2015') from dual
union all
select 1, 2, to_date('03-Nov-2015')  from dual
union all
select 1, 1, to_date('03-Nov-2015')  from dual
union all
select 2, 2, to_date('03-Nov-2015')  from dual
union all   
select 1 ,1 ,null  from dual
union all  
select 1, 1, null  from dual
union all  
select 1, 1, null  from dual
union all
select 1 ,1 ,to_date('04-Nov-2015')  from dual
union all  
select 1, 1, to_date('04-Nov-2015')  from dual
union all  
select 1, 1, to_date('04-Nov-2015')  from dual
;

begin
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'T1',
                method_opt      => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'T1',
                method_opt      => 'for columns (col1, col2, col3)'
        );
end;
/

I’ve collected stats in a slightly unusual fashion because I want to make it clear that I’ve got “ordinary” stats on the table, with a histogram on the column group (col1, col2, col3). You’ll notice that this combination is a bit special – of the 10 rows in the table there are three with the values (1,1,null) and three with the values (1,1,’04-Nov-2015′), so some very clear skew to the data which results in Oracle gathering a frequency histogram on the table.

These two combinations are remarkably similar, so what happens when we execute a query to find them – since there are no indexes the plan will be a tablescan, but what will we see as the cardinality estimates ? Surely they should be the same for both combinations:


select  count(*)
from    t1
where
        col1 = 1
and     col2 = 1
and     col3 = '04-Nov-2015'
;

select  count(*)
from    t1
where
        col1 = 1
and     col2 = 1
and     col3 is null

Brief pause for thought …

and here are the execution plans, including predicate section – in the same order (from 11.2.0.4 and 12.1.0.2):


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    12 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     3 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("COL1"=1 AND "COL2"=1 AND "COL3"=TO_DATE(' 2015-11-04
              00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    12 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |    12 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("COL3" IS NULL AND "COL1"=1 AND "COL2"=1)

The predictions are different – the optimizer has used the histogram on the column group for the query with “col3 = to_date()”, but not for the query with “col3 is null”. That’s a bit of a shame really because there are bound to be cases where some queries would benefit enormously from having a column group used even when some of its columns are subject to “is null” tests.

Analysis

The demonstration above isn’t sufficient to prove the point, of course; it merely shows an example of a suspiciously bad estimate. Here are a few supporting details – first we show that both the NULL and the ’04-Nov-2015′ combinations do appear in the histogram. We do this by checking the column stats, the histogram stats, and the values that would be produced by the hashing function for the critical combinations:


set null "n/a"

select distinct
        col3,
        mod(sys_op_combined_hash(col1, col2, col3), 9999999999)
from    t1
where
        col3 is null
or      col3 = to_date('04-Nov-2015')
order by
        2
;

column endpoint_actual_value format a40
column column_name           format a32
column num_buckets           heading "Buckets"

select
        column_name,
        num_nulls, num_distinct, density,
        histogram, num_buckets
from
        user_tab_cols
where
        table_name = 'T1'

break on column_name skip 1

select
        column_name,
        endpoint_number, endpoint_value,
        endpoint_actual_value -- , endpoint_repeat_count
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name not like 'COL%'
order by
        table_name, column_name, endpoint_number
;

(For an explanation of the sys_op_combined_hash() function, see this URL).

Here’s the output from the three queries:


COL3      MOD(SYS_OP_COMBINED_HASH(COL1,COL2,COL3),9999999999)
--------- ----------------------------------------------------
04-NOV-15                                           5347969765
n/a                                                 9928298503

COLUMN_NAME                       NUM_NULLS NUM_DISTINCT    DENSITY HISTOGRAM          Buckets
-------------------------------- ---------- ------------ ---------- --------------- ----------
COL1                                      0            2         .5 NONE                     1
COL2                                      0            2         .5 NONE                     1
COL3                                      3            2         .5 NONE                     1
SYS_STU2IZIKAO#T0YCS1GYYTTOGYE            0            5        .05 FREQUENCY                5


COLUMN_NAME                      ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
-------------------------------- --------------- -------------- ----------------------------------------
SYS_STU2IZIKAO#T0YCS1GYYTTOGYE                 1      465354344
                                               4     5347969765
                                               6     6892803587
                                               7     9853220028
                                              10     9928298503

As you can see, there’s a histogram only on the combination and Oracle has found 5 distinct values for the combination. At endpoint 4 you can see the combination that includes 4th Nov 2015 (with the interval 1 – 4 indicating a frequency of 3 rows) and at endpoint 10 you can see the combination that includes the null (again with an interval indicating 3 rows). The stats are perfect to get the job done but the optimizer doesn’t seem to use them.

If we examine the optimizer trace file (event 10053) we can see concrete proof that this is the case when we examine the “Single Table Access Path” sections for the two queries – here’s a very short extract from each trace file, the first for the query with “col3 = to_date()”, the second for “col3 is null”:


ColGroup (#1, VC) SYS_STU2IZIKAO#T0YCS1GYYTTOGYE
  Col#: 1 2 3    CorStregth: 1.60
ColGroup Usage:: PredCnt: 3  Matches Full: #1  Partial:  Sel: 0.3000


ColGroup (#1, VC) SYS_STU2IZIKAO#T0YCS1GYYTTOGYE
  Col#: 1 2 3    CorStregth: 1.60
ColGroup Usage:: PredCnt: 2  Matches Full:  Partial:

Apparently “col3 is null” is not a predicate!

The column group can be used only if you have equality predicates on all the columns. This is a little sad – the only time that the sys_op_combined_hash() will return a null is (I think) when all its inputs are null so there is one very special case for null handling with column groups – and even then the num_nulls for the column group would tell the optimizer what it needed to know. As it is we have exactly the information we need to get a good cardinality estimate for the second query but the optimizer is not going to use it.

Summary

If you create a column group to help the optimizer with cardinality calculations it will not be used for queries where any of the underlying columns is used in an “is null” predicate. This is coded into the optimizer, it doesn’t appear to be an accident.

Addendum (Sep 2018)

In all recent versions of Oracle the optimizer can use the number of distinct keys in indexes  (dba_indexes.distinct_keys) in the same way it uses column groups, but if you use an “is null” predicate on any of the indexed columns rather than “equals constant” this feature is also disabled.

 

Website Powered by WordPress.com.