Oracle Scratchpad

August 2, 2018

Extended Histograms – 2

Filed under: extended stats,Histograms,Oracle,Statistics — Jonathan Lewis @ 2:13 pm BST Aug 2,2018

Following on from the previous posting which raised the idea of faking a frequency histogram for a column group (extended stats), this is just a brief demonstration of how you can do this. It’s really only a minor variation of something I’ve published before, but it shows how you can use a query to generate a set of values for the histogram and it pulls in a detail about how Oracle generates and stores column group values.

We’ll start with the same table as we had before – two columns which hold only the combinations (‘Y’, ‘N’) or (‘N’, ‘Y’) in a very skewed way, with a requirement to ensure that the optimizer provides an estimate of 1 if a user queries for (‘N’,’N’) … and I’m going to go the extra mile and create a histogram that does the same when the query is for the final possible combination of (‘Y’,’Y’).

Here’s the starting code that generates the data, and creates histograms on all the columns (I’ve run this against 12.1.0.2 and 12.2.0.1 so far):


rem
rem     Script:         histogram_hack_2a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2018
rem
rem     Last tested 
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

create table t1
as
select 'Y' c2, 'N' c3 from all_objects where rownum <= 71482 -- > comment to deal with wordpress format issue.
union all
select 'N' c2, 'Y' c3 from all_objects where rownum <= 1994 -- > comment to deal with wordpress format issue.
;

variable v1 varchar2(128)

begin
        :v1 := dbms_stats.create_extended_stats(null,'t1','(c2,c3)');
        dbms_output.put_line(:v1);
end;
/

execute dbms_stats.gather_table_stats(null, 't1', method_opt=>'for all columns size 10');

In a variation from the previous version of the code I’ve used the “create_extended_stats()” function so that I can return the resulting virtual column name (also known as an “extension” name) into a variable that I can use later in an anonymous PL/SQL block.

Let’s now compare the values stored in the histogram for that column with the values generated by a function call that I first referenced a couple of years ago:


select
        endpoint_value
from 
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = :v1
;

select 
        distinct c2, c3, 
        mod(sys_op_combined_hash(c2,c3),9999999999) endpoint_value
from t1
;

ENDPOINT_VALUE
--------------
    4794513072
    6030031083

2 rows selected.


C C ENDPOINT_VALUE
- - --------------
N Y     4794513072
Y N     6030031083

2 rows selected.

So we have a method of generating the values that Oracle should store in the histogram; now we need to generate 4 values and supply them to a call to dbms_stats.set_column_stats() in the right order with the frequencies we want to see:


declare
        l_distcnt number;
        l_density number;
        l_nullcnt number;
        l_avgclen number;

        l_srec  dbms_stats.statrec;
        n_array dbms_stats.numarray;

begin
        dbms_stats.get_column_stats (
                ownname =>null,
                tabname =>'t1',
                colname =>:v1,
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                avgclen =>l_avgclen,
                srec    =>l_srec
        );

        l_srec.novals := dbms_stats.numarray();
        l_srec.bkvals := dbms_stats.numarray();

        for r in (
                select
                        mod(sys_op_combined_hash(c2,c3),9999999999) hash_value, bucket_size
                from    (
                        select 'Y' c2, 'Y' c3, 1 bucket_size from dual
                        union all
                        select 'N' c2, 'N' c3, 1 from dual
                        union all
                        select 'Y' c2, 'N' c3, 71482 from dual
                        union all
                        select 'N' c2, 'Y' c3, 1994 from dual
                        )
                order by hash_value
        ) loop
                l_srec.novals.extend;
                l_srec.novals(l_srec.novals.count) := r.hash_value;

                l_srec.bkvals.extend;
                l_srec.bkvals(l_srec.bkvals.count) := r.bucket_size;
        end loop;

        n_array := l_srec.novals;

        l_distcnt  := 4;
        l_srec.epc := 4;

--
--      For 11g rpcnts must not be mentioned
--      For 12c is must be set to null or you
--      will (probably) raise error:
--              ORA-06533: Subscript beyond count
--

        l_srec.rpcnts := null;

        dbms_stats.prepare_column_values(l_srec, n_array);

        dbms_stats.set_column_stats(
                ownname =>null,
                tabname =>'t1',
                colname =>:v1,
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                avgclen =>l_avgclen,
                srec    =>l_srec
        );

end;

The outline of the code is simply: get_column_stats, set up a couple of arrays and simple variables, prepare_column_values, set_column_stats. The special detail that I’ve included here is that I’ve used a “union all” query to generate an ordered list of hash values (with the desired frequencies), then grown the arrays one element at a time to copy them in place. (That’s not the only option at this point, and it’s probably not the most efficient option – but it’s good enough). In the past I’ve used this type of approach but used an analytic query against the table data to produce the equivalent of 12c Top-frequency histogram in much older versions of Oracle.

A couple of important points – I’ve set the “end point count” (l_srec.epc) to match the size of the arrays, and I’ve also changed the number of distinct values to match. For 12c to tell the code that this is a frequency histogram (and not a hybrid) I’ve had to null out the “repeat counts” array (l_srec.rpcnts). If you run this on 11g the reference to rpcnts is illegal so has to be commented out.

After running this procedure, here’s what I get in user_tab_histograms for the column:


select
        endpoint_value                          column_value,
        endpoint_number                         endpoint_number,
        endpoint_number - nvl(prev_endpoint,0)  frequency
from    (
        select
                endpoint_number,
                lag(endpoint_number,1) over(
                        order by endpoint_number
                )                               prev_endpoint,
                endpoint_value
        from
                user_tab_histograms
        where
                table_name  = 'T1'
        and     column_name = :v1
        )
order by endpoint_number
;

COLUMN_VALUE ENDPOINT_NUMBER  FREQUENCY
------------ --------------- ----------
   167789251               1          1
  4794513072            1995       1994
  6030031083           73477      71482
  8288761534           73478          1

4 rows selected.


It’s left as an exercise to the reader to check that the estimated cardinality for the predicate “c2 = ‘N’ and c3 = ‘N'” is 1 with this histogram in place.

July 31, 2018

Extended Histograms

Filed under: CBO,extended stats,Histograms,Oracle,Statistics — Jonathan Lewis @ 11:05 pm BST Jul 31,2018

Today’s little puzzle comes courtesy of the Oracle-L mailing list. A table has two columns (c2 and c3), which contain only the values ‘Y’ and ‘N’, with the following distribution:


select   c2, c3, count(*)
from     t1
group by c2, c3
;

C C   COUNT(*)
- - ----------
N Y       1994
Y N      71482

2 rows selected.

The puzzle is this – how do you get the optimizer to predict a cardinality of zero (or, using its best approximation, 1) if you execute a query where the predicate is:

where   c2 = 'N' and c3 = 'N'

Here are 4 tests you might try:

  • Create simple stats (no histograms) on c2 and c3.
  • Create frequency histograms on c2 and c3
  • Create a column group (extended stats) on (c2,c3) but no histograms
  • Create a column group (extended stats) on (c2,c3) with a histogram on (c2, c3)

If you do these tests you’ll find the estimated cardinalities are (from 12.1.0.2):

  • 18,369 – derived as 73,476 / 4  … total rows over total possible combinations
  • 1,940   – derived as 73,476 * (1,994/73,476) * (71,482/73,476) … total rows * fraction where c2 = ‘N’ * fraction where c3 = ‘N’
  • 36,738 – derived as 73,476 / 2 … total rows / number of distinct combinations of (c2, c3)
  • 997      – derived as 1,994 / 2 … half the frequency of the least frequently occurring value in the histogram

The last algorithm appeared in 10.2.0.4; prior to that a “value not in frequency histogram” would have been given an estimated cardinality of 1 (which is what the person on Oracle-L wanted to see).

In fact the optimizer’s behaviour can be reverted to the 10.2.0.3 mechanism by setting fix-control 5483301 to zero (or off), either with an “alter session” call or inside the /*+ opt_param() */ hint. There is, however, another option – if you get the column stats, then immediately set them (dbms_stats.get_column_stats(), dbms_stats.set_column_stats()) the optimizer defines the stats as “user defined” and (for reasons I don’t know – perhaps it’s an oversight) reverts to the 10.2.0.3 behaviour. Here’s some code to demonstrate the point; as the srcipt header says, I’ve tested it on versions up to 18.1


rem
rem     Script:         histogram_hack_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2018
rem
rem     Last tested 
rem             18.1.0.0        via LiveSQL (with some edits)
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
as
select 'Y' c2, 'N' c3 from all_objects where rownum <= 71482 -- > comment to avoid format issue
union all
select 'N' c2, 'Y' c3 from all_objects where rownum <= 1994 -- > comment to avoid format issue
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 10 for columns (c2,c3) size 10');

column column_name format a128 new_value m_colname

select  column_name
from    user_tab_cols
where   table_name = 'T1'
and     column_name not in ('C2','C3')
;

set autotrace traceonly explain
select /* pre-hack */ * from t1 where c2 = 'N' and c3 = 'N';
set autotrace off

declare
        l_distcnt number default null;
        l_density number default null;
        l_nullcnt number default null;
        l_srec    dbms_stats.statrec;
        l_avgclen number default null;

begin

        dbms_stats.get_column_stats (
                ownname =>user,
                tabname =>'t1',
                colname =>'&m_colname',
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                srec    =>l_srec,
                avgclen =>l_avgclen
        );

        dbms_stats.set_column_stats(
                ownname =>user,
                tabname =>'t1',
                colname =>'&m_colname',
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                srec    =>l_srec,
                avgclen =>l_avgclen
        );

end;
/

set autotrace traceonly explain
select /* post-hack */  * from t1 where c2 = 'N' and c3 = 'N';
set autotrace off

I’ve created a simple table for the data and collected stats including histograms on the two columns and on the column group. I’ve taken a simple strategy to find the name of the column group (I could have used the function dbms_stats.create_extended_stats() to set an SQL variable to the name of the column group, of course), and then run a little bit of PL/SQL that literally does nothing more than copy the column group’s stats into memory then write them back to the data dictionary.

Here are the “before” and “after” execution plans that we get from autotrace:

BEFORE
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   997 |  3988 |    23  (27)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   997 |  3988 |    23  (27)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='N' AND "C3"='N')


AFTER
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     4 |    23  (27)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |     4 |    23  (27)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='N' AND "C3"='N')

As required – the estimate for the (‘N’,’N’) rows drops down to (the optimizer’s best approximation to ) zero.

Footnote:

An alternative strategy (and, I’d say, a better strategic approach) would have been to create a “fake” frequency histogram that included the value (‘N’,’N’) giving it a frequency of 1 – a method I’ve suggested in the past  but with the little problem that you need to be able to work out the value to use in the array passed to dbms_stats.set_column_stats() to represent the value for the (‘N’,’N’) combination – and I’ve written about that topic in the past as well.

You might wonder why the optimizer is programmed to use “half the least popular” for predicates references values not in the index. Prior to 12c it’s easy to make an argument for the algorithm. Frequency histograms used to be sampled with a very small sample size, so if you were unlucky a “slightly less popular” value could be missed completely in the sample; if you were requesting a value that didn’t appear in the histogram then (presumably) you knew it should exist in the data, so guessing a cardinality somewhat less than the least popular must have seemed like a good idea.

In 12c, of course, you ought to be taking advantage of the “approximate NDV” implementation for using a 100% sample to generate frequency (and Top-N / Top-Frequency histograms). If you’ve got a 12c frequency histogram then the absence of a value in the histogram means the data really wasn’t there so a cardinality estimate of 1 makes more sense. (Of course, you might have allowed Oracle to gather the histogram at the wrong time – but that’s a different issue). If you’ve got a Top-N histogram then the optimizer will behave as if a “missing” value is one of those nominally allowed for in the “low frequency” bucket and use neither the 1 nor the “half the least popular”.

So, for 12c and columns with frequency histograms it seems perfectly reasonably to set the fix control to zero – after getting approval from Oracle support, of course.

 

April 10, 2018

exp catch

Filed under: 12c,Histograms,Oracle,Statistics — Jonathan Lewis @ 5:52 pm BST Apr 10,2018

No-one should be using exp/imp to export and import data any more, they should be using the datapump equivalents expdp/impdp – but if you’re on an older (pre-12c) version of Oracle and still using exp/imp to do things like moving tables with their production statistics over to test systems then be careful that you don’t fall into an obsolescence trap when you finally upgrade to 12c (or Oracle 18).

exp/imp will mess up some of your histograms if you’re still using them to move tables/statistics in 12c.

Remember that 12c can create “Top-N” and “hybrid” histograms – and exp/imp were written long before these new histogram types came into existence. The code has not been updated to allow for the new histogram types so if you happen to generate any histograms of these type in a 12c system and then use exp/imp to move some table stats (and it’s particularly an issue relating to stats) from one system to another – the stats that arrive at the destination system won’t match the stats that left the source system.

Here’s a little sample code to build a model that I can use to demonstrate the problem. It creates a table with three columns that will make it easy for me to create one frequency histogram, one Top-N histogram and one hybrid histogram. I’ve included a couple of substitution variables in the code so that you can specify an Oracle instance to connect to and a directory for the export file that expdp is going to produce. Don’t forget to check that the directory I create in this script doesn’t overwrite a directory that already exists for other reasons on your test system.


rem
rem     Script:         12c_histograms.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2015
rem
rem     Define m_service to be the service name you connect to
rem     Define m_directory to be the O/S directory you to use for
rem     the export/import/log files
rem
rem     Make sure this code is not over-writing an existing 
rem     definition for a directory called DMPDIR before you 
rem     start


define m_service = 'orcl'
define m_service = 'or32'

define m_directory = '/mnt/working'

host rm &m_directory/expdat.dmp
host rm &m_directory/expdp.dmp

create or replace directory dmpdir as '&m_directory';

drop table t1 purge;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        trunc(sqrt(rownum + 0)) frequency,
        trunc(sqrt(rownum + 0)) top_n,
        trunc(sqrt(rownum + 0)) hybrid
from
        generator
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for columns frequency size 254 for columns top_n size 95 for columns hybrid size 50'
        );
end;
/

select
        column_name,
        num_distinct,
        histogram,
        num_buckets
from
        user_tab_cols
where
        table_name = 'T1'
;

column endpoint_actual_value format a22
break on column_name skip 1

select 
        column_name, 
        endpoint_number, endpoint_value, endpoint_actual_value, 
        endpoint_repeat_count 
from 
        user_tab_histograms 
where 
        table_name = 'T1'
order by 
        column_name, endpoint_number
;

Here’s an extract, from a 12.1.0.2 instance, of the results of the two queries with a large number of the rows from the histogram data deleted:


COLUMN_NAME              Distinct HISTOGRAM          Buckets
-------------------- ------------ --------------- ----------
FREQUENCY                     100 FREQUENCY              100
TOP_N                         100 TOP-FREQUENCY           95
HYBRID                        100 HYBRID                  50

COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE  ENDPOINT_REPEAT_COUNT
-------------------- --------------- -------------- ---------------------- ---------------------
FREQUENCY                          3              1 1                                          0
                                   8              2 2                                          0
                                  15              3 3                                          0
...
                                9800             98 98                                         0
                                9999             99 99                                         0
                               10000            100 100                                        0


HYBRID                             3              1 1                                          3
                                 224             14 14                                        29
                                 440             20 20                                        41
...
                                9603             97 97                                       195
                                9800             98 98                                       197
                               10000            100 100                                        1

TOP_N                              1              1 1                                          0
                                  16              7 7                                          0
                                  33              8 8                                          0
...
                                9753             98 98                                         0
                                9952             99 99                                         0
                                9953            100 100                                        0

The most important detail is the endpoint_repeat_count column of the hybrid histogram, although you should note that the endpoint_actual_value columns is populated with a copy of the endpoint_value for all three histograms.

Now I’m going to use exp / drop table / imp (or the datapump equivalents) to export, drop, and re-import the table with (one hopes) the exact same statistics. To do this I’ll be using the imp command with the option “statistics=always” with the intention of copying the stats from the export file into the destination database (you’ll have to substitute your own userid/password, of course):


host exp   userid=test_user/test@&m_service file=expdat.dmp tables='(t1)'
-- host expdp userid=test_user/test@&m_service DIRECTORY=dmpdir DUMPFILE=expdp.dmp TABLES='(t1)'

drop table t1 purge;

host imp   userid=test_user/test@&m_service file=expdat.dmp tables='(t1)' statistics=always
-- host impdp userid=test_user/test@&m_service DIRECTORY=dmpdir DUMPFILE=expdp.dmp TABLES='(t1)'

So what do we see now when we re-run the two queries to report the histogram information:


COLUMN_NAME              Distinct HISTOGRAM          Buckets
-------------------- ------------ --------------- ----------
FREQUENCY                     100 FREQUENCY              100
TOP_N                         100 FREQUENCY               95
HYBRID                        100 FREQUENCY               50

COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE  ENDPOINT_REPEAT_COUNT
-------------------- --------------- -------------- ---------------------- ---------------------
FREQUENCY                          3              1                                            0
                                   8              2                                            0
                                  15              3                                            0
...
                                9800             98                                            0
                                9999             99                                            0
                               10000            100                                            0

HYBRID                             3              1                                            0
                                 224             14                                            0
                                 440             20                                            0
...
                                9603             97                                            0
                                9800             98                                            0
                               10000            100                                            0

TOP_N                              1              1                                            0
                                  16              7                                            0
                                  33              8                                            0
...
                                9753             98                                            0
                                9952             99                                            0
                                9953            100                                            0

The histograms on all three columns are now labelled as FREQUENCY.
The endpoint_actual_value is null for all three – but that may be a purely cosmetic detail with no side effects.
The “hybrid” column really has become a frequency histogram – and that’s the critical one – the endpoint_repeat_count columns are all zero.

tl;dr

If you’re still using exp/imp instead of expdp/impdp to move tables (and, more importantly, their statistics) from one database to another then the upgrade to 12c may mean you end up with hybrid histograms on the source system that are “downgraded” to frequency histograms on the destination system, with the effect that execution plans vary between the two systems.

January 30, 2018

Histogram Threat

Filed under: Histograms,Oracle,Statistics,Troubleshooting — Jonathan Lewis @ 8:07 am BST Jan 30,2018

Have you ever seen a result like this:


SQL> select sql_id, count(*) from V$sql group by sql_id having count(*) > 1000;

SQL_ID		COUNT(*)
------------- ----------
1dbzmt8gpg8x7	   30516

A client of mine who had recently upgraded to 12.2.0.1 RAC, using DRCP (database resident connection pooling) for an application using PHP was seeing exactly this type of behaviour for a small number of very simple SQL statements and wanted to find out what was going on because they were also seeing an undesirable level of contention in the library cache when the system load increased.

In this note I just want to highlight a particular detail of their problem – with an example – showing how easily histograms can introduce problems if you don’t keep an eye out for the dangers.

One of their queries really was as simple as this:

select count(*), sum(skew2) from t1 where skew = :b1;

And I’m going to use this query to model the problem. All I have to do is arrange for a data set that results in a hybrid (or height-balanced) histogram being created on the skew column, and then run the query lots of times with different input bind values. In the case of the client there were around 18,000 possible values for the column, and the number of rows per value varied from 1 to about 20,000 – but whatever the number of rows selected the optimum execution plan was always going to be an indexed access.


rem
rem     Script:         acs_child_cursors.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem

create table t1 (
        id, skew, skew2, padding
)
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 3e3
)
select
        rownum  id,
        g1.id   id1,
        g2.id   id2,
        rpad('x',100)
from
        generator       g1,
        generator       g2
where
        g2.id <= g1.id     -- > comment to avoid WordPress format issue
order by
        g2.id, g1.id
;

alter table t1 modify skew not null;
alter table t1 modify skew2 not null;

create index t1_skew on t1(skew);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1 for columns skew size 254'
--              method_opt       => 'for all columns size 1'
        );
end;
/

variable b1 number
exec :b1 := 0;

set termout off
@start_1000
@start_1000
@start_1000
set termout on

set linesize 120
set trimspool on

column sql_text format a55

select
        child_number, plan_hash_value, executions,
        is_bind_sensitive,
        is_bind_aware,
        is_shareable,
        sql_text
from
        v$sql
where   sql_id = 'b82my582cnvut'
;

The data set contains 3,000 distinct values for skew and the way I’ve generated the rows means that the value N will appear N times – so there’s one row with the value 1 and 3,000 rows with the value 3,000 and so on for a total of 4,501,500 rows. If you want to run the tes the code is likely to take a couple of minutes to complete, requiring roughly 700 MB of disk space.

The mechanism of the script start_1000 is something I published a few years ago, and essentially it executes a script called start_1.sql 1,000 times which, for this test, contains the following two lines:


exec :b1 := :b1 + 1

select count(*), sum(skew2) from t1 where skew = :b1;

The net effect of the 3 calls to start_1000.sql is that my simple SQL statement is called once in turn for each value of skew from 1 to 3,000. The SQL_ID of the statement is ‘b82my582cnvut’ which I’ve used to query v$sql when the run is complete, with the following result:


CHILD_NUMBER PLAN_HASH_VALUE EXECUTIONS I I I SQL_TEXT
------------ --------------- ---------- - - - -------------------------------------------------------
	   0	  1041516234	    498 Y N N select count(*), sum(skew2) from t1 where skew = :b1
	   1	  1041516234	     25 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   2	  1041516234	    104 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   3	  1041516234	    308 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   4	  1041516234	    429 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   5	  1041516234	    640 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   6	  1041516234	     31 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   7	  1041516234	    305 Y Y N select count(*), sum(skew2) from t1 where skew = :b1
	   8	  1041516234	    660 Y Y Y select count(*), sum(skew2) from t1 where skew = :b1

9 rows selected.

I’ve got 9 child cursors, all with the same execution plan, all except the last labelled as not shareable (you may find that you don’t get exactly the same result, on repeated tests I got between 5 and 9 cursors). Given enough time all these cursors except the last (shareable) one would be aged out of the library cache. In the case of the client, who had a shared pool that was probably quite a bit larger than needed, the number of non-shareable cursors could get very large and they were hanging around for ages. Typically most of the cursors would report an execution count less than 30, with many showing just one or two executions and a handful showing execution counts in the region of several hundred (and that hanful were the ones that were still marked as shareable).

After eliminating the critical histogram (using dbms_stats.delete_column_stats()) and eliminating the redundant child cursors (using dbms_pool.purge()) the massive proliferation stopped happening and the performance threat disappeared. The only issue then was to change the table preferences for stats collection on this table to add the clause “for columns size 1 skew” so that the histogram would not be recreated on the next gather.

Further Observations.

I suspect that part of the client’s probem – something that exaggerated the count rather than causing it – could be attributed to using DRCP (database resident connection pool) which probably still has some quirky bits of behaviour. It was also true that the client’s connection pool was probably much bigger than it needed to be so if there were any funny little bits of emergent behaviour at scale the client would probably have hit them.

The problem of escalating child cursors is one that Oracle has been working on for quite a long time, and there’s a (hidden) parameter that was introduced late in 11gR2 (though I think that the 11g mechanism first appeared through a fix control) to allow Oracle to mark a parent cursor obsolete if it acquired too many child cursors.  There’s a note on MoS that the client had read on this topic: Doc ID: 2298504.1: Cursor Mutex X Wait Events: After Upgrading To 12.2″ which looked as if it was describing their symptoms so they had set this parameter (_cursor_obsolete_threshold) from 8192 (the 12.2 default) down to 1024 (the default for 12.1 and earlier versions). This had been of some help with the library cache problem.  When the sql_id at the top of this article reported 30,516 child cursors that would be 29 “obsolete” parent cursors with 1,024 childs cursor and one “live” parent cursor with 820 child cursors.

You can appreciate that if Oracle has to find a parent cursor and pin it while walking a chain of 30,516 child cursors that’s likely to take a lot more time than walking a chain of 30 parent cursors (while holding a library cache mutex, perhaps) to find the one non-obsolete parent, then acquiring the parent mutex to walk a chain of 820 child cursor.

I don’t know the exact details of the mechanisms involved with manipulating the various chains – but there are likely to be times when one process needs exclusive mutexes/latches to change a chain while other processes are holding shared mutexes/latches to search the chain. When you’ve got 30,000 child cursors in total the important questions become: “where’s the trade-off between making the child chains shorter and making the parent search longer ?” (I’d expect – or hope – that the Oracle developers had actually found a way to code the obsolence so that the new parent was first in the search, and the rest were never visited, of course.)

One of the suggestions I made to try to alleviate the problem – which I had assumed was due in part to the degree of concurrent execution of the statement – was to mark the cursor as “hot” This resulted in 36 differet sql_ids for the same statement (the client machine had 72 CPUs). This had some effect but ultimately meant that there were 36 chains of parents that would eventually end up with lots of child cursors – the only difference was the rate at which the total cursor count was growing (a lot slower), balanced against the threat that they might all manage to get to 30,000+ child cursors! Instead, as a quick and dirty workaround, I supplied the client with a script that could be run at quieter moments during the day to call dbms_shared_pool.purge() for the sql_id to flush all its cursors from the library cache.

One final oddity – which isn’t from the client site. When I changed my test above to avoid generating the histogram (using the commented out method_opt “for all columns size 1”) I found that I still got two child cursors; the first ended up marked as bind-aware but insensitive and non-shareable, the second appeared (time and time again) when my bind value got to 499 and was marked bind-sensitive, bind-aware and shareable.  I still have no idea why this happened.

Footnote:

When repeating the test I started with a “flush shared_pool” – but I’ve removed this line from the code above in case anyone ran it without considering the possible side effects. You do need to clear previous copies of the key statement from the library cache, though, if you want to avoid confusing the issue on repeated runs of the test.

 

January 23, 2018

Histograms

Filed under: Histograms,humour,Oracle,Statistics — Jonathan Lewis @ 7:58 am BST Jan 23,2018

I had a sudden urge to go a bit meta – so here’s a relative frequency histogram of my observations of the general use of histograms in Oracle:

 

 

 

January 18, 2018

Column Stats

Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 2:22 pm BST Jan 18,2018

I’ve made several comments in the past about the need for being selective when gathering objects statistics with particular reference to the trade-offs when creating histograms. With Oracle 12c it’s now reasonably safe (as far as I’m concerned) to set a method_opt as a table preference that identifies columns where you expect to see Frequency or (pace the buggy behaviour described in a recent post) a Top-N histograms. The biggest problem I have is that I keep forgetting the exact syntax I need – so I’ve written this note more as a reminder to myself than anything else.

Typically I might expect to use the standard 254 columns for gathering histograms, with an occasional variation to increase the bucket count; but for the purposes of this note I’m going to demonstarate with a much lower value. So here’s a table creation statement (running 12.1.0.2 – so it will gather basic stats on the create) and two variations of a call to gather stats with a specific method_opt – followed by a question:


rem
rem     Script:         method_opt.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2003
rem

create table t1
as
select
        object_type o1,
        object_type o2,
        object_type o3,
        object_id,
        object_name
from
        all_objects
where
        rownum <= 50000 -- > comment to bypass wordpress format problem
;

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1' order by column_id;

execute dbms_lock.sleep(2)

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for all columns size 1 for columns o1 o2 o3 size 15'
        );
end;
/

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1' order by column_id;

execute dbms_lock.sleep(2)

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for all columns size 1 for columns size 15 o1 o2 o3'
        );
end;
/

select  column_name, num_distinct, histogram, num_buckets, to_char(last_analyzed,'hh24:mi:ss')
from    user_tab_cols where table_name = 'T1';


The big question is this: which columns will have histograms after each of the gather_table_stats() calls:

method_opt=>'for all columns size 1 for columns o1 o2 o3 size 15'
method_opt=>'for all columns size 1 for columns size 15 o1 o2 o3'

The problem I have is simple – to me both options look as if they will create histograms on all three named columns but the first option is the one that I type in “intuitively” if I don’t stop to think about it carefully. The first option, alas, will only gather a histogram on column o3 – the second option is the one that creates three histograms.

The manuals are a little unclear and ambiguous about how to construct a slightly complicated method_opt; there’s a fragment of text with the usual mix of square brackets, italics and ellipses to indicate optional and repeated clauses (interestingly the only clue about multiple columns is that comma separation seems to be required – despite one of the examples above working withough commas) but there’s no explanation of when a “size” clause should go before a “column” column and when it should go after.

So here are a few more method_opt clauses – can you work out in advance which columns would have histograms if you used them and how many buckets in each histogram; there are a couple that may surprise you:


for columns o1 size 12, o2 size 13, o3 size 14

for columns o1 size 15 o2 size 16 o3 size 17

for columns size 18 o1 size 19 o2 size 20 o3

for columns size 21 o1 o2 size 22 o3

for columns o1 size 12, o2 size 12, o3 size 13, object_id size 13 object_name size 14

for columns size 22 o1 o2 for columns size 23 o3 object_id for columns size 24  object_name

Bottom line – to me – is to check very carefully that the method_opt is going to do what I want it to do; and for production systems I tend to use the final form that repeats the “for columns {size clause} {column list}”.

January 15, 2018

Histogram Hassle

Filed under: Histograms,Oracle,Statistics,Troubleshooting — Jonathan Lewis @ 1:01 pm BST Jan 15,2018

I came across a simple performance problem recently that ended up highlighting a problem with the 12c hybrid histogram algorithm. It was a problem that I had mentioned in passing a few years ago, but only in the context of Top-N histograms and without paying attention to the consequences. In fact I should have noticed the same threat in a recent article by Maria Colgan that mentioned the problems introduced in 12c by the option “for all columns size repeat”.

So here’s the context (note – all numbers used in this example are approximations to make the arithmetic obvious).  The client had a query with a predicate like the follwing:

    t4.columnA = :b1
and t6.columnB = :b2

The optimizer was choosing to drive the query through an indexed access path into t6, which returned ca. 1,000,000 rows before joining (two tables later) to t4 at which point all but a couple of rows remained – typical execution time was in the order of tens of minutes. A hint to start the join order with t4 /*+ leading(t4) */, meant that Oracle used an index to acquire two starting rows and reduced the response time to the classic “sub-second”.

The problem had arisen because the optimizer had estimated a cardinality of 2 rows for the index on t6 and the reason for this was that, on average, that was the correct number. There were 2,000,000 rows in the table with 1,000,000 distinct values. It was just very unlucky that one of the values appeared 1,000,000 times and that was the value the users always wanted to query – and there was no histogram on the column to tell the optimizer that there was a massive skew in the data distribution.

Problem solved – all I had to do was set a table preference for this table to add a histogram to this column and gather stats. Since there were so many distinct values and so much “non-popular” data in the table the optimizer should end up with a hybrid histogram that would highlight this value. I left instructions for the required test and waited for the email telling me that my suggestion was brilliant and the results were fantastic … I got an email telling me it hadn’t worked.

Here’s a model of the situation – I’ve created a table with 2 million rows and a column where every other row contains the same value but otherwise contains the rownum. Because the client code was using a varchar2() column I’ve done the same here, converting the numbers to character strings left-padded with zeros. There are a few rows (about 20) where the column value is higher than the very popular value.


rem
rem     Script:         histogram_problem_12c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 2e4
)
select
        rownum  as id,
        case
                when mod(rownum,2) = 0
                        then '999960'
                        else lpad(rownum,6,'0')
        end     as bad_col
from
        generator       v1,
        generator       v2
where
        rownum <= 2e6    -- typing error, ends up with 2 rows per non-popular value.
;

Having created the data I’m going to create a histogram on the bad_col – specifying 254 columns – then query user_tab_histograms for the resulting histogram (from which I’ll delete a huge chunk of boring rows in the middle):


begin

        dbms_stats.gather_table_stats(
                ownname         => 'TEST_USER',
                tabname         => 'T1',
                method_opt      => 'for columns bad_col size 254'
        );

end;
/

select
        column_name, histogram, sample_size
from
        user_tab_columns
where
        table_name = 'T1'
;

column end_av format a12

select
        endpoint_number         end_pt,
        to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx') end_val,
        endpoint_actual_value   end_av,
        endpoint_repeat_count   end_rpt
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = 'BAD_COL'
order by
        endpoint_number
;

COLUMN_NAME          HISTOGRAM             Sample
-------------------- --------------- ------------
BAD_COL              HYBRID                 5,513
ID                   NONE               2,000,000

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
         1  303030303031001f0fe211e0800000 000001                1
        12  3030383938311550648a5e3d200000 008981                1
        23  303135323034f8f5cbccd2b4a00000 015205                1
        33  3032333035311c91ae91eb54000000 023051                1
        44  303239373236f60586ef3a0ae00000 029727                1
...
      2685  3938343731391ba0f38234fde00000 984719                1
      2695  39393235303309023378c0a1400000 992503                1
      2704  3939373537370c2db4ae83e2000000 997577                1
      5513  393939393938f86f9b35437a800000 999999                1

254 rows selected.

So we have a hybrid histogram, we’ve sampled 5,513 rows to build the histogram, we have 254 buckets in the histogram report, and the final row in the histogram is end point 5513 (matching the sample size). The first row of the histogram shows us the (real) low value in the column and the last row of the histogram reports the (real) high value. But there’s something very odd about the histogram – we know that ‘999960’ is the one popular value, occurring 50% of the time in the data, but it doesn’t appear in the histogram at all.

Looking more closely we see that every bucket covers a range of about 11 (sometimes 9 or 10) rows from the sample, and the highest value in each bucket appears just once; but the last bucket covers 2,809 rows from the sample with the highest value in the bucket appearing just once. We expect a hybrid histogram to have buckets which (at least initially) are all roughly the same size – i.e. “sample size”/”number of buckets” – with some buckets being larger by something like the amount that appears in their repeat count, so it doesn’t seem right that we have an enormous bucket with a repeat count of just 1. Something is broken.

The problem is that the sample didn’t find the low and high values for the column – although the initial full tablescan did, of course – so Oracle has “injected” the low and high values into the histogram fiddling with the contents of the first and last buckets. At the bottom end of the histogram this hasn’t really caused any problems (in our case), but at the top end it has taken the big bucket for our very popular ‘999960’ and apparently simply replaced the value with the high value of ‘999999’ and a repeat count of 1.

As an indication of the truth of this claim, here are the last few rows of the histogram if I repeat the experiment but, before gathering the histogram, delete the rows where bad_col is greater than ‘999960’. (Oracle’s sample is random, of course, and has changed slightly for this run.)

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
      2641  3938373731371650183cf7a0a00000 987717                1
      2652  3939353032310e65c1acf984a00000 995021                1
      2661  393938393433125319cc9f5ba00000 998943                1
      5426  393939393630078c23b063cf600000 999960             2764

Similarly, if I inserted a few hundred rows with a higher value than my popular value (in this case I thought 500 rows would be a fairly safe bet as the sample was about one in 360 rows) I got a histogram which started with a bucket about the popular bucket, so the problem of that bucket being hacked to the high value was less significant:


    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
      2718  393736313130fe68d8cfd6e4000000 976111                1
      2729  393836373630ebfe9c2b7b94c00000 986761                1
      2740  39393330323515efa3c99771600000 993025                1
      5495  393939393630078c23b063cf600000 999960             2747
      5497  393939393938f86f9b35437a800000 999999                1

Bottom line, then: if you have an important popular value in a column and there aren’t very many rows with a higher value, you may find that Oracle loses sight of the popular value as it fudges the column’s high value into the final bucket.

Workaround

I did consider writing a bit of PL/SQL for the client to fake a realistic frequency histogram, but decided that that wouldn’t be particularly friendly to future DBAs who might have to cope with changes. Luckily the site doesn’t gather stats using the automatic scheduler job and only rarely updates stats anyway, so I suggested we create a histogram on the column using an estimate_percent of 100. This took about 8 minutes to run – for reasons that I will go into in a moment – after which I suggested we lock stats on the table and document the fact that when stats are collected on this table it’s got to be a two-pass job – the normal gather with its auto_sample_size to start with, then a 100% sample for this column to gather the histogram:


begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt       => 'for columns bad_col size 254',
                estimate_percent => 100,
                cascade          => false
        );
end;
/

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
       125  39363839393911e01d15b75c600000 968999                0
       126  393834373530e98510b6f19a000000 984751                0
       253  393939393630078c23b063cf600000 999960                0
       254  393939393938f86f9b35437a800000 999999                0

129 rows selected.

This took a lot longer, of course, and produced an old-style height-balanced histogram. Part of the time came from the increased volume of data that had to be processed, part of it came from a suprise (which also appeared, in a different guise, in the code that created the original hybrid histogram).

I had specifically chosen the method_opt to gather for nothing but the single column. In fact whether I forced the “legacy” (height-balanced) code or the modern (hybrid) code, I got a full tablescan that did some processing of EVERY column in the table and then threw most of the results away. Here are fragments of the SQL – old version first:


select /*+
            no_parallel(t) no_parallel_index(t) dbms_stats
            cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
            xmlindex_sel_idx_tbl no_substrb_pad
       */
       count(*),
       count("ID"), sum(sys_op_opnsize("ID")),
       count("BAD_COL"), sum(sys_op_opnsize("BAD_COL"))
       ...
from
       "TEST_USER"."T1" t

select /*+
           full(t)    no_parallel(t) no_parallel_index(t) dbms_stats
           cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
           xmlindex_sel_idx_tbl no_substrb_pad
       */
       to_char(count("ID")),
       to_char(count("BAD_COL")),
       substrb(dump(min("BAD_COL"),16,0,64),1,240),
       substrb(dump(max("BAD_COL"),16,0,64),1,240),
       ...
       count(rowidtochar(rowid))
from
       "TEST_USER"."T1" t  /* ACL,TOPN,NIL,NIL,RWID,U,U254U*/

The new code only used the substrb() functions on the bad_col, but all other columns in the table were subject to the to_char(count()).
The old code applied the count() and sys_op_opnsize() to every column in the table.

This initial scan was a bit expensive – and disappointing – for the client since their table had 290 columns (which means intra-block chaining as a minimum) and had been updated so much that 45% of the rows in the table had to be “continued fetches”. I can’t think why every column had to be processed like this, but if they hadn’t been that would have saved a lot of CPU and I/O since the client’s critical column was very near the start of the table.

Finally

This problem with the popular value going missing is a known issue, for which there is a bug number, but there is further work going on in the same area which means this particular detail is being rolled into another bug fix. More news when it becomes available.

Bear in mind that this problem also appears for Top-N (aka Top-Frequency) histograms – where both the lowest and highest buckets may be replaced with a bucket that reports the low-value and high-value for the column with a repeat-count of 1.

Update (Jan 2018)

This is now fixed under bug number “25994960: CARDINALITY MISESTIMATE FROM HYBRID HISTOGRAM” with a patch (of the same number) for 12.1.0.2

June 1, 2017

Histogram Upgrade – 2

Filed under: 12c,Histograms,Oracle,Statistics — Jonathan Lewis @ 6:00 pm BST Jun 1,2017

While reading a blog post by Maria Colgan a couple of weeks ago I came across an observation about histograms that I’d not noticed before; worse still, it was a feature that seemed to make some “damage-limitation” advice I’d been giving for years a really bad idea! The threat appeared in these paragraphs:

Setting SIZE REPEAT ensures a histogram will only be created for any column that already has one. If the table is a partitioned table, repeat ensures a histogram will be created for a column that already has one on the global level.

What’s the down side to doing this?

The current number of buckets used in each histogram becomes the limit on the maximum number of buckets used for any histogram created in the future.

Unfortunately I’ve been saying for a very long time that you have to be very careful with histograms and should probably create them through PL/SQL code but if you have some frequency histograms that you’re sure are going to be well-behaved then using “for all columns size repeat” to gather the histogram is probably okay. But, after making the claim above, Maria’s blog posting demonstrated the truth of the claim in a demonstration that showed the highly undesirable consequences.

So imagine this: you create a frequency histogram which happens to produce 26 buckets on a particular column; from then on every time you run the gather with size repeat Oracle tries to generate 26 buckets. One day the data looks a little different, temporarily there are only 25 distinct values so on the next gather you get just 25 buckets – which means that when the “missing” value re-appears 12c will give you a Top-N histogram or even a hybrid histogram (11g would have to give you a height-balanced histogram if it noticed all 26 values when 25 buckets had been requested). It is not safe to use size repeat if the number of distinct values that actually exist can vary from day to day.

I have to say that I was fairly shocked that I’d not come across this threat before – so obviously I created a simple model to check how nasty things could get. I had a copy of 11.2.0.4 handy and created a couple of tables cloning the data from all_objects because that’s got a couple of columns that are good for producing frequency histograms.


rem     Script:         histogram_repeat.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2017

drop table t2;
drop table t1;

create table t1 as select * from all_objects;
create table t2 as select * from t1;

delete from t1 where object_type = 'EDITION';
delete from t1 where object_type = 'EVALUATION CONTEXT';
commit;

pause ================  Baseline =======================================

select  count(distinct object_type), count(distinct owner) from t1;

execute dbms_stats.gather_table_stats(user,'t1',method_opt =>'for columns object_type owner')

select  column_name, count(*)
from    user_tab_histograms
where   table_name = 'T1'
and     column_name in ( 'OBJECT_TYPE','OWNER')
group by column_name
order by column_name
;

select  column_name, num_buckets, histogram
from    user_tab_columns
where   table_name = 'T1'
and     column_name in ( 'OBJECT_TYPE','OWNER')
order by column_name
;

insert into t1 select * from t2 where object_type = 'EDITION';
insert into t1 select * from t2 where object_type = 'EVALUATION CONTEXT';
commit;

After creating the data I’ve deleted a few rows from t1, reported the number of distinct values in t1 for owner and object_type, then gathered stats on just those two columns using the default size. I’ve then reported the number of histogram buckets in two ways, by counting them in user_tab_histograms and by reporting them directly (with histogram type) from user_tab_columns. Then I’ve finished off by re-inserting (copying from t2) the rows I previously deleted, giving me a couple more object_type values in the table. Here are the results of the queries:


================  Baseline =======================================

COUNT(DISTINCTOBJECT_TYPE) COUNT(DISTINCTOWNER)
-------------------------- --------------------
                        23                   11

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  17
OWNER                         7

COLUMN_NAME             Buckets HISTOGRAM
-------------------- ---------- ---------------
OBJECT_TYPE                  17 FREQUENCY
OWNER                         7 FREQUENCY

I’m running on 11.2.0.4 – and I have two frequency histograms that have missed a few of the distinct values. But that’s because on the default settings 11g uses sampling (typically about 5,500 rows for smaller data sets) when creating histograms. So re-running the gather with size repeat shouldn’t allow the number of buckets to grow. Here’s what I got when I re-ran the gather (with size repeat) and two queries a further three times


method_opt =>'for columns object_type size repeat owner size repeat'

================  Repeat 1 =======================================

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  16
OWNER                         9

COLUMN_NAME             Buckets HISTOGRAM
-------------------- ---------- ---------------
OBJECT_TYPE                  16 FREQUENCY
OWNER                         9 FREQUENCY
================  Repeat 2 =======================================

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  18
OWNER                         8

COLUMN_NAME             Buckets HISTOGRAM
-------------------- ---------- ---------------
OBJECT_TYPE                  18 FREQUENCY
OWNER                         8 FREQUENCY
================  Repeat 3 =======================================

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  13
OWNER                         9

COLUMN_NAME             Buckets HISTOGRAM
-------------------- ---------- ---------------
OBJECT_TYPE                  13 FREQUENCY
OWNER                         9 FREQUENCY

On the first repeat I got even fewer buckets; but on the second repeat the number of buckets bounced back up and even exceeded the original count; then on the third repeat the number of buckets dropped significantly. If you run the test your results will probably vary, but that’s the effect of the random selection of rows used to generate the histogram. Key point, though, is that in 11g the number of buckets generated by the gather is not limited by the current number of buckets.

But…

What happens with 12.1.0.2 – here are the results. Remember I deleted two sets of object_type before I gathered the first set of stats, then put them back in before doing the repeat gathers. (The number of distinct object_types in 12c is more than I had in 11g).


================  Baseline =======================================

COUNT(DISTINCTOBJECT_TYPE) COUNT(DISTINCTOWNER)
-------------------------- --------------------
                        27                   25

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  27
OWNER                        25

COLUMN_NAME          NUM_BUCKETS HISTOGRAM
-------------------- ----------- ---------------
OBJECT_TYPE                   27 FREQUENCY
OWNER                         25 FREQUENCY

================  Repeat 1 =======================================

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  27
OWNER                        25

COLUMN_NAME          NUM_BUCKETS HISTOGRAM
-------------------- ----------- ---------------
OBJECT_TYPE                   27 TOP-FREQUENCY
OWNER                         25 FREQUENCY

================  Repeat 2 =======================================

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  27
OWNER                        25

COLUMN_NAME          NUM_BUCKETS HISTOGRAM
-------------------- ----------- ---------------
OBJECT_TYPE                   27 TOP-FREQUENCY
OWNER                         25 FREQUENCY

================  Repeat 3 =======================================

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  27
OWNER                        25

COLUMN_NAME          NUM_BUCKETS HISTOGRAM
-------------------- ----------- ---------------
OBJECT_TYPE                   27 TOP-FREQUENCY
OWNER                         25 FREQUENCY

The number of distinct values for object_type is initially 27, but after gathering stats the first time I added back two more object_type values; but the subsequent gathers stuck with 27 buckets rather than extending to 29 buckets – so the histogram changed from frequency to Top-N. If you check Maria’s blog again you’ll see that this can make a big difference, particularly if the two new values happen to be the lowest and highest values for the column.

The number of buckets on a REPEAT is limited to the number of existing buckets in 12c. That, to my mind, is a major change in behaviour and one you’ll have to watch out for on the upgrade. In 11g if the number of actual values stored dropped briefly the situation was self-correcting; if some new values were introduced the situation was self-correcting – although in both cases the histogram isn’t necessarily telling the truth the way you’d like it because of the small sample sizes used. In 12c the situation doesn’t self-correct. and may introduce a massive change in the arithmetic (as shown in Maria’s example).

The big difference, of course, is that 12c is gathering on a 100% sample using the variation of the approximate_ndv mechanism – so it will always find the right number of values if a frequency histogram is appropriate: presumably this is what was supposed to make it okay to reproduce the number of buckets previously used. In 11g with its small sample size the number of buckets created couldn’t be guaranteed to match the number of distinct values, so I guess the code in 11g wasn’t written to be so rigorous in its assumption about the number of buckets to use next time.

tl;dr

When you upgrade from 11g to 12c think very carefully about whether or not you can still use a “table-level” size repeat to gather histograms – the upgrade may force you to identify specifically the columns that need histograms so that you can name them with an explicit (large enough) size in a gather command.

Footnote:

Don’t forget you can set a table preference for each table specifying a method_opt (though I found this could break on “complex” settings of method_opt in earlier versions); so for columns that need a frequency histogram you could fix a sufficiently large number of buckets by specifying it in the method_opt with a call to dbms_stats.set_table_prefs().

 

December 2, 2016

Histogram Upgrade

Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 3:02 pm BST Dec 2,2016

I’ve written notes in the past about the improvements 12c introduces for histograms – particularly the frequency and top-N histograms which can be generated “free of charge” while the basic “approximate NDV” scans are taking place to gather stats. Gathering histograms in 12c is much safer than it used to be in earlier versions of Oracle even in the case of the new hybrid histograms (which are still sampled on a very small sample and therefore still a stability risk).

There is a threat, though, recently highlighted by Franck Pachot, that sneaked in at 11.2.0.4 – the way in which the calculation of endpoint values for histograms on char() and nchar() columns has changed. When you upgrade from anything prior to 11.2.0.4 to either 11.2.0.4 or 12c you need to recreate any histograms on those columns; note that this does not apply to varchar2() and nvarchar2() columns, just the fixed length character types. If you fail to do this then you may find that Oracle produces some very silly estimates of cardinality, which could result in some very inefficient tablescans – in particular you are likely to find (as Franck showed) cases where you “know” that a particular value is IN the histogram but the optimizer behaves as if it isn’t – which means it uses the “half the least popular” estimate for the cardinality.

Here’s a little demo to show the underlying difference:


rem
rem     Script:         histogram_change_11204.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2016
rem     Purpose:
rem

create table t1 (v1 varchar2(32), c1 char(32));
insert into t1
select
        case when rownum <= 100 then 'N' else 'Y' end,
        case when rownum <= 100 then 'N' else 'Y' end
from
        all_objects
where
        rownum <= 1000    -- > "gt symbol to bypass WordPress formatting issue"
;

begin
        dbms_stats.gather_table_stats( ownname => user,
                tabname          => 'T1',
                method_opt       => 'for all columns size 10'
        );
end;
/

column endpoint_value format 999,999,999,999,999,999,999,999,999,999,999,999
break on column_name skip 1

select
        column_name, endpoint_number, endpoint_value, to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
from
        user_tab_histograms
where
        table_name = 'T1'
order by
        column_name,
        endpoint_number
;

Here are the results from an instance of 11.1.0.7 (though anything up to 11.2.0.3 should produce the same), and 11.2.0.4 (and later – including 12.2):

Results 11.1.0.7
----------------
COLUMN_NAME          ENDPOINT_NUMBER                                   ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,'XXXXXXXXX
-------------------- --------------- ------------------------------------------------ ---------------------------------
C1                               100  405,650,737,316,592,000,000,000,000,000,000,000    4e20202020203a7bb119d5f6000000
                                1000  462,766,002,760,475,000,000,000,000,000,000,000    59202020202034d998ff0b5ae00000

V1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

Results 11.2.0.4
----------------
COLUMN_NAME          ENDPOINT_NUMBER                                   ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,'XXXXXXXXX
-------------------- --------------- ------------------------------------------------ ---------------------------------
C1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

V1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

Look particularly at the first 6 bytes of the Hex version of the endpoint values for the char() column c1. In 11.1.0.7 you see “4e2020202020”, “592020202020” – that’s ASCII ‘N’ and ‘Y’ respectively, padded to 6 characters with spaces. In 11.2.0.4 the spaces have disappeared – the char() columns are now padded to 6 characters with zeros (which is how varchar2() columns have always been treated).

In 11.1.0.7 the optimizer will find a histogram entry for c1 = ‘Y’ and produce a cardinality of 900; if you upgrade the database to 11.2.0.4 without recreating the histograms the optimizer won’t find a histogram entry for the predicate and will produce a cardinality of 50 (i.e. 100 / 2).

Footnote

There’s a brief summary of the algorithm Oracle uses to generate values for character-based histograms at this URL.

September 26, 2016

PK Histogram

Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 10:25 am BST Sep 26,2016

One of the little myths of Oracle appeared on the Oracle-L list server a few days ago – the one that says: “you don’t need a histogram on a single column unique/primary key”.

Not only can a histogram be helpful on a column that’s declared to hold unique values, the optimizer may even spot the need automatically. It’s a little unusual (and probably the result of a bad design or of poor programming practices) but it does happen. Here’s an example demonstrating the principle:


rem
rem     Script:         pk_histogram.sql
rem     Author:         Jonathan Lewis
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem             11.1.0.7
rem             10.2.0.5
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum  id
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4
;

insert into t1 select id + 1e6 from t1;
insert into t1 select id + 1e7 from t1;

alter table t1 add constraint t1_pk primary key(id);

select
        /*+ dynamic_sampling(0) */
        *
from    t1
where
        id between 12000 and 13000
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1'
        );
end;
/


select
        column_name, sample_size,
        num_distinct, num_nulls, density,
        histogram, num_buckets
from   
        user_tab_cols
where
        table_name = 'T1'
order by
        column_name
;

I’ve created a small data with set some large gaps in the ranges of values used then queried the data with a range-based predicate that will return no rows. After that I’ve let Oracle do its own thing with gathering stats (I think all the dbms_stats preferences are at their defaults on this database). This is the result I got from the final query:

COLUMN_NAME          SAMPLE_SIZE NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM       NUM_BUCKETS
-------------------- ----------- ------------ ---------- ---------- --------------- -----------
ID                          5555        40000          0 .000024752 HEIGHT BALANCED         254

Oracle will gather a histogram automatically if it can see that the distribution of the data you are querying is skewed; but people tend to think of “skewed” as meaning there is wide variation in the number of rows for a given value and forget that it can also mean a variation in the number of rows within a given size of range. All I’ve done with my example is cue Oracle to the fact that I have big gaps in the range of known values – so it has produced a histogram that let’s it know where the gaps are.

Note: The results above are from 11.2.0.4; in 12.1.0.2 I got a HYBRID histogram; on 10.2.0.5 the sample size was 40,000

April 27, 2016

Stats History

Filed under: Histograms,Infrastructure,Oracle,Statistics — Jonathan Lewis @ 1:09 pm BST Apr 27,2016

From time to time we see a complaint on OTN about the stats history tables being the largest objects in the SYSAUX tablespace and growing very quickly, with requests about how to work around the (perceived) threat. The quick answer is – if you need to save space then stop holding on to the history for so long, and then clean up the mess left by the history that you have captured; on top of that you could stop gathering so many histograms because you probably don’t need them, they often introduce instability to your execution plans, and they are often the largest single component of the history (unless you are using incremental stats on partitioned objects***)

For many databases it’s the histogram history – using the default Oracle automatic stats collection job – that takes the most space, here’s a sample query that the sys user can run to get some idea of how significant this history can be:


SQL> select table_name , blocks from user_tables where table_name like 'WRI$_OPTSTAT%HISTORY' order by blocks;

TABLE_NAME                           BLOCKS
-------------------------------- ----------
WRI$_OPTSTAT_AUX_HISTORY                 80
WRI$_OPTSTAT_TAB_HISTORY                244
WRI$_OPTSTAT_IND_HISTORY                622
WRI$_OPTSTAT_HISTHEAD_HISTORY          1378
WRI$_OPTSTAT_HISTGRM_HISTORY           2764

5 rows selected.

As you can see the “histhead” and “histgrm” tables (histogram header and histogram detail) are the largest stats history tables in this (admittedly very small) database.

Oracle gives us a couple of calls in the dbms_stats package to check and change the history setting, demonstrated as follows:


SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION
---------------------------
                         31

1 row selected.

SQL> execute dbms_stats.alter_stats_history_retention(7)

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION
---------------------------
                          7

1 row selected.

Changing the retention period doesn’t reclaim any space, of course – it simply tells Oracle how much of the existing history to eliminate in the next “clean-up” cycle. This clean-up is controllled by a “savtime” column in each table:

SQL> select table_name from user_tab_columns where column_name = 'SAVTIME' and table_name like 'WRI$_OPTSTAT%HISTORY';

TABLE_NAME
--------------------------------
WRI$_OPTSTAT_AUX_HISTORY
WRI$_OPTSTAT_HISTGRM_HISTORY
WRI$_OPTSTAT_HISTHEAD_HISTORY
WRI$_OPTSTAT_IND_HISTORY
WRI$_OPTSTAT_TAB_HISTORY

5 rows selected.

If all you wanted to do was stop the tables from growing further you’ve probably done all you need to do. From this point onwards the automatic Oracle job will start deleting the oldest saved stats and re-using space in the existing table. But you may want to be a little more aggressive about tidying things up, and Oracle gives you a procedure to do this – and it might be sensible to use this procedure anyway at a time of your own choosing:


SQL> execute dbms_stats.purge_stats(sysdate - 7);

Basically this issues a series of delete statements (including a delete on the “stats operation log (wri$_optstat_opr)” table that I haven’t previously mentioned) – here’s an extract from an 11g trace file of a call to this procedure (output from a simple grep command):


delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_tab_history          where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_ind_history h        where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_aux_history          where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_opr                  where start_time < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_histhead_history     where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_histgrm_history      where savtime < :1 and rownum <= NVL(:2, rownum)

Two points to consider here: although the appearance of the rownum clause suggests that there’s a damage limitation strategy built into the code I only saw one commit after the entire delete cycle, and I never saw a limiting bind value being supplied. If you’ve got a large database with very large history tables you might want to delete one day (or even just a few hours) at a time. The potential for a very long, slow, delete is also why you might want to do a manual purge at a time of your choosing rather than letting Oracle do the whole thing on auto-pilot during some overnight operation.

Secondly, even though you may have deleted a lot of data from these table you still haven’t reclaimed the space – so if you’re trying to find space in the sysaux tablespace you’re going to have to rebuild the tables and their indexes. Unfortunately a quick check of v$sysaux_occupants tells us that there is no official “move” producedure:


SQL> execute print_table('select occupant_desc, move_procedure, move_procedure_desc from v$sysaux_occupants where occupant_name = ''SM/OPTSTAT''')

OCCUPANT_DESC                 : Server Manageability - Optimizer Statistics History
MOVE_PROCEDURE                :
MOVE_PROCEDURE_DESC           : *** MOVE PROCEDURE NOT APPLICABLE ***

So we have to run a series of explicit calls to alter table move and alter index rebuild. (Preferably not when anyone is trying to gather stats on an object). Coding that up is left as an exercise to the reader, but it may be best to move the tables in the order of smallest table first, rebuilding indexes as you go.

Footnote:

*** Incremental stats on partitioned objects: I tend to assume that sites which use partitioning are creating very large databases and have probably paid a lot more attention to the details of how to use statistics effectively and successfully; that’s why this note is aimed at sites which don’t use partitioning and therefore think that the space taken up by the stats history significant.

October 23, 2015

Histogram Limit

Filed under: Bugs,Histograms,Oracle,Statistics — Jonathan Lewis @ 8:03 pm BST Oct 23,2015

A surprising question came up on OTN a couple of days ago:

Why does a query for “column = 999999999999999999” run slower than a query for “column > 999999999999999998” (that’s 18 digit numbers, if you don’t want to count them). With the equality predicate the query is very slow, with the range-based predicate perfomance is good.

In the absence of further information there are various reasons why this is possible – but the example in question was about a “versioning” table where the single very large value was used as the “not yet ended” value for the history of an id so, at a minimum, the table held columns (id, nstart, nend, other), and each id could appear many times with pairs of start and end values that supplied non-overlapping, covering ranges and one row that had the very large number as the end value.

Let’s jot down a few ideas about what the data (and stats) might look like.

Assuming every id appears “a few” time and every id has to have a “still valid” row this means that a very large fraction (say 10% to 25%, if “a few” means 4 to 9) of the rows hold the value 999999999999999999.

If you gather stats without a histogram then you should get the low and high, finding that the high is 999999999999999999 and that the range is enormous, and so the predicates “column = {high value}” and “column > {high value} – 1” should give very similar cardinalities.

If you collect stats with a histogram you should find the very popular high value even in a very small sample set (which is what happens with histogram collection in 11g, and even in 12c for hybrid histograms). In this case the histogram should spot the significance of the high value and again the two predicates should have very similar cardinalities.

At first sight there doesn’t seem to be a feasible way that the two cardinalities could be sufficiently different to cause a problem – so maybe there’s something about character conversion or maybe bind variable usage that hasn’t been mentioned. So to test a couple of the less likely ideas I built a data set using 11.2.0.4 – and found a bug:


rem     Script:         histogram_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2015

create table t1 (
        id              number(18),
        nend            number(18),
        n1              number(18),
        small_vc        varchar2(10)
)
nologging
;

insert /*+ append */ into t1
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum -1               id,
        case mod(rownum - 1,4)
                when 0  then 999999999999999999
--              when 0  then 999999999999999
                        else mod(rownum - 1, 250000)
        end                     nend,
        rownum - 1              n1,
        rpad('x',10,'x')        small_vc
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > "gt" bypasses WordPress formatting issue.
;

commit; 

select * from t1 where nend = 0; 

begin 
        dbms_stats.gather_table_stats( 
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size auto'
        );
end;
/

I have an nend column that is set to 999999999999999999 every 4th row in the table and otherwise has 4 rows per value for 187,500 other (relatively low) values. It’s probably a reasonable initial model of the original data. I’ve run a query with a predicate referencing nend before gathering stats so that the (default) auto option will build a histogram for nend. Then I’ve checked the execution plans for two critical queries:


explain plan for
select  *
from    t1
where   nend = 999999999999999999
;

select * from table(dbms_xplan.display);

explain plan for
select  *
from    t1
where   nend > 999999999999999998
;

select * from table(dbms_xplan.display);

And here’s the surprise – the two plans, in order:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     4 |   108 |   625   (9)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T1   |     4 |   108 |   625   (9)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NEND"=999999999999999999)

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   250K|  6591K|   627   (9)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T1   |   250K|  6591K|   627   (9)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NEND">999999999999999998)

Clearly this should not happen – the equality test is way off, the inequality test is correct. The obvious first guess is that something funny has happened with the statistics so let’s see what they look like – the column stats (user_tab_cols) and the histogram stats (user_tab_histograms) seem like a good starting point:


select
        sample_size, num_distinct, histogram, num_buckets,
        substr(low_value,1,26) low_value, substr(high_value,1,26) high_value
from
        user_tab_cols
where
        table_name = 'T1'
and     column_name = 'NEND'
;

column endpoint_value format 999,999,999,999,999,999,999

select
        endpoint_number, endpoint_value -- , endpoint_repeat_count
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = 'NEND'
order by
        endpoint_number
;

And here are the results (with a couple of hundred uninteresting rows eliminated from the histogram):


      Sample     Distinct HISTOGRAM          Buckets LOW_VALUE                  HIGH_VALUE
------------ ------------ --------------- ---------- -------------------------- --------------------------
       5,541      190,176 HEIGHT BALANCED        254 C102                       C9646464646464646464

ENDPOINT_NUMBER               ENDPOINT_VALUE
--------------- ----------------------------
              0                           19
              1                        1,225
              2                        2,503
              3                        3,911
              4                        4,806
...
            188                      247,479
            189                      248,754
            190                      249,862
            254    1,000,000,000,000,000,000

Oracle will have started with a 100% sample to collect stats on all the columns, but taken a small sample to test the need for a histogram on the nend column – and that’s why the sample size of 5,541 has appeared, but that’s not relevant to the problem in hand. The big question comes from endpoint_number 254 – why is the highest value in the histogram 1e19 when we know (and the column stats show) that the highest value is actually 999999999999999999 ?!

It’s a question to which I don’t have an answer – but I do know that

  • if your high value is 15 digits long (all 9s) then the histogram shows the right high value
  • if your high value is more that 15 9s then the histogram shows the high value plus 1
  • the value collected by the query that Oracle runs is the actual value (i.e. 18 9s)
  • if you use set_column stats to set 18 9s as the high value you still get 1e19 in the histogram

Once you see the stats you can understand why the OP sees the odd performance problem. If the histogram identifies 1e19 as a (very) popular value, leaving 999999999999999 as an “average” value with only 4 rows; on the other hand the query for greater than 999999999999999998 can see that there really are 250K rows with higher values.

Footnote:

Interestingly 12c does the same with the stats – introducing the 1e19 in the histogram – but still manages, somehow, to calculate the correct cardinality in the equality case. (There is one slight difference in 12c, the histogram is a hybrid histogram, not a height-balanced histogram).

Update 24th Oct 2015:

I realised late last night that I had written about this behaviour before – though possibly just as a response on OTN or in some unpublished notes. According to some notes that I found on my laptop this is “Bug 18514507 : WRONG CARDINALITY ESTIMATES WHEN NUMERIC VALUE IS LONGER THAN 15 BYTES”. (except that the base bug that that one is linked to are probably not the same bug).

I suspect the problem is related to the way that character histograms are built based on a numeric representation of the string that takes the first few characters of the string, treats that resulting N bytes as a hex number, converts to decimal and then applies round(N, -21) to restrict the precision stored. The effect with strings is that (broadly speaking) you get fifteen digits (sometimes reduced to 14) precision – which is exactly what I seem to be seeing with numbers.

The reason that 12c can get the right answers despite storing the wrong endpoint_value is that it’s also storing to_char() of the right value as the endpoint_actual_value – possibly doing this any time it has had to round the endpoint_value (for character strings the endpoint_actual_value was only populated if two entries in the endpoint_value were the same).

September 30, 2015

Estimate_percent

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 5:10 pm BST Sep 30,2015

Here’s a live one from OTN – here are a couple of extracts from the problem statement:

We’re experiencing an issue where it seems that the query plan changes from day to day for a particular procedure that runs once a night.
It’s resulting in a performance variance of 10 second completion time vs 20 minutes (nothing in between).
It started occurring about 2 months ago and now it’s becoming more prevalent where the bad query plan is coming up more often.
I noticed that the query plans vary for a simple query.
We do run gather statistics every night. (DBMS_STATS.GATHER_SCHEMA_STATS (ownname=>sys_context( ‘userenv’, ‘current_schema’ ), estimate_percent => 1);)

The query and two execution plans look like this:

select count(*) from cs_bucket_member_v2 where bucket_type='P' and sec_id > 0 and order_id=0;

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                     |     1 |    12 |   155   (0)| 00:00:02 |
|   1 |  SORT AGGREGATE              |                     |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CS_BUCKET_MEMBER_V2 |  1148 | 13776 |   155   (0)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | CS_BUCKET_MEMBER_N1 |  1272 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("BUCKET_TYPE"='P' AND "SEC_ID">0)
   3 - access("ORDER_ID"=0)


------------------------------------------------------------------------------------------
| Id  | Operation          | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |     1 |    12 | 11215   (2)| 00:01:41 |
|   1 |  SORT AGGREGATE    |                     |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| CS_BUCKET_MEMBER_V2 |  1522K|    17M| 11215   (2)| 00:01:41 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ORDER_ID"=0 AND "SEC_ID">0 AND "BUCKET_TYPE"='P')

There are a couple of bits of information that would be useful – such as the database version, the number of rows in the table, the number of distinct values in each column, and whether any of the columns have histograms – but there are a couple of reasonable guesses that we might make about the problem. Notice particularly that the number of rows estimated from the index ranges scan is 1272 and only a small volume is then eliminated by the table filter predicates on sec_id and bucket_type. This suggests that the optimizer has information that tells it that most of the rows in the table have sec_id > 0 and bucket_type = ‘P’, and you might note that that suggests that there’s a histogram on bucket_type.

Rather than stating the most obvious guesses about the problem, though, I’ll start by creating a data set and emulating the problem, starting from an empty schema on 11.2.0.4:

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual
        connect by 
                level <= 1e4
)
select
        rownum                  sec_id,
        case
                when mod(rownum,1000) = 0
                        then 'X'
                        else 'P'
        end                     bucket_type,
        case
                when rownum < 1e6 - 50000 
                        then mod(rownum-1,1e5)
                        else 1000
        end                     order_id,
        lpad(rownum,10,'0')     id_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6

create index t1_i1 on t1(order_id) nologging; 

select count(*) from t1 where order_id = 1000 and bucket_type = 'P' and sec_id > 1000;

The column names in the table match those needed by the query, and the bucket_p column has a very skewed distribution that will eliminate very little data; the sec_id column is also not going to eliminate data, but it’s very evenly distributed with no large gaps so not a good candidate for a histogram in any case. The order_id has 50,000 rows out of 1,000,000 (5%) set of a single value, and most of those special rows are at the end of the table – it’s a pretty good candidate for a histogram (if Oracle spots it, and if we actually write queries to access that data).

I’ve run a query that references all three columns so that the default method_opt of “for all columns size auto” will apply to them when I gather stats. So here’s the code that gathers stats and checks the resulting execution plans, first for “auto_sample_size” then for the 1% used by the OP:


set autotrace traceonly explain

begin
        dbms_stats.gather_schema_stats(
/*              estimate_percent => 1, */
                ownname          => user
        );
end;
/

select count(*) from t1 where order_id = 1000 and bucket_type = 'P' and sec_id > 1000;

begin
        dbms_stats.gather_schema_stats(
                estimate_percent => 1,
                ownname          => user
        );
end;
/

select count(*) from t1 where order_id = 1000 and bucket_type = 'P' and sec_id > 1000;

set autotrace off

And here are the two plans – in the same order:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    12 |  2333   (4)| 00:00:12 |
|   1 |  SORT AGGREGATE    |      |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 51063 |   598K|  2333   (4)| 00:00:12 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ORDER_ID"=1000 AND "SEC_ID">1000 AND "BUCKET_TYPE"='P')


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    12 |    23   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |       |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| T1    |    20 |   240 |    23   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |    20 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):  
---------------------------------------------------
   2 - filter("SEC_ID">1000 AND "BUCKET_TYPE"='P')
   3 - access("ORDER_ID"=1000)


[Update: Following on from a question in the comments, I’ve expanded this section, and wandered a little off-topic]

I don’t know why, but with a 1% sample (which really did sample 10,000 rows) the optimizer didn’t spot the need for a histogram on order_id, but with the auto_sample_size (which sampled 5,500 – yes, half as many rows) the optimizer spotted the need for the histogram. Checking the trace files the only difference visible in the sample SQL was the presence in the 1% sample of the id_vc and padding columns which were not present in the auto_sample_size trace.

According to the manuals when the method_opt is “for all columns size auto”, then

“Oracle determines the columns on which to collect histograms based on data distribution and the workload of the columns.”

There is nothing in the manuals to suggest that there is a deliberate link between the auto_sample_size and estimate_percent, and there is room for ambiguity in how we interpret this bit of text in the manual so the difference in the SQL used and the effects thereof requires (a) some hand-waving, and/or (b) lots more experimentation.  At the moment I’m prepared to go for hand-waving:

Hypothesis 1: auto_sample_size did not sample the id_vc and padding columns because the (100%) sample taken had given Oracle enough information to decide that the data distribution of those columns was not skewed enough to merit further consideration; but it sampled the three columns that had been used in a fashion that might be helped by a histogram. This sampling spotted the benefit of a histogram on order_id and bucket_type but decided that sec_id didn’t need a histogram

Hypothesis 2: the 1% sample got pretty close to the same results in its estimates of number of distinct values for id_vc and padding as the (100%) auto_sample_size, but still decided to do a sampled test for the data distribution (the manual seems to suggest that the histograms will only be considered if there has been some use of the columns in predicates, but doesn’t explicitly preclude the possibility of creating the histogram on the basis of just the data distribution). After doing the 1% sample to analyze the data for suitability of a histogram the result suggested that only the histogram on bucket_type would be beneficial.  (In fact, after the first sample Oracle took a second 1% histogram sample on just the order_id before deciding that it a histogram on order_id wasn’t needed.)

Bottom line on this: I don’t know if the auto_sample_size “accidentally” eliminated a couple of columns from histogram sampling and if a larger fixed sample size (say 50%, or even 100%) might result in Oracle eliminating a few columns from the histogram; or maybe the code path for histogram samples with auto_sample_size in place is actually a different code path. The only thing I can say is that the two sets of events that appeared from my demonstration don’t seem to be entirely self-consistent, but it would probably take most of a day doing experiments to narrow down the variation in behaviour to a few “best guess” ideas of what’s going on behind the scenes – though unwrapping the code might lead to a more accurate answer more quickly.

Moral

Histograms are tricky things – and you can only make things worse in 11g by NOT using the auto_sample_size.

Footnote

Based on previous experience – my “obvious” guess about the OP’s data was that there was a special-case value for order_id, that the rows for that value were fairly well clustered, probably towards the end of the table, and constituted a small percentage of the table, and that the rest of the data reported “a few” rows per value. That’s why I built the model you see above.

September 4, 2015

Histogram Tip

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 8:32 am BST Sep 4,2015

I’ve just responded to the call for items for the “IOUG Quick Tips” booklet for 2015 – so it’s probably about time to post the quick tip that I put into the 2014 issue. It’s probably nothing new to most readers of the blog, but sometimes an old thing presented in a new way offers fresh insights or better comprehension.

Histogram Tips

A histogram, created in the right way, at the right time, and supported by the correct client-side code, can be a huge benefit to the optimizer; but if you don’t create and use them wisely they can easily become a source of inconsistent performance, and the automatic statistics gathering can introduce an undesirable overhead during the overnight batch. This note explains how you can create histograms very cheaply on the few columns where they are most likely to have a beneficial effect.

set_column_stats

The dbms_stats package have many procedures and functions built into it that allow us to see (get) and manipulate (set) the stored statistics; in particular it holds two functions get_column_stats() and set_column_stats(), and we can use these procedures to create a histogram very cheaply whenever we want at very low cost. Here’s an example that could be modified to suit a character column in a table where you’ve previously collected some stats. It uses a copy of all_objects, limited to exactly 10,000 rows.


create table t1 as
select
	*
from
	all_objects
where
	rownum <= 10000      --> comment to avoid WordPress formatting issue
;

begin 
        dbms_stats.gather_table_stats( 
		ownname          => user,
		tabname		 => 'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/


declare

	m_distcnt		number;		-- num_distinct
	m_density		number;		-- density
	m_nullcnt		number;		-- num_nulls
	m_avgclen		number;		-- avg_col_len

	srec		dbms_stats.statrec;
	c_array		dbms_stats.chararray;

begin

	dbms_stats.get_column_stats(
		ownname		=> user,
		tabname		=> 't1',
		colname		=> 'object_type', 
		distcnt		=> m_distcnt,
		density		=> m_density,
		nullcnt		=> m_nullcnt,
		srec		=> srec,
		avgclen		=> m_avgclen
	); 

	c_array		:= dbms_stats.chararray('A', 'B', 'C', 'X', 'Y');
	srec.bkvals	:= dbms_stats.numarray (  2,   2,   2, 500, 494);
--	srec.rpcnts	:= dbms_stats.numarray (  0,   0,   0,   0,   0);
	srec.epc := 5;

	dbms_stats.prepare_column_values(srec, c_array);

	m_distcnt	:= 5;
	m_density	:= 1/(5000);

	dbms_stats.set_column_stats(
		ownname		=> user,
		tabname		=> 't1',
		colname		=> 'object_type', 
		distcnt		=> m_distcnt,
		density		=> m_density,
		nullcnt		=> m_nullcnt,
		srec		=> srec,
		avgclen		=> m_avgclen
	); 

end;
/

Key features of the code: as you can see, the two calls have identical parameters which identify the table and column name (there is an optional parameter for a (sub) partition name), and most of the basic statistics about the column. The histogram (or low and high values) are accessed through a special record type, and we can populate that record type by supplying an ordered list of values, a matching list of frequencies, and a count of how many values we have supplied.

Since my code is fixing stats on a varchar2() column I’ve declared an array of type dbms_stats.chararray to hold the list of values I want to see in a frequency histogram – there are other array types for dates, raw, number, etc. I’ve then used the structure of the stats record I had declared to hold the list of frequencies (srec.bkvals – possibly a short name for “bucket values”) and the count (srec.epc“end-point count”).

The call to dbms_stats.prepare_column_stats() takes my two arrays and massages them into the correct format for storage as a histogram that I can then write into the data dictionary with the closing call to dbms_stats.set_column_stats(). Before making that call, though, I’ve also set the “num_distinct” variable to tell the optimizer that there are 5 distinct values for the column (it makes sense, but isn’t absolutely necessary, for the num_distinct to match the number of values in the array), and set the “density” to a value that I would like the optimizer to use in it calculations if someone asks for a value that is not in my list.

I’ve included (but commented out) a line that’s relevant to the new histogram mechanisms in 12c –the srec.rpcnts (“repeat counts”) array is used in “hybrid histograms”. It’s not relevant to my example where I’m trying to create a pure frequency histogram, but if I don’t set the array I get an Oracle error: “ORA-06532: Subscript outside of limit”.

Results

There’s one important point to the method that isn’t instantly visible in the code – I created my table with 10,000 rows and there will be no nulls for object_type; but if you sum the array of frequencies it comes to exactly 1,000. This apparent contradiction is not a problem – the optimizer will compare the histogram figures to the total number of non-null entries it has recorded (in other words user_tables.num_rowsuser_tab_columns.num_nulls), and scale up the histogram accordingly. This means that a query for ‘A’ should return an estimated row count of 20 (rather than 2), ‘X’ should return 5,000 (rather than 500) and ‘D’ should return 2 (10,000 rows * 1/5000, the selectivity I had set for non-existent values).

With a little editing to save space, here’s a cut-n-paste from an SQL*Plus session running against 12c:


SQL> set autotrace traceonly explain
SQL> select * from t1 where object_type = 'A';

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    20 |  2100 |    22   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    20 |  2100 |    22   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_TYPE"='A')

SQL> select * from t1 where object_type = 'X';

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  5000 |   512K|    22   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  5000 |   512K|    22   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_TYPE"='X')

SQL> select * from t1 where object_type = 'D';

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   105 |    22   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |   105 |    22   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_TYPE"='D')

Conclusion

It is very easy to create truly representative histograms (if you know your data) and the resources required to do so are minimal. If you see instability due to bad luck, or bad timing, gathering a histogram then you benefit enormously from writing code to construct histograms.

Footnote

In 12c the introduction of the “approximate NDV” strategy to collecting frequency histograms, and the introduction of the “Top-frequency” histogram has made automatic gathering of histograms on columns with a relatively small number of distinct values much faster and safer – but timing may still be an issue, and the resources needed to gather a safe hybrid histogram may still justify a hand-coded approach.

 

November 19, 2014

Comparisons

Filed under: Histograms,humour,Oracle,Statistics — Jonathan Lewis @ 12:47 pm BST Nov 19,2014

“You can’t compare apples with oranges.”

Oh, yes you can! The answer is 72,731,533,037,581,000,000,000,000,000,000,000.


SQL> 
SQL> create table fruit(v1 varchar2(30));
SQL> 
SQL> insert into fruit values('apples');
SQL> insert into fruit values('oranges');
SQL> commit;
SQL> 
SQL> 
SQL> begin
  2  	     dbms_stats.gather_table_stats(
  3  		     ownname	      => user,
  4  		     tabname	      =>'FRUIT',
  5  		     method_opt       => 'for all columns size 2'
  6  	     );
  7  end;
  8  /
SQL> 
SQL> select
  2  	     endpoint_number,
  3  	     endpoint_value,
  4  	     to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx') hex_value
  5  from
  6  	     user_tab_histograms
  7  where
  8  	     table_name = 'FRUIT'
  9  order by
 10  	     endpoint_number
 11  ;

ENDPOINT_NUMBER                                   ENDPOINT_VALUE HEX_VALUE
--------------- ------------------------------------------------ -------------------------------
              1  505,933,332,254,715,000,000,000,000,000,000,000  6170706c65731ad171a7dca6e00000
              2  578,664,865,292,296,000,000,000,000,000,000,000  6f72616e67658acc6c9dcaf5000000
SQL> 
SQL> 
SQL> 
SQL> select
  2  	     max(endpoint_value) - min(endpoint_value) diff
  3  from
  4  	     user_tab_histograms
  5  where
  6  	     table_name = 'FRUIT'
  7  ;

                                            DIFF
------------------------------------------------
  72,731,533,037,581,000,000,000,000,000,000,000
SQL> 
SQL> spool off


Next Page »

Powered by WordPress.com.