Oracle Scratchpad

May 4, 2014

Extended stats

Filed under: 12c,Histograms,Oracle,Statistics — Jonathan Lewis @ 1:24 pm GMT May 4,2014

Like the recent article on deleting histograms this is another draft that I rediscovered while searching for some notes I had written on a different topic – so I’ve finally finished it off and published it.

Here’s a quirky little detail of extended stats that came up in an OTN thread earlier on this week [ed: actually 8th Jan 2014]. When you create column group stats, Oracle uses an undocumented function sys_op_combined_hash() to create a hash value, and if you gather simple stats on the column (i.e. no histogram) you can get some idea of the range of values that Oracle generates through the hash function. For example:


create table t1 as
select  1 n1, 2 n2
from dual
connect by level<=5000
union all
select  2, 1
from dual
connect by level<=5000
;

select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/

break on column_name skip 1

column column_name format a32
column endpoint_number heading "EP No."
column endpoint_value  heading "Value" format 999,999,999,999,999,999,999

select
	column_name, endpoint_number, endpoint_value
from
	user_tab_histograms
where
	table_name = 'T1'
order by
	column_name, endpoint_value
;

COLUMN_NAME                          EP No.                        Value
-------------------------------- ---------- ----------------------------
N1                                        0                            1
                                          1                            2

N2                                        0                            1
                                          1                            2

SYS_STUBZH0IHA7K$KEBJVXO5LOHAS            0      298,332,787,864,733,000
                                          1    8,095,781,421,167,520,000

I could have selected low_value and high_value from user_tab_cols, using utl_raw.cast_to_number() to display them in numeric format, but the view user_tab_histograms display the low and high as a two-bucket histogram if there is no actual histogram data for the column in the histogram (histgrm$) table.

We probably don’t need to worry about what the low and high values might be because taking hash values destroys any meaning that a range might have (the optimizer can’t use column group stats in range-based predicates, only in equality predicates). However, we might collect a frequency histogram (or Top-N histogram in 12c) on the column group because there might be some data skew in the sets of values that we need to tell the optimizer about – so let’s gather a histogram with 2 buckets on our sample data set and see what we get:


begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 2'
	);
end;
/

select
	column_name, endpoint_number, endpoint_value
from
	user_tab_histograms
where
	table_name = 'T1'
order by
	column_name, endpoint_value
;

COLUMN_NAME                          EP No.                        Value
-------------------------------- ---------- ----------------------------
N1                                     5000                            1
                                      10000                            2

N2                                     5000                            1
                                      10000                            2

SYS_STUBZH0IHA7K$KEBJVXO5LOHAS         5000                1,977,102,303
                                      10000                7,894,566,276

The histogram values have changed! As one of the posters on the OTN thread points out, what Oracle has actually stored in this case is mod(sys_op_combined_hash() ,9999999999).

So far I haven’t done any exhaustive testing to work out whether the change in the strategy for storing numbers makes any difference to the typical optimizer arithmetic – but I do have at least one case (relating to “missing values” behaviour where the presence or absence of a column group histogram does make a difference to the estimated cardinality in a way that seems inconsistent with other patterns of behaviour: I doubt if it’s actually due to the change in what’s stored, and one day I may come across a client where I actually need to work out what’s going on and how best to work with the anomaly.

Bonus thought:

In 12c Oracle collects column stats automatically as it loads data into an empty table; but not if it’s got extended stats defined on it.  (This is bug no 18425876, fixed in 12.2). Here’s some code modelling a client scenario where we truncate and reload a table every day. In the first part of the demonstration I’ve loaded the table twice to show that after truncating and reloading I get new stats on the table – the first load is 10,000 rows, the second is 20,000 rows and the stats reflect this automatically. In the second part of the code, after adding a set of column group stats, truncating and loading 5,000 rows, the stats from the previous cycle are still in place. (The code is only relevant to 12c, of course)

create table t1 (n1 number, n2 number);

insert	/*+ append */
into	t1
select	object_id, data_object_id
from	all_objects
where
	rownum <= 10000
;
commit;

-- stats have appeared without a call to dbms_stats to gather them.

select blocks, num_rows from user_tables where table_name = 'T1';
select column_name, num_distinct, num_nulls, density, low_value, high_value from user_tab_cols where table_name = 'T1';

truncate table t1;

insert	/*+ append */
into	t1
select	object_id, data_object_id
from	all_objects
where
	rownum <= 20000
;
commit;

-- Stats now show the latest data 

select blocks, num_rows from user_tables where table_name = 'T1';
select column_name, num_distinct, num_nulls, density, low_value, high_value from user_tab_cols where table_name = 'T1';

-- Add a column group to the stats 

select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual;

truncate table t1;

insert	/*+ append */
into	t1
select	object_id, data_object_id
from	all_objects
where
	rownum <= 5000
;
commit;

-- The stats have not been updated to reflect the new data, and the column group stats are empty

select blocks, num_rows from user_tables where table_name = 'T1';
select column_name, num_distinct, num_nulls, density, low_value, high_value from user_tab_cols where table_name = 'T1';

--------------------------------------------------------
-- Here are the three consecutive sets of results
--------------------------------------------------------

    BLOCKS   NUM_ROWS
---------- ----------
        16      10000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      10000          0      .0001 C103                       C3020C60
N2                                       2534       7429 .000394633 C103                       C30B2929

    BLOCKS   NUM_ROWS
---------- ----------
        32      20000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      20000          0     .00005 C103                       C30A4553
N2                                       3115      16848 .000321027 C103                       C30B2929

    BLOCKS   NUM_ROWS
---------- ----------
        32      20000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      20000          0     .00005 C103                       C30A4553
N2                                       3115      16848 .000321027 C103                       C30B2929
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS

The workaround given in the bug is “add the extended stats after loading the table” – but if you’re constantly truncating and reloading that means you have to drop and add the extended stats and do a tablescan to gather the column group stats every time you reload.

Note: the limitation applies whether you create a column group, “ordinary” extended stats, a virtual column, or an implicit virtual column underlying a function-based index.

Just as a little aside – when I first wrote the demo script I forgot to put in the commit; after the insert/append – which meant I was trying to create column group stats on a table which should have given me Oracle error: “ORA-12838: cannot read/modify an object after modifying it in parallel”; instead this has been trapped by the dbms_stats package and shows up as a slightly confusing:


select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual
       *
ERROR at line 1:
ORA-20001: Error when processing extension -  resource busy and acquire with NOWAIT specified or timeout expired
ORA-06512: at "SYS.DBMS_STATS", line 12977
ORA-06512: at "SYS.DBMS_STATS", line 44967
ORA-06512: at "SYS.DBMS_STATS", line 44986

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,308 other followers