Like the recent article on deleting histograms, this is another draft that I rediscovered while searching for some notes I had written on a different topic – so I’ve finally finished it off and published it.
Here’s a quirky little detail of extended stats that came up in an OTN thread earlier this week [ed: actually 8th Jan 2014]. When you create column group stats, Oracle uses an undocumented function sys_op_combined_hash() to create a hash value, and if you gather simple stats on the column (i.e. no histogram) you can get some idea of the range of values that Oracle generates through the hash function. For example:
create table t1
as
select 1 n1, 2 n2
from dual
connect by level <= 5000	-- > comment to avoid wordpress format issue
union all
select 2, 1
from dual
connect by level <= 5000	-- > comment to avoid wordpress format issue
;

select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual;

begin
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=> 'T1',
		method_opt	=> 'for all columns size 1'
	);
end;
/

break on column_name skip 1

column column_name format a32
column endpoint_number heading "EP No."
column endpoint_value  heading "Value" format 999,999,999,999,999,999,999

select
	column_name, endpoint_number, endpoint_value
from
	user_tab_histograms
where
	table_name = 'T1'
order by
	column_name, endpoint_value
;

COLUMN_NAME                          EP No.                        Value
-------------------------------- ---------- ----------------------------
N1                                        0                            1
                                          1                            2

N2                                        0                            1
                                          1                            2

SYS_STUBZH0IHA7K$KEBJVXO5LOHAS            0      298,332,787,864,733,000
                                          1    8,095,781,421,167,520,000
I could have selected low_value and high_value from user_tab_cols, using utl_raw.cast_to_number() to display them in numeric format, but the view user_tab_histograms displays the low and high values as a two-bucket histogram when there is no actual histogram data for the column in the histogram table (histgrm$).
We probably don’t need to worry about what the low and high values might be because taking hash values destroys any meaning that a range might have (the optimizer can’t use column group stats in range-based predicates, only in equality predicates). However, we might collect a frequency histogram (or Top-N histogram in 12c) on the column group because there might be some data skew in the sets of values that we need to tell the optimizer about – so let’s gather a histogram with 2 buckets on our sample data set and see what we get:
begin
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=> 'T1',
		method_opt	=> 'for all columns size 2'
	);
end;
/

select
	column_name, endpoint_number, endpoint_value
from
	user_tab_histograms
where
	table_name = 'T1'
order by
	column_name, endpoint_value
;

COLUMN_NAME                          EP No.                        Value
-------------------------------- ---------- ----------------------------
N1                                     5000                            1
                                      10000                            2

N2                                     5000                            1
                                      10000                            2

SYS_STUBZH0IHA7K$KEBJVXO5LOHAS         5000                1,977,102,303
                                      10000                7,894,566,276
The histogram values have changed! As one of the posters on the OTN thread points out, what Oracle has actually stored in this case is mod(sys_op_combined_hash(), 9999999999).
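As a sanity check on that claim, we can apply mod 9999999999 to the two endpoint values reported in the first (no-histogram) listing and compare against the two histogram buckets. A caveat: the values in the first listing are rounded to 15 significant digits (note the trailing zeros), so the arithmetic can only agree with the stored buckets approximately. A minimal sketch of the check, using the figures quoted above:

```python
# Endpoint values from the "size 1" run (rounded to 15 significant digits)
full_endpoints = [298_332_787_864_733_000, 8_095_781_421_167_520_000]

# Bucket values from the "size 2" run, paired with the raw value they derive from
bucket_endpoints = [7_894_566_276, 1_977_102_303]

for full, bucket in zip(full_endpoints, bucket_endpoints):
    reduced = full % 9_999_999_999
    # The results agree except for the low-order digits lost to rounding.
    print(f"{full} mod 9999999999 = {reduced}  (stored bucket: {bucket})")
```

The near-match (differing only in the final digits) is consistent with the mod(…, 9999999999) hypothesis, with the residual discrepancy explained by the 15-digit rounding of the displayed endpoint values.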
So far I haven’t done any exhaustive testing to work out whether the change in the strategy for storing numbers makes any difference to the typical optimizer arithmetic – but I do have at least one case (relating to “missing values” behaviour) where the presence or absence of a column group histogram makes a difference to the estimated cardinality in a way that seems inconsistent with other patterns of behaviour. I doubt if it’s actually due to the change in what’s stored, and one day I may come across a client where I actually need to work out what’s going on and how best to work with the anomaly.
Bonus thought:
In 12c Oracle collects column stats automatically as it loads data into an empty table using a direct path load – but not if it’s got extended stats defined on it. (This is bug number 18425876, labelled as fixed in 12.2 [ed: actually fixed in 12.1.0.2]). Here’s some code modelling a client scenario where we truncate and reload a table every day. In the first part of the demonstration I’ve loaded the table twice to show that after truncating and reloading I get new stats on the table – the first load is 10,000 rows, the second is 20,000 rows and the stats reflect this automatically. In the second part of the code, after adding a set of column group stats, truncating and loading 5,000 rows, the stats from the previous cycle are still in place. (The code is only relevant to 12c, of course)
create table t1 (n1 number, n2 number);

insert /*+ append */ into t1
select	object_id, data_object_id
from	all_objects
where	rownum <= 10000
;

commit;

-- stats have appeared without a call to dbms_stats to gather them.

select blocks, num_rows from user_tables where table_name = 'T1';

select column_name, num_distinct, num_nulls, density, low_value, high_value
from   user_tab_cols
where  table_name = 'T1';

truncate table t1;

insert /*+ append */ into t1
select	object_id, data_object_id
from	all_objects
where	rownum <= 20000
;

commit;

-- Stats now show the latest data

select blocks, num_rows from user_tables where table_name = 'T1';

select column_name, num_distinct, num_nulls, density, low_value, high_value
from   user_tab_cols
where  table_name = 'T1';

-- Add a column group to the stats

select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual;

truncate table t1;

insert /*+ append */ into t1
select	object_id, data_object_id
from	all_objects
where	rownum <= 5000
;

commit;

-- The stats have not been updated to reflect the new data, and the column group stats are empty

select blocks, num_rows from user_tables where table_name = 'T1';

select column_name, num_distinct, num_nulls, density, low_value, high_value
from   user_tab_cols
where  table_name = 'T1';

--------------------------------------------------------
-- Here are the three consecutive sets of results
--------------------------------------------------------

    BLOCKS   NUM_ROWS
---------- ----------
        16      10000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      10000          0      .0001 C103                       C3020C60
N2                                       2534       7429 .000394633 C103                       C30B2929

    BLOCKS   NUM_ROWS
---------- ----------
        32      20000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      20000          0     .00005 C103                       C30A4553
N2                                       3115      16848 .000321027 C103                       C30B2929

    BLOCKS   NUM_ROWS
---------- ----------
        32      20000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      20000          0     .00005 C103                       C30A4553
N2                                       3115      16848 .000321027 C103                       C30B2929
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS
The workaround given in the bug is “add the extended stats after loading the table” – but if you’re constantly truncating and reloading that means you have to drop and add the extended stats and do a tablescan to gather the column group stats every time you reload.
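Using the demo table above, each reload cycle would then have to look something like the following sketch. It's only one way of arranging the calls (the method_opt that targets just the column group by its system-generated name is an assumption – gathering for all columns would also work, at the cost of re-gathering the base column stats):

```sql
truncate table t1;

-- drop the column group before the load so that the 12c online
-- stats gathering can fire on the direct path insert
begin
        dbms_stats.drop_extended_stats(user, 't1', '(n1, n2)');
end;
/

insert /*+ append */ into t1
select  object_id, data_object_id
from    all_objects
where   rownum <= 5000
;

commit;

-- recreate the column group, then tablescan to gather its stats
select dbms_stats.create_extended_stats(user, 't1', '(n1, n2)') name from dual;

begin
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'T1',
                method_opt      => 'for columns SYS_STUBZH0IHA7K$KEBJVXO5LOHAS size 1'
        );
end;
/
```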
Note: the limitation applies whether you create a column group, “ordinary” extended stats, a virtual column, or an implicit virtual column underlying a function-based index.
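For reference, here's a sketch of the four variants that all trip the limitation – each one leaves a (possibly hidden) virtual column on the table before the load. The names n3 and t1_f1 are just made up for the illustration:

```sql
-- column group
select dbms_stats.create_extended_stats(user, 't1', '(n1, n2)') from dual;

-- "ordinary" extended stats on an expression
select dbms_stats.create_extended_stats(user, 't1', '(n1 + n2)') from dual;

-- explicit virtual column
alter table t1 add (n3 number generated always as (n1 + n2) virtual);

-- function-based index, which creates a hidden virtual column behind the scenes
create index t1_f1 on t1(trunc(n1));
```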
Just as a little aside – when I first wrote the demo script I forgot to put in the commit; after the insert /*+ append */ – which meant I was trying to create column group stats on a table that should have raised Oracle error “ORA-12838: cannot read/modify an object after modifying it in parallel”; instead the error has been trapped by the dbms_stats package and shows up as the slightly confusing:
select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual
       *
ERROR at line 1:
ORA-20001: Error when processing extension - resource busy and acquire with
NOWAIT specified or timeout expired
ORA-06512: at "SYS.DBMS_STATS", line 12977
ORA-06512: at "SYS.DBMS_STATS", line 44967
ORA-06512: at "SYS.DBMS_STATS", line 44986
[…] (For an explanation of the sys_op_combined_hash() function see this URL). […]
Pingback by Column Groups | Oracle Scratchpad — November 5, 2015 @ 6:48 am GMT Nov 5,2015 |
Hi Jonathan,
How does one go about deleting the extended statistics in one fell swoop? I am unable to find any useful command that would do that without having to script it out. Any ideas? Thank you. C
Comment by Charan — March 24, 2016 @ 3:27 pm GMT Mar 24,2016 |
Charan,
I don’t know of any command to get rid of all extended stats – and I doubt if there is one – but using the view user_stat_extensions as a driver, here’s some simple code:
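The code itself hasn't survived the formatting here, but a minimal sketch of the idea would be a PL/SQL loop over user_stat_extensions calling dbms_stats.drop_extended_stats for each row. The droppable filter is there because extensions that exist only to support function-based indexes can't be dropped this way:

```sql
begin
        for r in (
                select  table_name, extension
                from    user_stat_extensions
                where   droppable = 'YES'
        ) loop
                dbms_stats.drop_extended_stats(
                        ownname   => user,
                        tabname   => r.table_name,
                        extension => to_char(r.extension)
                );
        end loop;
end;
/
```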
Comment by Jonathan Lewis — March 27, 2016 @ 6:24 am BST Mar 27,2016 |
[…] “remember” that the literal is being compared with a char() when making the call to sys_op_combined_hash() that it uses for calculating column group stats so it doesn’t pad the column with spaces […]
Pingback by Column Groups | Oracle Scratchpad — April 20, 2016 @ 9:07 am BST Apr 20,2016 |
Anyone having any locking issues while doing extended stats create or drop?
I’m having a locking issue and it needs a manual intervention (either stop the DB instances on the opposite node in the RAC or stop the Oracle CQN) in 12.1.0.2.160419
Is there a way we can disable this?
Thank you
Mano
Comment by Manohar Shanmugan — August 13, 2016 @ 12:57 am BST Aug 13,2016 |
Manohar,
This isn’t really the right place to ask the question, it’s not a forum. However, it may be useful to other searchers to see that someone has had some sort of problem with the mix of RAC and extended stats (possibly in conjunction with CQN).
I did a quick search for bugs on MoS with the key word “create_extended_stats” and nothing showed up, although there are several bugs relating to RAC and gathering stats that are “locking” issues – particularly library cache lock and mutex deadlocks. You may have better luck posing the question on the OTN database forum (if you don’t have a MoS account), but try to give a little more detail about what you’re seeing as the locking issue.
Comment by Jonathan Lewis — August 15, 2016 @ 3:53 pm BST Aug 15,2016 |
[…] An alternative strategy (and, I’d say, a better strategic approach) would have been to create a “fake” frequency histogram that included the value (‘N’,’N’) giving it a frequency of 1 – a method I’ve suggested in the past but with the little problem that you need to be able to work out the value to use in the array passed to dbms_stats.set_column_stats() to represent the value for the (‘N’,’N’) combination – and I’ve written about that topic in the past as well. […]
Pingback by Extended Histograms | Oracle Scratchpad — July 31, 2018 @ 11:05 pm BST Jul 31,2018 |
[…] Let’s now compare the values stored in the histogram for that column with the values generated by a function call that I first referenced a couple of years ago: […]
Pingback by Extended Histograms – 2 | Oracle Scratchpad — August 2, 2018 @ 2:14 pm BST Aug 2,2018 |
[…] Extendend Stats (May 2014): Under the covers with column groups. […]
Pingback by Column Group Catalog | Oracle Scratchpad — September 27, 2018 @ 5:16 pm BST Sep 27,2018 |
[…] happened. Apparently, the table has extended statistics on the three primary key columns. I found a post by Jonathan Lewis that shows a virtual column like the one this table has with extended statistics. The error is on […]
Pingback by Datapump Import Fails on Tables With Extended Statistics | Bobby Durrett's DBA Blog — December 17, 2019 @ 11:11 pm GMT Dec 17,2019 |