Like the recent article on deleting histograms, this is another draft that I rediscovered while searching for some notes I had written on a different topic – so I’ve finally finished it off and published it.
Here’s a quirky little detail of extended stats that came up in an OTN thread earlier this week [ed: actually 8th Jan 2014]. When you create column group stats, Oracle uses an undocumented function sys_op_combined_hash() to create a hash value, and if you gather simple stats on the column (i.e. no histogram) you can get some idea of the range of values that Oracle generates through the hash function. For example:
create table t1
as
select 1 n1, 2 n2
from dual
connect by level <= 5000	-- > comment to avoid wordpress format issue
union all
select 2, 1
from dual
connect by level <= 5000	-- > comment to avoid wordpress format issue
;

select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual;

begin
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=> 'T1',
		method_opt	=> 'for all columns size 1'
	);
end;
/

break on column_name skip 1

column column_name format a32
column endpoint_number heading "EP No."
column endpoint_value  heading "Value" format 999,999,999,999,999,999,999

select
	column_name, endpoint_number, endpoint_value
from
	user_tab_histograms
where
	table_name = 'T1'
order by
	column_name, endpoint_value
;

COLUMN_NAME                          EP No.                        Value
-------------------------------- ---------- ----------------------------
N1                                        0                            1
                                          1                            2

N2                                        0                            1
                                          1                            2

SYS_STUBZH0IHA7K$KEBJVXO5LOHAS            0      298,332,787,864,733,000
                                          1    8,095,781,421,167,520,000
I could have selected low_value and high_value from user_tab_cols, using utl_raw.cast_to_number() to display them in numeric format, but the view user_tab_histograms displays the low and high values as a two-bucket histogram when there is no actual histogram data for the column in the histogram table (histgrm$).
We probably don’t need to worry about what the low and high values might be because taking hash values destroys any meaning that a range might have (the optimizer can’t use column group stats in range-based predicates, only in equality predicates). However, we might collect a frequency histogram (or Top-N histogram in 12c) on the column group because there might be some data skew in the sets of values that we need to tell the optimizer about – so let’s gather a histogram with 2 buckets on our sample data set and see what we get:
begin
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=> 'T1',
		method_opt	=> 'for all columns size 2'
	);
end;
/

select
	column_name, endpoint_number, endpoint_value
from
	user_tab_histograms
where
	table_name = 'T1'
order by
	column_name, endpoint_value
;

COLUMN_NAME                          EP No.                        Value
-------------------------------- ---------- ----------------------------
N1                                     5000                            1
                                      10000                            2

N2                                     5000                            1
                                      10000                            2

SYS_STUBZH0IHA7K$KEBJVXO5LOHAS         5000                1,977,102,303
                                      10000                7,894,566,276
The histogram values have changed! As one of the posters on the OTN thread points out, what Oracle has actually stored in this case is mod(sys_op_combined_hash(), 9999999999).
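As a sanity check on that claim, we can apply mod 9999999999 to the two endpoint values reported in the first (no-histogram) listing and compare against the two histogram buckets. A caveat: the values in the first listing are rounded to 15 significant digits (note the trailing zeros), so the arithmetic can only agree with the stored buckets approximately. A minimal sketch of the check, using the figures quoted above:

```python
# Endpoint values from the "size 1" run (rounded to 15 significant digits)
full_endpoints = [298_332_787_864_733_000, 8_095_781_421_167_520_000]

# Bucket values from the "size 2" run, paired with the raw value they derive from
bucket_endpoints = [7_894_566_276, 1_977_102_303]

for full, bucket in zip(full_endpoints, bucket_endpoints):
    reduced = full % 9_999_999_999
    # The results agree except for the low-order digits lost to rounding.
    print(f"{full} mod 9999999999 = {reduced}  (stored bucket: {bucket})")
```

The near-match (differing only in the final digits) is consistent with the mod(…, 9999999999) hypothesis, with the residual discrepancy explained by the 15-digit rounding of the displayed endpoint values.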
So far I haven’t done any exhaustive testing to work out whether the change in the strategy for storing numbers makes any difference to the typical optimizer arithmetic – but I do have at least one case (relating to “missing values” behaviour) where the presence or absence of a column group histogram makes a difference to the estimated cardinality in a way that seems inconsistent with other patterns of behaviour. I doubt if it’s actually due to the change in what’s stored, and one day I may come across a client where I actually need to work out what’s going on and how best to work with the anomaly.
Bonus thought:
In 12c Oracle collects column stats automatically as it loads data into an empty table using a direct path load – but not if it’s got extended stats defined on it. (This is bug number 18425876, labelled as fixed in 12.2 [ed: actually fixed in 12.1.0.2]). Here’s some code modelling a client scenario where we truncate and reload a table every day. In the first part of the demonstration I’ve loaded the table twice to show that after truncating and reloading I get new stats on the table – the first load is 10,000 rows, the second is 20,000 rows and the stats reflect this automatically. In the second part of the code, after adding a set of column group stats, truncating and loading 5,000 rows, the stats from the previous cycle are still in place. (The code is only relevant to 12c, of course)
create table t1 (n1 number, n2 number);

insert /*+ append */ into t1
select	object_id, data_object_id
from	all_objects
where	rownum <= 10000
;

commit;

-- stats have appeared without a call to dbms_stats to gather them.

select blocks, num_rows from user_tables where table_name = 'T1';

select column_name, num_distinct, num_nulls, density, low_value, high_value
from   user_tab_cols
where  table_name = 'T1';

truncate table t1;

insert /*+ append */ into t1
select	object_id, data_object_id
from	all_objects
where	rownum <= 20000
;

commit;

-- Stats now show the latest data

select blocks, num_rows from user_tables where table_name = 'T1';

select column_name, num_distinct, num_nulls, density, low_value, high_value
from   user_tab_cols
where  table_name = 'T1';

-- Add a column group to the stats

select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual;

truncate table t1;

insert /*+ append */ into t1
select	object_id, data_object_id
from	all_objects
where	rownum <= 5000
;

commit;

-- The stats have not been updated to reflect the new data, and the column group stats are empty

select blocks, num_rows from user_tables where table_name = 'T1';

select column_name, num_distinct, num_nulls, density, low_value, high_value
from   user_tab_cols
where  table_name = 'T1';

--------------------------------------------------------
-- Here are the three consecutive sets of results
--------------------------------------------------------

    BLOCKS   NUM_ROWS
---------- ----------
        16      10000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      10000          0      .0001 C103                       C3020C60
N2                                       2534       7429 .000394633 C103                       C30B2929

    BLOCKS   NUM_ROWS
---------- ----------
        32      20000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      20000          0     .00005 C103                       C30A4553
N2                                       3115      16848 .000321027 C103                       C30B2929

    BLOCKS   NUM_ROWS
---------- ----------
        32      20000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      20000          0     .00005 C103                       C30A4553
N2                                       3115      16848 .000321027 C103                       C30B2929
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS
The workaround given in the bug is “add the extended stats after loading the table” – but if you’re constantly truncating and reloading that means you have to drop and add the extended stats and do a tablescan to gather the column group stats every time you reload.
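Using the demo table above, each reload cycle would then have to look something like the following sketch. It's only one way of arranging the calls (the method_opt that targets just the column group by its system-generated name is an assumption – gathering for all columns would also work, at the cost of re-gathering the base column stats):

```sql
truncate table t1;

-- drop the column group before the load so that the 12c online
-- stats gathering can fire on the direct path insert
begin
        dbms_stats.drop_extended_stats(user, 't1', '(n1, n2)');
end;
/

insert /*+ append */ into t1
select  object_id, data_object_id
from    all_objects
where   rownum <= 5000
;

commit;

-- recreate the column group, then tablescan to gather its stats
select dbms_stats.create_extended_stats(user, 't1', '(n1, n2)') name from dual;

begin
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'T1',
                method_opt      => 'for columns SYS_STUBZH0IHA7K$KEBJVXO5LOHAS size 1'
        );
end;
/
```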
Note: the limitation applies whether you create a column group, “ordinary” extended stats, a virtual column, or an implicit virtual column underlying a function-based index.
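For reference, here's a sketch of the four variants that all trip the limitation – each one leaves a (possibly hidden) virtual column on the table before the load. The names n3 and t1_f1 are just made up for the illustration:

```sql
-- column group
select dbms_stats.create_extended_stats(user, 't1', '(n1, n2)') from dual;

-- "ordinary" extended stats on an expression
select dbms_stats.create_extended_stats(user, 't1', '(n1 + n2)') from dual;

-- explicit virtual column
alter table t1 add (n3 number generated always as (n1 + n2) virtual);

-- function-based index, which creates a hidden virtual column behind the scenes
create index t1_f1 on t1(trunc(n1));
```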
Just as a little aside – when I first wrote the demo script I forgot to put in the commit; after the insert /*+ append */ – which meant I was trying to create column group stats on a table that should have raised Oracle error “ORA-12838: cannot read/modify an object after modifying it in parallel”; instead the error has been trapped by the dbms_stats package and shows up as the slightly confusing:
select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual
       *
ERROR at line 1:
ORA-20001: Error when processing extension - resource busy and acquire with
NOWAIT specified or timeout expired
ORA-06512: at "SYS.DBMS_STATS", line 12977
ORA-06512: at "SYS.DBMS_STATS", line 44967
ORA-06512: at "SYS.DBMS_STATS", line 44986
[…] (For an explanation of the sys_op_combined_hash() function see this URL). […]
Pingback by Column Groups | Oracle Scratchpad — November 5, 2015 @ 6:48 am GMT Nov 5,2015 |
Hi Jonathan,
How does one go about deleting the extended statistics in one fell swoop? I am unable to find any useful command that would do that without having to script it out. Any ideas? Thank you. C
Comment by Charan — March 24, 2016 @ 3:27 pm GMT Mar 24,2016 |
Charan,
I don’t know of any command to get rid of all extended stats – and I doubt if there is one – but using the view user_stat_extensions as a driver, here’s some simple code:
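The code itself hasn't survived the formatting here, but a minimal sketch of the idea would be a PL/SQL loop over user_stat_extensions calling dbms_stats.drop_extended_stats for each row. The droppable filter is there because extensions that exist only to support function-based indexes can't be dropped this way:

```sql
begin
        for r in (
                select  table_name, extension
                from    user_stat_extensions
                where   droppable = 'YES'
        ) loop
                dbms_stats.drop_extended_stats(
                        ownname   => user,
                        tabname   => r.table_name,
                        extension => to_char(r.extension)
                );
        end loop;
end;
/
```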
Comment by Jonathan Lewis — March 27, 2016 @ 6:24 am BST Mar 27,2016 |
[…] “remember” that the literal is being compared with a char() when making the call to sys_op_combined_hash() that it uses for calculating column group stats so it doesn’t pad the column with spaces […]
Pingback by Column Groups | Oracle Scratchpad — April 20, 2016 @ 9:07 am BST Apr 20,2016 |
Anyone having any locking issues while doing extended stats create or drop?
I’m having a locking issue and it needs a manual intervention (either stop the DB instances on the opposite node in the RAC or stop the Oracle CQN) in 12.1.0.2.160419
Is there a way we can disable this?
Thank you
Mano
Comment by Manohar Shanmugan — August 13, 2016 @ 12:57 am BST Aug 13,2016 |
Manohar,
This isn’t really the right place to ask the question, it’s not a forum. However, it may be useful to other searchers to see that someone has had some sort of problem with the mix of RAC and extended stats (possibly in conjunction with CQN).
I did a quick search for bugs on MoS with the key word “create_extended_stats” and nothing showed up, although there are several bugs relating to RAC and gathering stats that are “locking” issues – particularly library cache lock and mutex deadlocks. You may have better luck posing the question on the OTN database forum (if you don’t have a MoS account), but try to give a little more detail about what you’re seeing as the locking issue.
Comment by Jonathan Lewis — August 15, 2016 @ 3:53 pm BST Aug 15,2016 |
[…] An alternative strategy (and, I’d say, a better strategic approach) would have been to create a “fake” frequency histogram that included the value (‘N’,’N’) giving it a frequency of 1 – a method I’ve suggested in the past but with the little problem that you need to be able to work out the value to use in the array passed to dbms_stats.set_column_stats() to represent the value for the (‘N’,’N’) combination – and I’ve written about that topic in the past as well. […]
Pingback by Extended Histograms | Oracle Scratchpad — July 31, 2018 @ 11:05 pm BST Jul 31,2018 |
[…] Let’s now compare the values stored in the histogram for that column with the values generated by a function call that I first referenced a couple of years ago: […]
Pingback by Extended Histograms – 2 | Oracle Scratchpad — August 2, 2018 @ 2:14 pm BST Aug 2,2018 |
[…] Extendend Stats (May 2014): Under the covers with column groups. […]
Pingback by Column Group Catalog | Oracle Scratchpad — September 27, 2018 @ 5:16 pm BST Sep 27,2018 |
[…] happened. Apparently, the table has extended statistics on the three primary key columns. I found a post by Jonathan Lewis that shows a virtual column like the one this table has with extended statistics. The error is on […]
Pingback by Datapump Import Fails on Tables With Extended Statistics | Bobby Durrett's DBA Blog — December 17, 2019 @ 11:11 pm GMT Dec 17,2019 |