I was going to drop a comment here but have turned it into a blog post

https://hourim.wordpress.com/2016/01/20/natural-and-adjusted-hybrid-histogram/

I hope this answers your question

Best regards

We can see that ‘detail’ when tracing column statistics gathering (as exposed by http://www.pythian.com/blog/options-for-tracing-oracle-dbms_stats/) with the following:

SQL> exec dbms_stats.set_global_prefs('TRACE',to_char(1+16));
SQL> exec dbms_stats.gather_table_stats(ownname=>'',tabname=>'HISTOGRAM',method_opt=>'FOR COLUMNS C2 SIZE 3');

It shows that the last bucket (value 15) has been removed and replaced by the max:

DBMS_STATS: remove last bucket: Typ=2 Len=2: c1,10 add: Typ=2 Len=2: c1,15
DBMS_STATS: removal_count: 1 total_nonnull_rows: 12 mnb: 3
DBMS_STATS: adjusted coverage: .667
DBMS_STATS: hist_type in exec_get_topn: 2048 ndv:6 mnb:3
DBMS_STATS: Evaluating frequency histogram for col: "C2"
DBMS_STATS: number of values = 4, max # of buckects = 3, pct = 100, ssize = 12
DBMS_STATS: Trying to convert frequency histogram to hybrid
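The `Typ=2 Len=2` dumps in the trace are Oracle's internal NUMBER encoding, so they are not readable at a glance. As a rough aid, here is a minimal Python sketch that decodes them. It assumes the trace prints the bytes in hex (which is consistent with `c1,10` being value 15) and it only handles positive integers. The function name is mine, not anything from DBMS_STATS.

```python
def decode_oracle_number(dump: str) -> int:
    """Decode a positive integer from a hex byte dump such as 'c1,10'.

    Assumption: bytes are hex, as they appear to be in the DBMS_STATS trace.
    Positive NUMBERs store a base-100 exponent in the first byte (0xc1 means
    exponent 0) followed by base-100 digits, each stored as digit + 1.
    """
    raw = [int(b, 16) for b in dump.split(",")]
    exponent = raw[0] - 0xC1
    if exponent < 0:
        raise ValueError("only positive integers are handled in this sketch")
    value = 0
    for i, stored in enumerate(raw[1:]):
        power = exponent - i
        if power < 0:
            raise ValueError("fractional digits are not handled in this sketch")
        value += (stored - 1) * 100 ** power
    return value


print(decode_oracle_number("c1,10"))  # removed last bucket -> 15
print(decode_oracle_number("c1,15"))  # added max -> 20
```

So the "remove last bucket ... add" line reads as: endpoint 15 dropped, endpoint 20 (the max) added.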

The ‘adjusted coverage’ line suggests that dbms_stats checks that the top frequencies still cover a minimum fraction of the rows once the min/max values have been forced in. For example, if we gather stats with only two buckets we get:

DBMS_STATS: remove first bucket: Typ=2 Len=2: c1,8 add: Typ=2 Len=2: c1,6
DBMS_STATS: remove last bucket: Typ=2 Len=2: c1,10 add: Typ=2 Len=2: c1,15
DBMS_STATS: removal_count: 2 total_nonnull_rows: 12 mnb: 2
DBMS_STATS: Abort top-n histogram, as the addition of min/max does not preserve the minimum coverage: .166667 vs. .5
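The ".5" in the abort message matches the documented Top-Frequency threshold of 1 - 1/n for n buckets, and the same formula gives .667 for three buckets. A small Python sketch of that check follows. The function names are hypothetical, and the use of `>=` for the comparison is my assumption; the trace does not show how DBMS_STATS treats a coverage exactly equal to the threshold.

```python
def min_topn_coverage(mnb: int) -> float:
    """Minimum fraction of rows the top-n values must cover: 1 - 1/mnb."""
    return 1 - 1 / mnb


def keeps_min_coverage(adjusted_coverage: float, mnb: int) -> bool:
    """Assumed check: coverage after the min/max swap must reach the threshold."""
    return adjusted_coverage >= min_topn_coverage(mnb)


print(min_topn_coverage(2))               # 0.5
print(round(min_topn_coverage(3), 3))     # 0.667
print(keeps_min_coverage(0.166667, 2))    # False, as in the "Abort top-n" line
```

With two buckets, swapping in both the min and the max leaves only 2 of the 12 rows covered (.166667), far below .5, hence the abort.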

Regards,

Franck.

And because of this rule Oracle loses information: only 10 rows in the histogram and the number of distinct values becomes wrong.

I was just playing with a random data set, but when something goes wrong in a real case it might be due to such anomalies, even with a larger number of rows and buckets (at the boundary between Top-Frequency and Hybrid).

I always expect to see a few oddities at the boundaries, and playing around with very small data sets and bucket counts is asking for oddities.

One detail I don’t seem to have in the article is that Oracle does want to keep track of the low and high values – and I think that that’s why the bucket of 2 rows has appeared. After that I can’t explain the counting errors. I have been able to produce a couple more anomalies by adding more rows (with values between 6 and 29) to your data set and then asking for histograms with fewer buckets than distinct values. I suspect something odd can happen as Oracle decides which bucket to eliminate to allow it to introduce a bucket for the low value.
