Sometimes wishes come true and in 19c – with fix_control QKSFM_DBMS_STATS_27268249 – one of mine did. The description of this fix (which is enabled by default) is: “use approximate ndv for computing leaf blocks and distinct keys”.
Here’s a key item in the output file from running tkprof against the trace file generated by a simple call to:
execute dbms_stats.gather_index_stats(user,'t1_i2')
You’ll notice tthat I haven’t included a value for the estimate_percent parameter in the call, which means it will default to dbms_stats.auto_sample_size. (unless I’ve done something a little silly with set_table_prefs or set_global_prefs). The index is a two_column index on t1(x1, x2) with a size of roughly 16,000 blocks on a table of approximately 6 million rows.
select /*+ opt_param('_optimizer_use_auto_indexes' 'on')
no_parallel_index(t, "T1_I2") dbms_stats cursor_sharing_exact
use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl
opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad no_expand
index_ffs(t,"T1_I2") */ count(*) as nrw,
approx_count_distinct(sys_op_lbid(106818,'L',t.rowid)) as nlb,
approx_count_distinct(sys_op_combined_hash("X1","X2")) as ndk,
sys_op_countchg(substrb(t.rowid,1,15),1) as clf
from
"TEST_USER"."T1" t where "X1" is not null or "X2" is not null
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 SORT AGGREGATE APPROX (cr=15821 pr=0 pw=0 time=2812116 us starts=1)
6018750 6018750 6018750 INDEX FAST FULL SCAN T1_I2 (cr=15821 pr=0 pw=0 time=894658 us starts=1 cost=2117 size=192000000 card=6000000)(object id 106818)
The first point of interest is the appearance of the approx_count_distinct() function calls used for the nlb (number of leaf blocks) and ndk (number of distinct keys) columns. It’s also worth nothing that the ndk value is derived from a call to sys_op_combined_hash() applied to the two base columns which means the number of distinct keys for a multi-column index is calculated in exactly the same way as the number of distinct values for a column group.
There are two more important details though: first that the mechanism uses a fast full scan of the whole index, secondly that the size of this index is about 16,000 blocks.
A final (unrelated) point is the little reminder in the query’s hints that 19c includes an automatic indexing mechanism. It’s easy to forget such things when your overnight batch job takes longer than usual.
For comparison purposes, the following shows the effect of disabling the feature:
alter session set "_fix_control"='27268249:0';
select /*+ opt_param('_optimizer_use_auto_indexes' 'on')
no_parallel_index(t, "T1_I2") dbms_stats cursor_sharing_exact
use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl
opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad no_expand
index_ffs(t,"T1_I2") */ count(*) as nrw,count(distinct sys_op_lbid(106818,
'L',t.rowid)) as nlb,count(distinct hextoraw(sys_op_descend("X1")
||sys_op_descend("X2"))) as ndk,sys_op_countchg(substrb(t.rowid,1,15),1) as
clf
from
"TEST_USER"."T1" sample block ( 7.0114135742,1) t where "X1" is not null
or "X2" is not null
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 SORT GROUP BY (cr=1132 pr=0 pw=0 time=460459 us starts=1)
421761 421761 421761 INDEX SAMPLE FAST FULL SCAN T1_I2 (cr=1132 pr=0 pw=0 time=67203 us starts=1 cost=150 size=8413700 card=420685)(object id 106818)
The calculations for nlb and ndk are simple count()s and the thing that ndk counts is a messy concatenation of the columns hextoraw(sys_op_descend(“X1”) || sys_op_descend(“X2”)) that Oracle has used to ensure that counts for like ‘AB’ || ‘CD’ and ‘ABC’||’D’ don’t get combined.
Perhaps most significantly for some people is that the execution plan shows us that the index fast full scan was a SAMPLE and only analyzed (a fairly typical) 1,132 blocks out of 16,000 and 400,000 rows out of 6 million This looks a bit of a threat, of course; but there may be a few critical indexes where this extra workload will stop random variations in execution plans when it really matters.
As with so many details of Oracle there are likely to be cases where the new method is hugely beneficial, and some where it’s a nuisance, so it’s good to know that you can be a little selective about when it gets used.
Footnote
Don’t forget that it’s a good idea to think about setting the table preference “table_cached_blocks” to allow Oracle to produce a better value for the clustering_factor. This is another mechanism that increases the CPU required to gather index stats.
It’s an odd little detail that the fixed control appeared in 19.3.0.0 according to my archived copies of v$system_fix_control and certainly wasn’t in 18.3.0.0 – but the entry in the 19.3.0.0 view lists it under control that were available from Oracle 8.0.0.0 !
[…] 19c – better stats (April 2022): Oracle now uses approx_count_distinct() on the whole index in dbms_stats.gather_index_stats(). […]
Pingback by Indexing Catalogue | Oracle Scratchpad — April 11, 2022 @ 5:05 pm BST Apr 11,2022 |
[…] 19c – better stats (April 2022): Oracle now uses approx_count_distinct() on the whole index in dbms_stats.gather_index_stats(). […]
Pingback by Statistics catalogue | Oracle Scratchpad — April 11, 2022 @ 5:06 pm BST Apr 11,2022 |
[…] this posting for a few more notes on the […]
Pingback by Minimum stats | Oracle Scratchpad — August 7, 2022 @ 6:34 pm BST Aug 7,2022 |