Oracle Scratchpad

April 11, 2022

Index Upgrade

Filed under: 19c,Indexing,Oracle,Statistics — Jonathan Lewis @ 4:55 pm BST Apr 11,2022

Sometimes wishes come true and in 19c – with fix_control QKSFM_DBMS_STATS_27268249 – one of mine did. The description of this fix (which is enabled by default) is: “use approximate ndv for computing leaf blocks and distinct keys”.

Here’s a key item in the output file from running tkprof against the trace file generated by a simple call to:

execute dbms_stats.gather_index_stats(user,'t1_i2')

The index is a two_column index on t1(x1, x2) with a size of roughly 16,000 blocks on a table of approximately 6 million rows.

select /*+ opt_param('_optimizer_use_auto_indexes' 'on')
  no_parallel_index(t, "T1_I2")  dbms_stats cursor_sharing_exact
  use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl
  opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad  no_expand
  index_ffs(t,"T1_I2") */ count(*) as nrw,
  approx_count_distinct(sys_op_lbid(106818,'L',t.rowid)) as nlb,
  approx_count_distinct(sys_op_combined_hash("X1","X2")) as ndk,
  sys_op_countchg(substrb(t.rowid,1,15),1) as clf
from
 "TEST_USER"."T1" t where "X1" is not null or "X2" is not null

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE APPROX (cr=15821 pr=0 pw=0 time=2812116 us starts=1)
   6018750    6018750    6018750   INDEX FAST FULL SCAN T1_I2 (cr=15821 pr=0 pw=0 time=894658 us starts=1 cost=2117 size=192000000 card=6000000)(object id 106818)


The first point of interest is the appearance of the approx_count_distinct() function calls used for the nlb (number of leaf blocks) and ndk (number of distinct keys) columns. It’s also worth nothing that the ndk value is derived from a call to sys_op_combined_hash() applied to the two base columns which means the number of distinct keys for a multi-column index is calculated in exactly the same way as the number of distinct values for a column group.

There are two more important details though: first that the mechanism uses a fast full scan of the whole index, secondly that the size of this index is about 16,000 blocks.

A final (unrelated) point is the little reminder in the hints that 19c includes an automatic indexing mechanism. It’s easy to forget such things when your overnight batch job takes longer than usual.

For comparison purposes, the following shows the effect of disabling the feature:

alter session set "_fix_control"='27268249:0';


select /*+ opt_param('_optimizer_use_auto_indexes' 'on')
  no_parallel_index(t, "T1_I2")  dbms_stats cursor_sharing_exact
  use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl
  opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad  no_expand
  index_ffs(t,"T1_I2") */ count(*) as nrw,count(distinct sys_op_lbid(106818,
  'L',t.rowid)) as nlb,count(distinct hextoraw(sys_op_descend("X1")
  ||sys_op_descend("X2"))) as ndk,sys_op_countchg(substrb(t.rowid,1,15),1) as
  clf
from
 "TEST_USER"."T1" sample block (  7.0114135742,1)  t where "X1" is not null
  or "X2" is not null

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT GROUP BY (cr=1132 pr=0 pw=0 time=460459 us starts=1)
    421761     421761     421761   INDEX SAMPLE FAST FULL SCAN T1_I2 (cr=1132 pr=0 pw=0 time=67203 us starts=1 cost=150 size=8413700 card=420685)(object id 106818)

The calculations for nlb and ndk are simple count()s and the thing that ndk counts is a messy concatenation of the columns hextoraw(sys_op_descend(“X1”) || sys_op_descend(“X2”)) that Oracle has used to ensure that counts for like ‘AB’ || ‘CD’ and ‘ABC’||’D’ don’t get combined.

Perhaps most significantly for some people is that the execution plan shows us that the index fast full scan was a SAMPLE and only analyzed (a fairly typical) 1,132 blocks out of 16,000 and 400,000 rows out of 6 million This looks a bit of a threat, of course; but there may be a few critical indexes where this extra workload will stop random variations in execution plans when it really matters.

As with so many details of Oracle there are likely to be cases where the new method is hugely beneficial, and some where it’s a nuisance, so it’s good to know that you can be a little selective about when it gets used.

Footnote

Don’t forget that it’s a good idea to change think about setting the table preference “table_cached_blocks” to allow Oracle to produce a better value for the clustering_factor. This is another mechanism that increases the CPU required to gather index stats.

It’s an odd little detail that the fixed control appeared in 19.3.0.0 according to my histogram of v$system_fix_control and certainly wasn’t in 18.3.0.0 – but the entry in the view thinks that it was available from Oracle 8.0.0.0.

2 Comments »

  1. […] 19c – better stats (April 2022): Oracle now uses approx_count_distinct() on the whole index in dbms_stats.gather_index_stats(). […]

    Pingback by Indexing Catalogue | Oracle Scratchpad — April 11, 2022 @ 5:05 pm BST Apr 11,2022 | Reply

  2. […] 19c – better stats (April 2022): Oracle now uses approx_count_distinct() on the whole index in dbms_stats.gather_index_stats(). […]

    Pingback by Statistics catalogue | Oracle Scratchpad — April 11, 2022 @ 5:06 pm BST Apr 11,2022 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.

%d bloggers like this: