Oracle Scratchpad

September 18, 2012

Minimum stats

Filed under: Indexing,Oracle,Statistics — Jonathan Lewis @ 5:06 pm BST Sep 18,2012

Occasionally I come across complaints that dbms_stats is not obeying the estimate_percent when sampling data and is therefore taking more time than it “should” when gathering stats. The complaint, when I have seen it, always seems to be about the sample size Oracle chose for indexes.

There is a simple but (I believe) undocumented reason for this: because indexes are designed to collate similar data values they are capable of accentuating any skew in the data distribution, which means a sample taken from a small number of leaf blocks can be highly misleading as a guide to the whole index – so Oracle aims for a minimum sample size for gathering index stats.

I’ve found remnants of a note I wrote in December 2004 which claims that this limit (as of Oracle 9.2) was 919 leaf blocks – and I have a faint memory of discovering this figure in an official Metalink (MOS) note. I can’t find the note any more, but it’s easy enough to set up a test to see if the requirement still exists and if the limit is still the same. Here’s a test I ran recently using an 8KB block size:

rem     Script:         index_sample.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2012
rem     Last tested

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from	dual
	connect by
		level <= 1e4
)
select
	rownum				id,
	trunc(dbms_random.value(1,101))	n1,
	lpad(rownum,12,'0')		v1,
	lpad('x',100,'x')		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;

create index t1_i0 on t1(n1, lpad('0',70,'0')) pctfree 00;
create index t1_i1 on t1(n1, lpad('1',70,'1')) pctfree 10;
create index t1_i2 on t1(n1, lpad('2',70,'2')) pctfree 20;
create index t1_i3 on t1(n1, lpad('3',70,'3')) pctfree 30;
create index t1_i4 on t1(n1, lpad('4',70,'4')) pctfree 40;
create index t1_i5 on t1(n1, lpad('5',70,'5')) pctfree 50;
create index t1_i6 on t1(n1, lpad('6',70,'6')) pctfree 60;
create index t1_i7 on t1(n1, lpad('7',70,'7')) pctfree 70;
create index t1_i8 on t1(n1, lpad('8',70,'8')) pctfree 80;
create index t1_i9 on t1(n1, lpad('9',70,'9')) pctfree 90;

alter session set events '10046 trace name context forever, level 4';

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 => 'T1',
		cascade		 => true,
		estimate_percent => 1,
		method_opt	 => 'for all columns size 1'
	);
end;
/

I’ve created a table with 100,000 rows, and then created 10 indexes, each with a different pctfree so that their leaf block counts vary quite dramatically. After enabling event 10046 (at level 4, in case I needed to check some bind values) I’ve called dbms_stats.gather_table_stats with the cascade option set to true to ensure that index stats are collected, and with an explicit sample size (estimate_percent) of 1%.
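As a back-of-envelope check on why the ten indexes differ so much in size, we can estimate leaf-block counts directly from pctfree. The 85-byte entry size and 8,000 usable bytes per 8KB block below are my assumptions (key columns plus rowid and per-row overhead), not figures from the test:

```python
# Rough estimate of leaf blocks needed for 100,000 index entries at a
# given pctfree. ENTRY_BYTES and USABLE_BYTES are assumptions, not
# measured figures.
import math

def estimated_leaf_blocks(pctfree, rows=100_000, entry_bytes=85, usable_bytes=8_000):
    """Leaf blocks needed if each block keeps pctfree percent free."""
    entries_per_block = math.floor(usable_bytes * (1 - pctfree / 100) / entry_bytes)
    return math.ceil(rows / entries_per_block)

for pctfree in range(0, 100, 10):
    print(f"pctfree {pctfree:2d} -> ~{estimated_leaf_blocks(pctfree):6d} leaf blocks")
```

With these assumptions the pctfree 0 index comes out at about 1,064 leaf blocks – pleasingly close to the 1,066 buffer gets reported for the full scan of t1_i0 below – while the pctfree 90 index is roughly ten times larger.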

The content of the trace file (extracted by a call to tkprof) is very revealing. The index with pctfree set to zero shows the following:

select /*+  no_parallel_index(t, "T1_I0")  dbms_stats cursor_sharing_exact
  use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad
  no_expand index(t,"T1_I0") */ count(*) as nrw,count(distinct
  sys_op_lbid(65128,'L',t.rowid)) as nlb,count(distinct
  hextoraw(sys_op_descend("N1")||sys_op_descend("SYS_NC00005$"))) as ndk,
  sys_op_countchg(substrb(t.rowid,1,15),1) as clf
 from "TEST_USER"."T1" t where "N1" is not null or "SYS_NC00005$" is not null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.46       0.00       1065       1066          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.48       0.00       1065       1066          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 56     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT GROUP BY (cr=1066 pr=1065 pw=0 time=0 us)
    100000     100000     100000   INDEX FULL SCAN T1_I0 (cr=1066 pr=1065 pw=0 time=0 us cost=1066 size=8636378 card=100423)(object id 65128)

As you can see there is no sample clause applied to the index, and Oracle has examined all 100,000 rows. For all the other indexes, though, the reference to the table includes a sample clause. Stripping out just the sample clause, and reporting them in order of the setting of pctfree, this is what we see:

 "TEST_USER"."T1" sample block ( 95.3319502075,1)  t
 "TEST_USER"."T1" sample block ( 83.9729532164,1)  t
 "TEST_USER"."T1" sample block ( 73.7797045601,1)  t
 "TEST_USER"."T1" sample block ( 62.4320652174,1)  t
 "TEST_USER"."T1" sample block ( 52.2634212921,1)  t
 "TEST_USER"."T1" sample block ( 40.8953364187,1)  t
 "TEST_USER"."T1" sample block ( 29.5384417588,1)  t
 "TEST_USER"."T1" sample block ( 19.3164620817,1)  t
 "TEST_USER"."T1" sample block (  7.9547815248,1)  t

As the size of the index (as forced by pctfree) increases we see a sample clause being introduced, and the size of the sample decreasing as the size of the index grows. What does this do to the number of blocks examined? The following list shows the Fetch line for each index, again in order of increasing index size:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Fetch        1      0.28       0.00       1202       1153          0           1
Fetch        1      0.18       0.00       1365       1145          0           1
Fetch        1      0.17       0.00       1553       1133          0           1
Fetch        1      0.14       0.00       1831       1164          0           1
Fetch        1      0.21       0.00       2180       1186          0           1
Fetch        1      0.17       0.00       2766       1163          0           1
Fetch        1      0.12       0.00       3797       1142          0           1
Fetch        1      0.10       0.00       5724       1129          0           1
Fetch        1      0.17       0.00       5352       1147          0           1

Reading down the list you can see that the number of blocks examined (query) is consistently within a few percent of 1,140. Based on these figures it seems likely that there is still a minimum requirement, but it’s a couple of hundred blocks higher than the figure I reported in 2004.
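A quick sanity check of that claim: if Oracle is aiming at a fixed number of leaf blocks, then inverting each sample percentage against the blocks examined should give the implied size of each index, and the blocks-examined figure should stay roughly constant. A minimal Python sketch using the figures above (the ~1,140-block target is my empirical estimate, not a documented limit):

```python
# Sample percentages and blocks-examined (query) figures copied from the
# tkprof output above, in order of increasing index size.
sample_pct = [95.3319502075, 83.9729532164, 73.7797045601, 62.4320652174,
              52.2634212921, 40.8953364187, 29.5384417588, 19.3164620817,
              7.9547815248]
blocks_examined = [1153, 1145, 1133, 1164, 1186, 1163, 1142, 1129, 1147]

mean_examined = sum(blocks_examined) / len(blocks_examined)
print(f"mean blocks examined: {mean_examined:.0f}")

# Inverting examined ~= pct/100 * leaf_blocks gives each index's implied size:
for pct, examined in zip(sample_pct, blocks_examined):
    print(f"sample {pct:6.2f}% -> index of ~{examined / (pct / 100):6.0f} leaf blocks")
```

Every one of the nine figures falls within a few percent of the mean, which is what you would expect if the sample percentage is being derived from a fixed block target divided by the index size.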

You will have noticed, of course, the interesting feature that the number of buffer gets (query + current) is smaller than the number of blocks read (disk). This is a side effect of the execution plan Oracle has chosen to sample the index, and one of the cases where Oracle will be reporting various session statistics about “prefetched” blocks and, in particular, “prefetched blocks aged out before use”. Here, for example, is the execution plan for the index created with pctfree set to 80:

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT GROUP BY (cr=1129 pr=5724 pw=0 time=0 us)
     18683      18683      18683   INDEX SAMPLE FAST FULL SCAN T1_I8 (cr=1129 pr=5724 pw=0 time=0 us cost=174 size=1668228 card=19398)(object id 65136)

When sampling with a table scan, or index fast full scan, Oracle will do multi-block reads to pick up many adjacent blocks in a file, but may choose to examine just a few of them. The rationale for this is simple – if you want to collect (say) every fifth block in a segment – because of a 20% sample size – then a single read request of 16 blocks will get 3 of the blocks you need, and this might be faster than issuing three read requests for individual blocks – it’s a cost-based decision. If Oracle does a 16 block read to get three blocks the tkprof line would show: disk = 16, query = 3.
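The arithmetic of that trade-off can be sketched in a couple of lines (the 16-block read size and 20% sample are just the worked example above, not fixed Oracle parameters):

```python
def wanted_blocks_per_read(read_size_blocks, sample_fraction):
    """Expected number of sampled ('wanted') blocks in one multiblock read."""
    return read_size_blocks * sample_fraction

# The worked example from the text: a 20% sample with 16-block reads
# covers about 3 wanted blocks per read, so one large read can replace
# roughly three single-block reads.
wanted = wanted_blocks_per_read(16, 0.20)
print(f"disk = 16, query = {round(wanted)}")
```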


  1. I’m curious as to how the same analysis might come out using AUTO_SAMPLE_SIZE

    Comment by Rich Headrick — September 19, 2012 @ 1:52 am BST Sep 19,2012 | Reply

  2. Hi Sir,
    Does having no stats or stale stats affect the performance of an insert statement?
    insert into a values()

    Comment by yasir — February 28, 2013 @ 11:09 am GMT Feb 28,2013 | Reply

    • Yasir,

      Interesting question.
      I wouldn’t have thought so (at least, as far as basic Oracle tables go).

      If you enable the 10053 trace against an “insert into table () values()” then you will see that the optimizer code path does get visited, but it doesn’t seem to say anything about object-level statistics.

      Comment by Jonathan Lewis — February 28, 2013 @ 12:22 pm GMT Feb 28,2013 | Reply

  3. […] sample size is much bigger than expected – but that’s probably related to the normal “select 1,100 leaf blocks strategy”. Because of the skew in the pattern of deleted values it’s possible for the sample size in […]

    Pingback by Strange Estimates. | Oracle Scratchpad — October 30, 2019 @ 1:10 pm GMT Oct 30,2019 | Reply


Comments and related questions are welcome.
