Oracle Scratchpad

January 11, 2023

Quiz Night

Filed under: Indexing,Oracle,Statistics — Jonathan Lewis @ 5:27 pm GMT Jan 11,2023

I may have given the answers to this little puzzle elsewhere on the blog already, but if so I can’t find where, and it’s worth a little note of its own.

I have a non-partitioned heap table that is a subset of all_objects, and I’m going to execute a series of calls to dbms_stats to see what object stats appear. When I first wrote the script Oracle didn’t gather stats automatically when you created a table, so I’ve have to add in a call to delete_table_stats immediately after creating the table.

I’ve run a little script to display various stats about the table at each stage; that part of the process is left to any readers who want to check my answers.


rem     Script:         gather_col_stats.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2015
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             11.2.0.4
rem

drop table t1;

@@setup

create table t1 
as
select  * 
from    all_objects
where   rownum <= 50000
;


execute dbms_stats.delete_table_stats(user,'t1');
start show_simple_stats t1

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns created size 1')
start show_simple_stats t1

delete from t1 where rownum <= 2000;

create index t1_i1 on t1(object_id);
start show_simple_stats t1

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all indexed columns size 1')
start show_simple_stats t1

After creating the table I call delete_table_stats() – does this definitely mean there are no stats on the table?

I’ve then gathered stats on just one column of the table – what has Oracle really done?

I’ve then deleted a couple of thousand rows and created an index on the table. (Why did I do the delete?) Since this is 11g (at least) what extra stats have come into existence?

I’ve then gathered stats for just the indexed columns – what has Oracle really done?

Answers

The script I wrote to report statistics was a set of simple queries against user_tables, user_indexes, user_tab_cols, and user_tab_histograms in turn. After the call to delete_table_stats() the first reportshowed that there were no statistics about the table in any of these views.

After the gathering stats but specifying just one column the report showed the following

Table stats
===========
    BLOCKS   NUM_ROWS       RLEN  CHAIN_CNT       Sample
---------- ---------- ---------- ---------- ------------
      1009      50000        135          0       50,000

Table column stats
==================
COLUMN_NAME                            Sample     Distinct  NUM_NULLS    DENSITY HISTOGRAM          Buckets CLEN GLO LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ------------ ---------- ---------- --------------- ---------- ---- --- -------------------------- -----------------
APPLICATION                                                                      NONE                            NO
CREATED                                50,000          532          0 .001879699 NONE                     1    8 YES 7877041101390F             78790A0E0A330F
CREATED_APPID                                                                    NONE                            NO
...

Table histogram stats
=====================
COLUMN_NAME                      ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
-------------------------------- --------------- -------------- ----------------------------------------
CREATED                                        0     2458591.04
                                               1     2459502.41

Obviously we expect to see the column stats for just the one column, and the histogram is just reporting the low and high values since we have requested “no histogram”. The interesting point, though is that the table stats show that Oracle has also derived an average row length (avg_row_len – heading rlen) – so where did that come from.

Index stats
===========

INDEX_NAME               BLEVEL LEAF_BLOCKS   NUM_ROWS DISTINCT_KEYS CLUSTERING_FACTOR       Sample    AVG_LBK    AVG_DBK
-------------------- ---------- ----------- ---------- ------------- ----------------- ------------ ---------- ----------
T1_I1                         1         107      48000         48000              1003       48,000          1          1


After deleting 2,000 rows and creating an ordinary B-tree index we get an extra entry in the stats report which is about the index stats; all the other parts of the report remain unchanged. We don’t see any column stats of the column used to define the index, but one thing we do see is that the user_indexes correctly reports 48,000 as its num_rows while the num_rows in user_tables has not been updated with this bit of information.

Little anecdote: about 20 years ago I was called in to find the root cause of a performance problem. After a 3 hour management meeting I was allowed to examine the 10053 trace the DBAs had already dumped in anticipation of my arrival – it took 10 minutes to find an Oracle bug that appeared when the num_rows for a primary key index was smaller than the num_rows for the table. (The bug was fixed in the next release). And, of course, you should remember that num_rows for an index could well be much smaller than num_rows in user_tables thanks to NULLs.

After the call to gather stats “for all indexed columns” the report produced the following results

Table stats (updated)
=====================

    BLOCKS   NUM_ROWS       RLEN  CHAIN_CNT       Sample
---------- ---------- ---------- ---------- ------------
      1009      48000        136          0       48,000


Index stats (no change)
=======================

INDEX_NAME               BLEVEL LEAF_BLOCKS   NUM_ROWS DISTINCT_KEYS CLUSTERING_FACTOR       Sample    AVG_LBK    AVG_DBK
-------------------- ---------- ----------- ---------- ------------- ----------------- ------------ ---------- ----------
T1_I1                         1         107      48000         48000              1003       48,000          1          1

Table column stats - indexed columns now set
============================================

COLUMN_NAME                            Sample     Distinct  NUM_NULLS    DENSITY HISTOGRAM          Buckets CLEN GLO LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ------------ ---------- ---------- --------------- ---------- ---- --- -------------------------- --------------------------
APPLICATION                                                                      NONE                            NO
CREATED                                50,000          532          0 .001879699 NONE                     1    8 YES 7877041101390F             78790A0E0A330F
CREATED_APPID                                                                    NONE                            NO
..
NAMESPACE                                                                        NONE                            NO
OBJECT_ID                              48,000       48,000          0 .000020833 NONE                     1    5 YES C22547                     C3083902
OBJECT_NAME                                                                      NONE                            NO
...


Table histogram stats (indexd columns now added)
================================================

COLUMN_NAME                      ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
-------------------------------- --------------- -------------- ----------------------------------------
CREATED                                        0     2458591.04
                                               1     2459502.41
OBJECT_ID                                      0           3670
                                               1          75601

It’s not surprising to see that the index stats haven’t changed and the indexed column now has stats. A detail of the column stats (which is still not a surprise) is that the original 50,000 sample size is there for the created column; Oracle hasn’t been asked to do anything about it, so it hasn’t done anything about it.

One thing that has changed, though, is that the table stats now show a corrected num_rows, and a change to the average row length (rlen). It’s the row length that is the critical feature I want to highlight in this little quiz. How does Oracle get this estimate? The answer is one you don’t really want to hear: it does a lot of work to calculate it – and for some systems the work done will be huge.

Here’s the SQL (pulled from the tkprof summary of a trace file, with a little cosmetic editing) that Oracle ran in 19.11 to gather “just” the stats on the created column.

select /*+  full(t)    no_parallel(t) no_parallel_index(t) dbms_stats
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false')
  no_substrb_pad  */
  to_char(count("OWNER")),to_char(count("OBJECT_NAME")),
  to_char(count("SUBOBJECT_NAME")),to_char(count("OBJECT_ID")),
  to_char(count("DATA_OBJECT_ID")),to_char(count("OBJECT_TYPE")),
  to_char(count("CREATED")),
  substrb(dump(min("CREATED"),16,0,64),1,240),
  substrb(dump(max("CREATED"),16,0,64),1,240),
  to_char(count("LAST_DDL_TIME")),
  to_char(count("TIMESTAMP")),to_char(count("STATUS")),
  to_char(count("TEMPORARY")),to_char(count("GENERATED")),
  to_char(count("SECONDARY")),to_char(count("NAMESPACE")),
  to_char(count("EDITION_NAME")),to_char(count("SHARING")),
  to_char(count("EDITIONABLE")),to_char(count("ORACLE_MAINTAINED")),
  to_char(count("APPLICATION")),to_char(count("DEFAULT_COLLATION")),
  to_char(count("DUPLICATED")),to_char(count("SHARDED")),
  to_char(count("CREATED_APPID")),to_char(count("CREATED_VSNID")),
  to_char(count("MODIFIED_APPID")),to_char(count("MODIFIED_VSNID"))
from
 "TEST_USER"."T1" t  /* ACL,ACL,ACL,ACL,ACL,ACL,NDV,NIL,NIL,NNV,ACL,ACL,ACL,
  ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL*/

I’ve highlighted the three lines in the statement where the simple stats about the created column are derived in pure SQL, but you can also see that every single column in the table also has the number of non-null entries count, and at the end of the query you can see the “comment” code that triggers the routine to calculate the average column length (ACL) for almost all the columns.

So gathering stats for “just one” column actually does a lot of work getting lengths and counts of all the other columns in the table – and then discards them rather than storing partial stats for the user_tab_cols view.

Summary

When you try to gather stats for a single column of a table – after, for example, you’ve just added a function-based index – Oracle will have to do a lot more work than you might have realised because it want’s to update the average row length (avg_row_len) for the table, and to do that it calculates the average column length (avg_col_len) for every single column.

2 Comments »

  1. […] Quiz night 38 (Jan 2023): what really happens when you try to gather stats on a single column […]

    Pingback by Quiz Catalogue | Oracle Scratchpad — January 11, 2023 @ 5:31 pm GMT Jan 11,2023 | Reply

  2. […] Quiz Night […]

    Pingback by Statistics catalogue | Oracle Scratchpad — January 11, 2023 @ 5:33 pm GMT Jan 11,2023 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.

%d bloggers like this: