I may have given the answers to this little puzzle elsewhere on the blog already, but if so I can’t find where, and it’s worth a little note of its own.
I have a non-partitioned heap table that is a subset of all_objects, and I’m going to execute a series of calls to dbms_stats to see what object stats appear. When I first wrote the script Oracle didn’t gather stats automatically when you created a table, so I’ve have to add in a call to delete_table_stats immediately after creating the table.
I’ve run a little script to display various stats about the table at each stage; that part of the process is left to any readers who want to check my answers.
rem Script: gather_col_stats.sql rem Author: Jonathan Lewis rem Dated: Feb 2015 rem Purpose: rem rem Last tested rem 19.11.0.0 rem 11.2.0.4 rem drop table t1; @@setup create table t1 as select * from all_objects where rownum <= 50000 ; execute dbms_stats.delete_table_stats(user,'t1'); start show_simple_stats t1 execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns created size 1') start show_simple_stats t1 delete from t1 where rownum <= 2000; create index t1_i1 on t1(object_id); start show_simple_stats t1 execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all indexed columns size 1') start show_simple_stats t1
After creating the table I call delete_table_stats() – does this definitely mean there are no stats on the table?
I’ve then gathered stats on just one column of the table – what has Oracle really done?
I’ve then deleted a couple of thousand rows and created an index on the table. (Why did I do the delete?) Since this is 11g (at least) what extra stats have come into existence?
I’ve then gathered stats for just the indexed columns – what has Oracle really done?
Answers
The script I wrote to report statistics was a set of simple queries against user_tables, user_indexes, user_tab_cols, and user_tab_histograms in turn. After the call to delete_table_stats() the first reportshowed that there were no statistics about the table in any of these views.
After the gathering stats but specifying just one column the report showed the following
Table stats =========== BLOCKS NUM_ROWS RLEN CHAIN_CNT Sample ---------- ---------- ---------- ---------- ------------ 1009 50000 135 0 50,000 Table column stats ================== COLUMN_NAME Sample Distinct NUM_NULLS DENSITY HISTOGRAM Buckets CLEN GLO LOW_VALUE HIGH_VALUE -------------------------------- ------------ ------------ ---------- ---------- --------------- ---------- ---- --- -------------------------- ----------------- APPLICATION NONE NO CREATED 50,000 532 0 .001879699 NONE 1 8 YES 7877041101390F 78790A0E0A330F CREATED_APPID NONE NO ... Table histogram stats ===================== COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE -------------------------------- --------------- -------------- ---------------------------------------- CREATED 0 2458591.04 1 2459502.41
Obviously we expect to see the column stats for just the one column, and the histogram is just reporting the low and high values since we have requested “no histogram”. The interesting point, though is that the table stats show that Oracle has also derived an average row length (avg_row_len – heading rlen) – so where did that come from.
Index stats =========== INDEX_NAME BLEVEL LEAF_BLOCKS NUM_ROWS DISTINCT_KEYS CLUSTERING_FACTOR Sample AVG_LBK AVG_DBK -------------------- ---------- ----------- ---------- ------------- ----------------- ------------ ---------- ---------- T1_I1 1 107 48000 48000 1003 48,000 1 1
After deleting 2,000 rows and creating an ordinary B-tree index we get an extra entry in the stats report which is about the index stats; all the other parts of the report remain unchanged. We don’t see any column stats of the column used to define the index, but one thing we do see is that the user_indexes correctly reports 48,000 as its num_rows while the num_rows in user_tables has not been updated with this bit of information.
Little anecdote: about 20 years ago I was called in to find the root cause of a performance problem. After a 3 hour management meeting I was allowed to examine the 10053 trace the DBAs had already dumped in anticipation of my arrival – it took 10 minutes to find an Oracle bug that appeared when the num_rows for a primary key index was smaller than the num_rows for the table. (The bug was fixed in the next release). And, of course, you should remember that num_rows for an index could well be much smaller than num_rows in user_tables thanks to NULLs.
After the call to gather stats “for all indexed columns” the report produced the following results
Table stats (updated) ===================== BLOCKS NUM_ROWS RLEN CHAIN_CNT Sample ---------- ---------- ---------- ---------- ------------ 1009 48000 136 0 48,000 Index stats (no change) ======================= INDEX_NAME BLEVEL LEAF_BLOCKS NUM_ROWS DISTINCT_KEYS CLUSTERING_FACTOR Sample AVG_LBK AVG_DBK -------------------- ---------- ----------- ---------- ------------- ----------------- ------------ ---------- ---------- T1_I1 1 107 48000 48000 1003 48,000 1 1 Table column stats - indexed columns now set ============================================ COLUMN_NAME Sample Distinct NUM_NULLS DENSITY HISTOGRAM Buckets CLEN GLO LOW_VALUE HIGH_VALUE -------------------------------- ------------ ------------ ---------- ---------- --------------- ---------- ---- --- -------------------------- -------------------------- APPLICATION NONE NO CREATED 50,000 532 0 .001879699 NONE 1 8 YES 7877041101390F 78790A0E0A330F CREATED_APPID NONE NO .. NAMESPACE NONE NO OBJECT_ID 48,000 48,000 0 .000020833 NONE 1 5 YES C22547 C3083902 OBJECT_NAME NONE NO ... Table histogram stats (indexd columns now added) ================================================ COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE -------------------------------- --------------- -------------- ---------------------------------------- CREATED 0 2458591.04 1 2459502.41 OBJECT_ID 0 3670 1 75601
It’s not surprising to see that the index stats haven’t changed and the indexed column now has stats. A detail of the column stats (which is still not a surprise) is that the original 50,000 sample size is there for the created column; Oracle hasn’t been asked to do anything about it, so it hasn’t done anything about it.
One thing that has changed, though, is that the table stats now show a corrected num_rows, and a change to the average row length (rlen). It’s the row length that is the critical feature I want to highlight in this little quiz. How does Oracle get this estimate? The answer is one you don’t really want to hear: it does a lot of work to calculate it – and for some systems the work done will be huge.
Here’s the SQL (pulled from the tkprof summary of a trace file, with a little cosmetic editing) that Oracle ran in 19.11 to gather “just” the stats on the created column.
select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad */ to_char(count("OWNER")),to_char(count("OBJECT_NAME")), to_char(count("SUBOBJECT_NAME")),to_char(count("OBJECT_ID")), to_char(count("DATA_OBJECT_ID")),to_char(count("OBJECT_TYPE")), to_char(count("CREATED")), substrb(dump(min("CREATED"),16,0,64),1,240), substrb(dump(max("CREATED"),16,0,64),1,240), to_char(count("LAST_DDL_TIME")), to_char(count("TIMESTAMP")),to_char(count("STATUS")), to_char(count("TEMPORARY")),to_char(count("GENERATED")), to_char(count("SECONDARY")),to_char(count("NAMESPACE")), to_char(count("EDITION_NAME")),to_char(count("SHARING")), to_char(count("EDITIONABLE")),to_char(count("ORACLE_MAINTAINED")), to_char(count("APPLICATION")),to_char(count("DEFAULT_COLLATION")), to_char(count("DUPLICATED")),to_char(count("SHARDED")), to_char(count("CREATED_APPID")),to_char(count("CREATED_VSNID")), to_char(count("MODIFIED_APPID")),to_char(count("MODIFIED_VSNID")) from "TEST_USER"."T1" t /* ACL,ACL,ACL,ACL,ACL,ACL,NDV,NIL,NIL,NNV,ACL,ACL,ACL, ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL,ACL*/
I’ve highlighted the three lines in the statement where the simple stats about the created column are derived in pure SQL, but you can also see that every single column in the table also has the number of non-null entries count, and at the end of the query you can see the “comment” code that triggers the routine to calculate the average column length (ACL) for almost all the columns.
So gathering stats for “just one” column actually does a lot of work getting lengths and counts of all the other columns in the table – and then discards them rather than storing partial stats for the user_tab_cols view.
Summary
When you try to gather stats for a single column of a table – after, for example, you’ve just added a function-based index – Oracle will have to do a lot more work than you might have realised because it want’s to update the average row length (avg_row_len) for the table, and to do that it calculates the average column length (avg_col_len) for every single column.
[…] Quiz night 38 (Jan 2023): what really happens when you try to gather stats on a single column […]
Pingback by Quiz Catalogue | Oracle Scratchpad — January 11, 2023 @ 5:31 pm GMT Jan 11,2023 |
[…] Quiz Night […]
Pingback by Statistics catalogue | Oracle Scratchpad — January 11, 2023 @ 5:33 pm GMT Jan 11,2023 |