From time to time I see people asking how they can check how far their system has got in it’s call to collect stats. The question is often a little ambiguous – they know which call to dbms_stats they’ve used, so they know whether they’re trying to monitor stats collection against a single table (taking advantage of v$session_longops springs to mind) or against an entire schema.
Here’s one simple-minded approach that I whipped up a few years ago – it’s possible you could do better with the latest versions of dbms_stats. Its purpose is simply to give you some way of checking what work dbms_stats needs to do (in this example, for a schema), so that you can check how much has been done and how much remains. The basic method can be modified in various ways to match your tastes.
We start with the simple observation that many calls to dbms_stats have an “options” parameter that allows you to specify things like ‘LIST STALE’, rather than ‘GATHER STALE’. The “list” options populate an object table type with details of the data segments whose statistics would be gathered by the corresponding “gather” option. So we could start with a simple piece of code to do the following:
declare m_object_list dbms_stats.objecttab; begin dbms_stats.gather_schema_stats( ownname => 'test_user', options => 'LIST AUTO', -- options => 'LIST STALE', -- options => 'LIST EMPTY', objlist => m_object_list ); for i in 1..m_object_list.count loop dbms_output.put_line( rpad(m_object_list(i).ownname,30) || rpad(m_object_list(i).objtype, 6) || rpad(m_object_list(i).objname,30) || rpad(m_object_list(i).partname,30) || rpad(m_object_list(i).subpartname,30) || lpad(m_object_list(i).confidence,4) ); end loop; end; /
Call this before you make your call to gather stats (and in general it might be better to use the utl_file package to write to a file rather than using dbms_output and capturing screen output) then, as time passes, you can check the “last_analyzed” column on the relevant view to see which objects are still waiting for their stats to be collected. Of course, this approach is a little clunky, and requires a certain amount of tedious manual labour to get the check done, but once we have the basic principle the rest is easy. Let’s start by using the code in a pipelined function.
create or replace function list_stale ( i_option in varchar2, i_user in varchar2 default user ) return dbms_stats.objecttab pipelined as pragma autonomous_transaction; m_object_list dbms_stats.objecttab; begin if i_option not in ( 'LIST AUTO', 'LIST STALE','LIST EMPTY' ) then null; else dbms_stats.gather_schema_stats( ownname => i_user, options => i_option, objlist => m_object_list ); for i in 1..m_object_list.count loop pipe row (m_object_list(i)); end loop; end if; return; end; /
You’ll notice that I’ve declared the function to run as an autonomous transaction – the call to dbms_stats does various things (such as updating the mon_mods$ and col_usage$ tables) that you’re not supposed to do in a pipelined function, but you can hide these by using the automonous_transaction pragma. (Strangely the error message you get in the absences of the pragma is: “ORA-04092: cannot COMMIT in a trigger”.)
With this definition I can call the function with code like:
select *
from table(list_stale('LIST AUTO'))
OWNNAME OBJTYP OBJNAME PARTNAME SUBPARTNAME CONFIDENCE
------------ ------ --------------- ---------------- ---------------------- ----------
TEST_USER TABLE PT_RANGE 100
TEST_USER TABLE PT_RANGE P600 100
TEST_USER TABLE PT_RL 100
TEST_USER TABLE PT_RL P_2002_MAR 100
TEST_USER TABLE PT_RL P_2002_MAR P_2002_MAR_THE_REST 100
TEST_USER TABLE PT_RL P_2002_MAR P_2002_MAR_MN 100
TEST_USER TABLE PT_RL P_2002_MAR P_2002_MAR_EAST_COAST 100
TEST_USER TABLE PT_RL P_2002_MAR P_2002_MAR_CA 100
TEST_USER TABLE T1 100
TEST_USER INDEX PT_PK 100
TEST_USER INDEX PT_PK P600 100
TEST_USER INDEX T1_PK 100
... etc ...
This will give me a list of all objects in my schema that need stats collected – either because they have no stats, or their stats are stale.
But I can be more subtle than this. I could, for example, write a query that joins this “table” to the view dba_segments and keep running it as the stats collection proceeds to report the objects that are still waiting for stats and the sizes of those objects. The SQL needs just a little thought as the objtype is only reported as “TABLE” or “INDEX” so you have to do a UNION ALL in the join and work out which rows are really for partitions and which for subpartitions. And then you have to mess about a bit with outer joins because Oracle will, for example, want to collect stats on a table when a partition is stale – and there is no data segment at the table level of a partitioned table.
Your code might look something like this:
with list_result as (
select *
from table(list_stale('LIST AUTO'))
)
select
lst.objname, nvl(seg.segment_type,lst.objtype) segment_type, lst.partname, seg.blocks
from
list_result lst,
dba_segments seg
where
lst.partname is null
and seg.owner(+) = lst.ownname
and seg.segment_type(+) = lst.objtype
and seg.segment_name(+) = lst.objname
and seg.partition_name is null
union all
select
lst.objname, nvl(seg.segment_type,lst.objtype || ' PARTITION') segment_type, lst.partname, seg.blocks
from
list_result lst,
dba_segments seg
where
lst.subpartname is null
and lst.partname is not null
and seg.owner(+) = lst.ownname
and substr(seg.segment_type(+),1,5) = lst.objtype
and seg.segment_name(+) = lst.objname
and seg.partition_name(+) = lst.partname
union all
select
lst.objname, segment_type, lst.partname, seg.blocks
from
list_result lst,
dba_segments seg
where
lst.subpartname is not null
and seg.owner = lst.ownname
and substr(seg.segment_type,1,5) = lst.objtype
and seg.segment_name = lst.objname
and seg.partition_name = lst.subpartname
order by
1,2,3
/
In a tiny schema, where every data segment is one of my “1MB uniform extents”, this is the sample of output I got:
OBJNAME SEGMENT_TYPE PARTNAME BLOCKS --------------- ------------------ ------------------------------ ---------- PT_PK INDEX PT_PK INDEX PARTITION P200 128 PT_PK INDEX PARTITION P400 128 PT_PK INDEX PARTITION P600 128 PT_RANGE TABLE PT_RANGE TABLE PARTITION P200 128 PT_RANGE TABLE PARTITION P400 128 PT_RANGE TABLE PARTITION P600 128 PT_RL TABLE PT_RL TABLE PARTITION P_2002_FEB PT_RL TABLE PARTITION P_2002_MAR PT_RL TABLE SUBPARTITION P_2002_FEB 128 PT_RL TABLE SUBPARTITION P_2002_FEB 128 PT_RL TABLE SUBPARTITION P_2002_FEB 128 PT_RL TABLE SUBPARTITION P_2002_FEB 128 PT_RL TABLE SUBPARTITION P_2002_MAR 128 PT_RL TABLE SUBPARTITION P_2002_MAR 128 PT_RL TABLE SUBPARTITION P_2002_MAR 128 PT_RL TABLE SUBPARTITION P_2002_MAR 128 RL_PK INDEX RL_PK INDEX PARTITION P_2002_FEB RL_PK INDEX PARTITION P_2002_MAR RL_PK INDEX SUBPARTITION P_2002_FEB 128 RL_PK INDEX SUBPARTITION P_2002_FEB 128 RL_PK INDEX SUBPARTITION P_2002_FEB 128 RL_PK INDEX SUBPARTITION P_2002_FEB 128 RL_PK INDEX SUBPARTITION P_2002_MAR 128 RL_PK INDEX SUBPARTITION P_2002_MAR 128 RL_PK INDEX SUBPARTITION P_2002_MAR 128 RL_PK INDEX SUBPARTITION P_2002_MAR 128 T1 TABLE 128 T1_PK INDEX 128 T2 TABLE 128 T2_I1 INDEX 128
Obviously you could do something more complex to associate indexes with their tables, then sum subpartitions back to partitions and partitions back to table level stats – but a simple list of objects with primary physical sizes might be sufficient to give you an idea of how much work is still to be done. If you want to be really elegant (or extravagant) you could even combine this SQL with another piece reporting the objects with last_analyzed more recent than the start of run including, if you’re not running a parallel collection, the approximate time taken for each object (think lag() and comparing consecutive times).

As always, great post.
On a side note, have probably read this but just curious, in terms of stats collection, what your experience has been with “having” to collect extended stats:
http://blogs.oracle.com/optimizer/2011/03/how_do_i_know_what_extended_statistics_are_needed_for_a_given_workload.html
Thanks and great post.
Comment by ellis — April 1, 2011 @ 10:53 pm UTC Apr 1,2011 |
Ellis,
Thanks for the link – I hadn’t noticed that post.
Extended stats can be very helpful and I would expect to find a few cases on any site I visit where they could be useful. However, although I might look at the capturing table and report to see what it said, I don’t think I would let Oracle generate extended stats automatically.
I have the same opinion of virtual columns, by the way. It’s another variation on the same theme, of course, but again I would expect to find a few cases on any system where they could be helpful.
Comment by Jonathan Lewis — April 5, 2011 @ 6:04 pm UTC Apr 5,2011 |
Typo : ‘it is mon_mods$’ not mon_mod$.
SQL> desc mon_mods$
Name Null? Type
—————————————– ——– —————————-
OBJ# NUMBER
INSERTS NUMBER
UPDATES NUMBER
DELETES NUMBER
TIMESTAMP DATE
FLAGS NUMBER
DROP_SEGMENTS NUMBER
Please, delete this comment when dequeued end consumed.
Comment by Bernard Polarski — April 5, 2011 @ 9:40 am UTC Apr 5,2011 |
Bernard,
Thanks for the correction – now fixed.
Comment by Jonathan Lewis — April 5, 2011 @ 6:00 pm UTC Apr 5,2011 |
Hi Sir,
I created a partition table (monthly) and inserted rows into it.Gathered the stats of the table using exec dbms_stats.gather_table_stats(‘ANAND’,'SALES_PART’);. Then i locked the stats of the table using dbms_stats.lock_table_stats and did insert into sales_part select * from sales_part where time_id >’01-NOV-10′; which inserted data in December partition.
After doing the insert checked dba_tab_statistics
21:17:31 SQL> select OWNER,TABLE_NAME,PARTITION_NAME,PARTITION_POSITION,STATTYPE_LOCKED ,STALE_STATS
21:17:35 2 from dba_tab_statistics
21:17:35 3 where owner =’ANAND’
21:17:36 4 and STATTYPE_LOCKED is not null
21:17:36 5 order by 2
21:17:37 6 /
OWNER TABLE_NAME PARTITION_NAME PARTITION_POSITION STATT STA
—————————— ————— ————————- —————— —– —
ANAND SALES_PART ALL NO
ANAND SALES_PART SALE_APR 4 ALL NO
ANAND SALES_PART SALE_AUG 8 ALL NO
ANAND SALES_PART SALE_DEC 12 ALL NO
ANAND SALES_PART SALE_FEB 2 ALL NO
ANAND SALES_PART SALE_JAN 1 ALL NO
ANAND SALES_PART SALE_SEPT 9 ALL NO
ANAND SALES_PART SALE_JUN 6 ALL NO
ANAND SALES_PART SALE_MAY 5 ALL NO
ANAND SALES_PART SALE_MCH 3 ALL NO
ANAND SALES_PART SALE_NOV 11 ALL NO
ANAND SALES_PART SALE_OCT 10 ALL NO
ANAND SALES_PART SALE_JUL 7 ALL NO
So, the STALE_STATS column for SALE_DEC partition shows NO.
Immediately, then i ran
select * from table(list_stale(‘LIST AUTO’)); –> It didn’t list the table/partition name .
After, which dba_tab_statistics started showing STALE_STATS value as YES for SALE_DEC partition.
21:21:24 SQL> select OWNER,TABLE_NAME,PARTITION_NAME,PARTITION_POSITION,STATTYPE_LOCKED ,STALE_STATS
21:21:26 2 from dba_tab_statistics
21:21:27 3 where owner =’ANAND’
21:21:27 4 and STATTYPE_LOCKED is not null
21:21:28 5 order by 2
21:21:28 6 /
OWNER TABLE_NAME PARTITION_NAME PARTITION_POSITION STATT STA
—————————— ————— ————————- —————— —– —
ANAND SALES_PART ALL YES
ANAND SALES_PART SALE_APR 4 ALL NO
ANAND SALES_PART SALE_AUG 8 ALL NO
ANAND SALES_PART SALE_DEC 12 ALL YES
ANAND SALES_PART SALE_FEB 2 ALL NO
ANAND SALES_PART SALE_JAN 1 ALL NO
ANAND SALES_PART SALE_SEPT 9 ALL NO
ANAND SALES_PART SALE_JUN 6 ALL NO
ANAND SALES_PART SALE_MAY 5 ALL NO
ANAND SALES_PART SALE_MCH 3 ALL NO
ANAND SALES_PART SALE_NOV 11 ALL NO
ANAND SALES_PART SALE_OCT 10 ALL NO
ANAND SALES_PART SALE_JUL 7 ALL NO
Why is it that after running the function dba_tab_statistics is showing stats as STALE, why not before it?
Comment by Anand — April 8, 2011 @ 4:12 pm UTC Apr 8,2011 |
Monitoring Info is not flushed permanently, as can be seen also by selecting from dba_tab_modifications.
There ist the procedure dbms_stats.flush_database_monitoring_info to flush current info.
Afterward changes are visible.
Guess: I assume, not tested, upon using the list stale option, this is done internally as well. An this is
the reason why this information is seen after the call.
Comment by Stefan — April 10, 2011 @ 6:08 pm UTC Apr 10,2011 |
Anand,
I think Stefan has probably answered your question correctly.
There is a memory structure (x$) which collects (approximate) information about data changes, and various things flush this in-memory information to the table. You can flush the information with an explicit call, and there are various routines which do the flush as their first step. Your example probably demonstrates a timing issue about whether or not the flush takes place.
You could always investigate by enabling tracing and seeing what SQL gets generated behind the scenes as you do your tests.
Comment by Jonathan Lewis — April 20, 2011 @ 1:33 pm UTC Apr 20,2011 |
The database version is 11.2.0.1.0
Comment by Anand — April 8, 2011 @ 4:13 pm UTC Apr 8,2011 |
Dear Sir,
In dbms_stats.gather_table_stats() function, there is a parameter called estimate_percent. It tells oracle to sample that much amount to gather statistics.
We have lots of tables ranging from 100gb to 1 tb size. Weekly we take table stats with estimate_percent=100 and degree=10. The tables are highly DML active so our senior DBA argues that estimate_percent=100 is the best choice as there are lots of deletions/insertions/updations.
But it takes a lot of time to calculate statistics.
Is it really neccessary to estimate_percent to set 100?
Comment by A DBA — November 8, 2011 @ 8:46 am UTC Nov 8,2011 |
A DBA,
No, it’s not necessary.
The best strategy depends on version of Oracle, options licensed, time available, and rate of change.
In 11g I would look at using the default (auto_sample_size) because this allows Oracle to use its new ‘approximate NDV’ to calculate the number of distinct values for each column; however I would disable all histogram collection and write code to generate histograms where I thought it necessary.
There’s also the question of partitioning – a monolithic table (with indexes) of 1TB is probably not a good thing (nor is 100GB, for that matter), and you should probably looking at partitioned tables and gathering stats only for changed partitions.
The topic is too big for a blog note – but it’s one a talk about in my chapter of Expert Oracle Practices
Comment by Jonathan Lewis — November 13, 2011 @ 4:21 pm UTC Nov 13,2011 |