Oracle Scratchpad

April 1, 2011

Stats collection

Filed under: Oracle,Statistics — Jonathan Lewis @ 6:44 pm GMT Apr 1,2011

From time to time I see people asking how they can check how far their system has got in its call to collect stats. The question is often a little ambiguous – they know which call to dbms_stats they’ve used, so they know whether they’re trying to monitor stats collection against a single table (taking advantage of v$session_longops springs to mind) or against an entire schema.

Here’s one simple-minded approach that I whipped up a few years ago – it’s possible you could do better with the latest versions of dbms_stats. Its purpose is simply to give you some way of checking what work dbms_stats needs to do (in this example, for a schema), so that you can check how much has been done and how much remains. The basic method can be modified in various ways to match your tastes.

 

We start with the simple observation that many calls to dbms_stats have an “options” parameter that allows you to specify things like ‘LIST STALE’, rather than ‘GATHER STALE’. The “list” options populate an object table type with details of the data segments whose statistics would be gathered by the corresponding “gather” option. So we could start with a simple piece of code to do the following:

declare
	m_object_list	dbms_stats.objecttab;
begin

	dbms_stats.gather_schema_stats(
		ownname		=> 'test_user',
		options		=> 'LIST AUTO',
--		options		=> 'LIST STALE',
--		options		=> 'LIST EMPTY',
		objlist		=> m_object_list
	);

	for i in 1..m_object_list.count loop
		dbms_output.put_line(
			rpad(m_object_list(i).ownname,30)     ||
			rpad(m_object_list(i).objtype, 6)     ||
			rpad(m_object_list(i).objname,30)     ||
			rpad(m_object_list(i).partname,30)    ||
			rpad(m_object_list(i).subpartname,30) ||
			lpad(m_object_list(i).confidence,4)
		);
	end loop;
end;
/

Call this before you make your call to gather stats (and in general it might be better to use the utl_file package to write to a file rather than using dbms_output and capturing screen output) then, as time passes, you can check the “last_analyzed” column on the relevant view to see which objects are still waiting for their stats to be collected. Of course, this approach is a little clunky, and requires a certain amount of tedious manual labour to get the check done, but once we have the basic principle the rest is easy. Let’s start by using the code in a pipelined function.
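As a minimal sketch of that manual check (the timestamp below is just a placeholder for whenever your gather started), anything with a null or older last_analyzed is still waiting:

```sql
-- Sketch only: replace the placeholder date with the time the gather started.
-- Objects processed during the run will show a later last_analyzed.
select
        table_name, last_analyzed
from
        user_tables
where
        last_analyzed is null
or      last_analyzed < to_date('01-Apr-2011 18:00:00','dd-Mon-yyyy hh24:mi:ss')
order by
        last_analyzed nulls first
;
```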

create or replace function list_stale (
	i_option	in	varchar2,
	i_user		in	varchar2 default user
)
return dbms_stats.objecttab pipelined
as
	pragma autonomous_transaction;
	m_object_list	dbms_stats.objecttab;
begin

	if i_option not in (
		'LIST AUTO', 'LIST STALE','LIST EMPTY'
	) then
		null;
	else
		dbms_stats.gather_schema_stats(
			ownname		=> i_user,
			options		=> i_option,
			objlist		=> m_object_list
		);

		for i in 1..m_object_list.count loop
			pipe row (m_object_list(i));
		end loop;

	end if;

	return;
end;
/

You’ll notice that I’ve declared the function to run as an autonomous transaction – the call to dbms_stats does various things (such as updating the mon_mods$ and col_usage$ tables) that you’re not supposed to do in a pipelined function, but you can hide these by using the autonomous_transaction pragma. (Strangely, the error message you get in the absence of the pragma is: “ORA-04092: cannot COMMIT in a trigger”.)

With this definition I can call the function with code like:

select  *
from    table(list_stale('LIST AUTO'))

OWNNAME      OBJTYP OBJNAME         PARTNAME         SUBPARTNAME            CONFIDENCE
------------ ------ --------------- ---------------- ---------------------- ----------
TEST_USER    TABLE  PT_RANGE                                                       100
TEST_USER    TABLE  PT_RANGE        P600                                           100
TEST_USER    TABLE  PT_RL                                                          100
TEST_USER    TABLE  PT_RL           P_2002_MAR                                     100
TEST_USER    TABLE  PT_RL           P_2002_MAR       P_2002_MAR_THE_REST           100
TEST_USER    TABLE  PT_RL           P_2002_MAR       P_2002_MAR_MN                 100
TEST_USER    TABLE  PT_RL           P_2002_MAR       P_2002_MAR_EAST_COAST         100
TEST_USER    TABLE  PT_RL           P_2002_MAR       P_2002_MAR_CA                 100
TEST_USER    TABLE  T1                                                             100
TEST_USER    INDEX  PT_PK                                                          100
TEST_USER    INDEX  PT_PK           P600                                           100
TEST_USER    INDEX  T1_PK                                                          100
        ... etc ...

This will give me a list of all objects in my schema that need stats collected – either because they have no stats, or their stats are stale.

But I can be more subtle than this. I could, for example, write a query that joins this “table” to the view dba_segments and keep running it as the stats collection proceeds to report the objects that are still waiting for stats and the sizes of those objects. The SQL needs just a little thought as the objtype is only reported as “TABLE” or “INDEX” so you have to do a UNION ALL in the join and work out which rows are really for partitions and which for subpartitions. And then you have to mess about a bit with outer joins because Oracle will, for example, want to collect stats on a table when a partition is stale – and there is no data segment at the table level of a partitioned table.

Your code might look something like this:

with list_result as (
	select *
	from	table(list_stale('LIST AUTO'))
)
select
	lst.objname, nvl(seg.segment_type,lst.objtype) segment_type, lst.partname, seg.blocks
from
	list_result	lst,
	dba_segments	seg
where
	lst.partname is null
and	seg.owner(+) = lst.ownname
and	seg.segment_type(+) = lst.objtype
and	seg.segment_name(+) = lst.objname
and	seg.partition_name is null
union all
select
	lst.objname, nvl(seg.segment_type,lst.objtype || ' PARTITION') segment_type, lst.partname, seg.blocks
from
	list_result	lst,
	dba_segments	seg
where
	lst.subpartname is null
and	lst.partname is not null
and	seg.owner(+) = lst.ownname
and	substr(seg.segment_type(+),1,5) = lst.objtype
and	seg.segment_name(+) = lst.objname
and	seg.partition_name(+) = lst.partname
union all
select
	lst.objname, segment_type, lst.partname, seg.blocks
from
	list_result	lst,
	dba_segments	seg
where
	lst.subpartname is not null
and	seg.owner = lst.ownname
and	substr(seg.segment_type,1,5) = lst.objtype
and	seg.segment_name = lst.objname
and	seg.partition_name = lst.subpartname
order by
	1,2,3
/

In a tiny schema, where every data segment is one of my “1MB uniform extents”, this is a sample of the output I got:

OBJNAME         SEGMENT_TYPE       PARTNAME                           BLOCKS
--------------- ------------------ ------------------------------ ----------
PT_PK           INDEX
PT_PK           INDEX PARTITION    P200                                  128
PT_PK           INDEX PARTITION    P400                                  128
PT_PK           INDEX PARTITION    P600                                  128
PT_RANGE        TABLE
PT_RANGE        TABLE PARTITION    P200                                  128
PT_RANGE        TABLE PARTITION    P400                                  128
PT_RANGE        TABLE PARTITION    P600                                  128
PT_RL           TABLE
PT_RL           TABLE PARTITION    P_2002_FEB
PT_RL           TABLE PARTITION    P_2002_MAR
PT_RL           TABLE SUBPARTITION P_2002_FEB                            128
PT_RL           TABLE SUBPARTITION P_2002_FEB                            128
PT_RL           TABLE SUBPARTITION P_2002_FEB                            128
PT_RL           TABLE SUBPARTITION P_2002_FEB                            128
PT_RL           TABLE SUBPARTITION P_2002_MAR                            128
PT_RL           TABLE SUBPARTITION P_2002_MAR                            128
PT_RL           TABLE SUBPARTITION P_2002_MAR                            128
PT_RL           TABLE SUBPARTITION P_2002_MAR                            128
RL_PK           INDEX
RL_PK           INDEX PARTITION    P_2002_FEB
RL_PK           INDEX PARTITION    P_2002_MAR
RL_PK           INDEX SUBPARTITION P_2002_FEB                            128
RL_PK           INDEX SUBPARTITION P_2002_FEB                            128
RL_PK           INDEX SUBPARTITION P_2002_FEB                            128
RL_PK           INDEX SUBPARTITION P_2002_FEB                            128
RL_PK           INDEX SUBPARTITION P_2002_MAR                            128
RL_PK           INDEX SUBPARTITION P_2002_MAR                            128
RL_PK           INDEX SUBPARTITION P_2002_MAR                            128
RL_PK           INDEX SUBPARTITION P_2002_MAR                            128
T1              TABLE                                                    128
T1_PK           INDEX                                                    128
T2              TABLE                                                    128
T2_I1           INDEX                                                    128

Obviously you could do something more complex to associate indexes with their tables, then sum subpartitions back to partitions and partitions back to table-level stats – but a simple list of objects with their primary physical sizes might be sufficient to give you an idea of how much work is still to be done. If you want to be really elegant (or extravagant) you could even combine this SQL with another piece reporting the objects with last_analyzed more recent than the start of the run including, if you’re not running a parallel collection, the approximate time taken for each object (think lag() and comparing consecutive times).
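A sketch of that last idea – assuming a serial (non-parallel) collection and a placeholder start time – where lag() over consecutive last_analyzed values gives an approximate per-object timing:

```sql
-- Sketch only, valid for a serial collection: with one object gathered at
-- a time, the gap between consecutive last_analyzed values approximates
-- the time spent on each object. The start time below is a placeholder.
select
        table_name,
        last_analyzed,
        round(86400 * (
                last_analyzed -
                lag(last_analyzed) over (order by last_analyzed)
        ))      seconds_approx
from
        user_tables
where
        last_analyzed >= to_date('01-Apr-2011 18:00:00','dd-Mon-yyyy hh24:mi:ss')
order by
        last_analyzed
;
```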

10 Comments »

  1. As always, great post.

    On a side note, have probably read this but just curious, in terms of stats collection, what your experience has been with “having” to collect extended stats:

    http://blogs.oracle.com/optimizer/2011/03/how_do_i_know_what_extended_statistics_are_needed_for_a_given_workload.html

    Thanks and great post.

    Comment by ellis — April 1, 2011 @ 10:53 pm GMT Apr 1,2011 | Reply

    • Ellis,

      Thanks for the link – I hadn’t noticed that post.

      Extended stats can be very helpful and I would expect to find a few cases on any site I visit where they could be useful. However, although I might look at the capturing table and report to see what it said, I don’t think I would let Oracle generate extended stats automatically.

      I have the same opinion of virtual columns, by the way. It’s another variation on the same theme, of course, but again I would expect to find a few cases on any system where they could be helpful.

      Comment by Jonathan Lewis — April 5, 2011 @ 6:04 pm GMT Apr 5,2011 | Reply

  2. Typo : ‘it is mon_mods$’ not mon_mod$.

    SQL> desc mon_mods$
     Name                                      Null?    Type
     ----------------------------------------- -------- ----------------------------
     OBJ#                                               NUMBER
     INSERTS                                            NUMBER
     UPDATES                                            NUMBER
     DELETES                                            NUMBER
     TIMESTAMP                                          DATE
     FLAGS                                              NUMBER
     DROP_SEGMENTS                                      NUMBER

    Please, delete this comment when dequeued and consumed.

    Comment by Bernard Polarski — April 5, 2011 @ 9:40 am GMT Apr 5,2011 | Reply

  3. Hi Sir,

    I created a partition table (monthly) and inserted rows into it. Gathered the stats of the table using exec dbms_stats.gather_table_stats('ANAND','SALES_PART');. Then I locked the stats of the table using dbms_stats.lock_table_stats and did insert into sales_part select * from sales_part where time_id > '01-NOV-10'; which inserted data into the December partition.

    After doing the insert checked dba_tab_statistics

    21:17:31 SQL> select OWNER,TABLE_NAME,PARTITION_NAME,PARTITION_POSITION,STATTYPE_LOCKED ,STALE_STATS
    21:17:35 2 from dba_tab_statistics
    21:17:35 3 where owner =’ANAND’
    21:17:36 4 and STATTYPE_LOCKED is not null
    21:17:36 5 order by 2
    21:17:37 6 /

    OWNER      TABLE_NAME      PARTITION_NAME            PARTITION_POSITION STATT STA
    ---------- --------------- ------------------------- ------------------ ----- ---
    ANAND      SALES_PART                                                   ALL   NO
    ANAND      SALES_PART      SALE_APR                                   4 ALL   NO
    ANAND      SALES_PART      SALE_AUG                                   8 ALL   NO
    ANAND      SALES_PART      SALE_DEC                                  12 ALL   NO
    ANAND      SALES_PART      SALE_FEB                                   2 ALL   NO
    ANAND      SALES_PART      SALE_JAN                                   1 ALL   NO
    ANAND      SALES_PART      SALE_SEPT                                  9 ALL   NO
    ANAND      SALES_PART      SALE_JUN                                   6 ALL   NO
    ANAND      SALES_PART      SALE_MAY                                   5 ALL   NO
    ANAND      SALES_PART      SALE_MCH                                   3 ALL   NO
    ANAND      SALES_PART      SALE_NOV                                  11 ALL   NO
    ANAND      SALES_PART      SALE_OCT                                  10 ALL   NO
    ANAND      SALES_PART      SALE_JUL                                   7 ALL   NO

    So, the STALE_STATS column for SALE_DEC partition shows NO.

    Immediately, then i ran

    select * from table(list_stale('LIST AUTO')); –> It didn’t list the table/partition name.

    After, which dba_tab_statistics started showing STALE_STATS value as YES for SALE_DEC partition.

    21:21:24 SQL> select OWNER,TABLE_NAME,PARTITION_NAME,PARTITION_POSITION,STATTYPE_LOCKED ,STALE_STATS
    21:21:26 2 from dba_tab_statistics
    21:21:27 3 where owner =’ANAND’
    21:21:27 4 and STATTYPE_LOCKED is not null
    21:21:28 5 order by 2
    21:21:28 6 /

    OWNER      TABLE_NAME      PARTITION_NAME            PARTITION_POSITION STATT STA
    ---------- --------------- ------------------------- ------------------ ----- ---
    ANAND      SALES_PART                                                   ALL   YES
    ANAND      SALES_PART      SALE_APR                                   4 ALL   NO
    ANAND      SALES_PART      SALE_AUG                                   8 ALL   NO
    ANAND      SALES_PART      SALE_DEC                                  12 ALL   YES
    ANAND      SALES_PART      SALE_FEB                                   2 ALL   NO
    ANAND      SALES_PART      SALE_JAN                                   1 ALL   NO
    ANAND      SALES_PART      SALE_SEPT                                  9 ALL   NO
    ANAND      SALES_PART      SALE_JUN                                   6 ALL   NO
    ANAND      SALES_PART      SALE_MAY                                   5 ALL   NO
    ANAND      SALES_PART      SALE_MCH                                   3 ALL   NO
    ANAND      SALES_PART      SALE_NOV                                  11 ALL   NO
    ANAND      SALES_PART      SALE_OCT                                  10 ALL   NO
    ANAND      SALES_PART      SALE_JUL                                   7 ALL   NO

    Why is it that dba_tab_statistics shows the stats as STALE only after running the function, and not before?

    Comment by Anand — April 8, 2011 @ 4:12 pm GMT Apr 8,2011 | Reply

    • Monitoring info is not flushed permanently, as can also be seen by selecting from dba_tab_modifications.
      There is the procedure dbms_stats.flush_database_monitoring_info to flush the current info.
      Afterwards the changes are visible.

      Guess: I assume, not tested, that upon using the list stale option this flush is done internally as well, and this is
      the reason why the information is seen after the call.
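      A minimal sketch of the explicit flush, followed by a check of the modification counts:

      ```sql
      -- Flush the in-memory DML monitoring info so that the
      -- dba_tab_modifications / stale_stats views reflect recent changes.
      begin
              dbms_stats.flush_database_monitoring_info;
      end;
      /

      select  table_name, partition_name, inserts, updates, deletes
      from    dba_tab_modifications
      where   table_owner = user
      ;
      ```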

      Comment by Stefan — April 10, 2011 @ 6:08 pm GMT Apr 10,2011 | Reply

    • Anand,

      I think Stefan has probably answered your question correctly.

      There is a memory structure (x$) which collects (approximate) information about data changes, and various things flush this in-memory information to the table. You can flush the information with an explicit call, and there are various routines which do the flush as their first step. Your example probably demonstrates a timing issue about whether or not the flush takes place.

      You could always investigate by enabling tracing and seeing what SQL gets generated behind the scenes as you do your tests.

      Comment by Jonathan Lewis — April 20, 2011 @ 1:33 pm GMT Apr 20,2011 | Reply

  4. The database version is 11.2.0.1.0

    Comment by Anand — April 8, 2011 @ 4:13 pm GMT Apr 8,2011 | Reply

  5. Dear Sir,
    In the dbms_stats.gather_table_stats() procedure there is a parameter called estimate_percent. It tells Oracle how large a sample to use when gathering statistics.
    We have lots of tables ranging from 100GB to 1TB in size. Weekly we gather table stats with estimate_percent=100 and degree=10. The tables are highly DML-active, so our senior DBA argues that estimate_percent=100 is the best choice as there are lots of deletes/inserts/updates.
    But it takes a lot of time to calculate statistics.
    Is it really necessary to set estimate_percent to 100?

    Comment by A DBA — November 8, 2011 @ 8:46 am GMT Nov 8,2011 | Reply

    • A DBA,

      No, it’s not necessary.

      The best strategy depends on the version of Oracle, the options licensed, the time available, and the rate of change.

      In 11g I would look at using the default (auto_sample_size) because this allows Oracle to use its new ‘approximate NDV’ to calculate the number of distinct values for each column; however I would disable all histogram collection and write code to generate histograms where I thought it necessary.
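      A sketch of that approach using the standard dbms_stats preference calls (the table and column names in the second block are hypothetical):

      ```sql
      -- Let Oracle choose the sample (approximate NDV in 11g) but suppress
      -- automatic histogram collection across the board.
      begin
              dbms_stats.set_global_prefs('ESTIMATE_PERCENT', 'DBMS_STATS.AUTO_SAMPLE_SIZE');
              dbms_stats.set_global_prefs('METHOD_OPT', 'FOR ALL COLUMNS SIZE 1');
      end;
      /

      -- Then generate histograms only where you know they are needed
      -- (T1 and SKEWED_COL are placeholders for your own objects).
      begin
              dbms_stats.gather_table_stats(
                      ownname    => user,
                      tabname    => 'T1',
                      method_opt => 'FOR COLUMNS SKEWED_COL SIZE 254'
              );
      end;
      /
      ```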

      There’s also the question of partitioning – a monolithic table (with indexes) of 1TB is probably not a good thing (nor is 100GB, for that matter), and you should probably be looking at partitioned tables and gathering stats only for changed partitions.

      The topic is too big for a blog note – but it’s one I talk about in my chapter of Expert Oracle Practices.

      Comment by Jonathan Lewis — November 13, 2011 @ 4:21 pm GMT Nov 13,2011 | Reply

