Oracle Scratchpad

April 27, 2016

Stats History

Filed under: Histograms,Infrastructure,Oracle,Statistics — Jonathan Lewis @ 1:09 pm BST Apr 27,2016

From time to time we see a complaint on OTN about the stats history tables being the largest objects in the SYSAUX tablespace and growing very quickly, with requests about how to work around the (perceived) threat. The quick answer is – if you need to save space then stop holding on to the history for so long, and then clean up the mess left by the history that you have captured; on top of that you could stop gathering so many histograms because you probably don’t need them, they often introduce instability to your execution plans, and they are often the largest single component of the history (unless you are using incremental stats on partitioned objects***)

For many databases it’s the histogram history – using the default Oracle automatic stats collection job – that takes the most space, here’s a sample query that the sys user can run to get some idea of how significant this history can be:


SQL> select table_name , blocks from user_tables where table_name like 'WRI$_OPTSTAT%HISTORY' order by blocks;

TABLE_NAME                           BLOCKS
-------------------------------- ----------
WRI$_OPTSTAT_AUX_HISTORY                 80
WRI$_OPTSTAT_TAB_HISTORY                244
WRI$_OPTSTAT_IND_HISTORY                622
WRI$_OPTSTAT_HISTHEAD_HISTORY          1378
WRI$_OPTSTAT_HISTGRM_HISTORY           2764

5 rows selected.

As you can see the “histhead” and “histgrm” tables (histogram header and histogram detail) are the largest stats history tables in this (admittedly very small) database.

Oracle gives us a couple of calls in the dbms_stats package to check and change the history setting, demonstrated as follows:


SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION
---------------------------
                         31

1 row selected.

SQL> execute dbms_stats.alter_stats_history_retention(7)

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION
---------------------------
                          7

1 row selected.

Changing the retention period doesn’t reclaim any space, of course – it simply tells Oracle how much of the existing history to eliminate in the next “clean-up” cycle. This clean-up is controllled by a “savtime” column in each table:

SQL> select table_name from user_tab_columns where column_name = 'SAVTIME' and table_name like 'WRI$_OPTSTAT%HISTORY';

TABLE_NAME
--------------------------------
WRI$_OPTSTAT_AUX_HISTORY
WRI$_OPTSTAT_HISTGRM_HISTORY
WRI$_OPTSTAT_HISTHEAD_HISTORY
WRI$_OPTSTAT_IND_HISTORY
WRI$_OPTSTAT_TAB_HISTORY

5 rows selected.

If all you wanted to do was stop the tables from growing further you’ve probably done all you need to do. From this point onwards the automatic Oracle job will start deleting the oldest saved stats and re-using space in the existing table. But you may want to be a little more aggressive about tidying things up, and Oracle gives you a procedure to do this – and it might be sensible to use this procedure anyway at a time of your own choosing:


SQL> execute dbms_stats.purge_stats(sysdate - 7);

Basically this issues a series of delete statements (including a delete on the “stats operation log (wri$_optstat_opr)” table that I haven’t previously mentioned) – here’s an extract from an 11g trace file of a call to this procedure (output from a simple grep command):


delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_tab_history          where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_ind_history h        where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_aux_history          where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_opr                  where start_time < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_histhead_history     where savtime < :1 and rownum <= NVL(:2, rownum)
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_histgrm_history      where savtime < :1 and rownum <= NVL(:2, rownum)

Two points to consider here: although the appearance of the rownum clause suggests that there’s a damage limitation strategy built into the code I only saw one commit after the entire delete cycle, and I never saw a limiting bind value being supplied. If you’ve got a large database with very large history tables you might want to delete one day (or even just a few hours) at a time. The potential for a very long, slow, delete is also why you might want to do a manual purge at a time of your choosing rather than letting Oracle do the whole thing on auto-pilot during some overnight operation.

Secondly, even though you may have deleted a lot of data from these table you still haven’t reclaimed the space – so if you’re trying to find space in the sysaux tablespace you’re going to have to rebuild the tables and their indexes. Unfortunately a quick check of v$sysaux_occupants tells us that there is no official “move” producedure:


SQL> execute print_table('select occupant_desc, move_procedure, move_procedure_desc from v$sysaux_occupants where occupant_name = ''SM/OPTSTAT''')

OCCUPANT_DESC                 : Server Manageability - Optimizer Statistics History
MOVE_PROCEDURE                :
MOVE_PROCEDURE_DESC           : *** MOVE PROCEDURE NOT APPLICABLE ***

So we have to run a series of explicit calls to alter table move and alter index rebuild. (Preferably not when anyone is trying to gather stats on an object). Coding that up is left as an exercise to the reader, but it may be best to move the tables in the order of smallest table first, rebuilding indexes as you go.

Footnote:

*** Incremental stats on partitioned objects: I tend to assume that sites which use partitioning are creating very large databases and have probably paid a lot more attention to the details of how to use statistics effectively and successfully; that’s why this note is aimed at sites which don’t use partitioning and therefore think that the space taken up by the stats history significant.

7 Comments »

  1. Thank you Jonathan for that information. Note that I was nonplussed by your “execute print_table” at the end of your post. I googled it and discovered it’s a procedure for formating rows into columns. In case other readers of your blog are, just lilke me, ignorant of that tool, I’m including here a link to an explanation of that nifty procedure: http://bit.ly/1NA5H3B

    Comment by Dear DBA Frank (@fdernoncourt) — April 27, 2016 @ 4:37 pm BST Apr 27,2016 | Reply

  2. I once had a really strange situation in 11.2.0.3 where there was a problem with MMON apparently (witnessed by there being no recent AWR snapshots).
    The stas history cleanup had apparently been failing silently for some time.
    Then what happened with the MMON (?) issue is the delete of stats history seemingly “moved” to the concurrent foreground processes, alarmingly….
    If I remember, there were four foreground sessions which were doing a large amount of cleanup but they were apparently recursive operations from stats calls from application ETL sessions (validated via ASH).

    Here is the post – it raises more questions than there are answers unfortunately:
    https://orastory.wordpress.com/2015/02/25/understanding-vs-resolution-statistics-history-cleanup/

    However, I have seen at least one other situation where this stats history has caused problems requiring immediate resolution.

    And as the stats history is of questionable value day-to-day, I had little hestitation in going for the magic DBMS_STATS.PURGE_ALL flag which does a TRUNCATE not a DELETE, although it is emphasised that this should only be used under the guidance of Oracle Support:
    https://orastory.wordpress.com/2015/02/16/dbms_stats-purge_stats/

    Also, those delete statements with ROWNUM led to these:
    https://orastory.wordpress.com/2015/02/26/count-stopkey-nvl/
    https://jonathanlewis.wordpress.com/2015/05/15/quiz-night-28/

    Comment by Dom Brooks — April 27, 2016 @ 5:08 pm BST Apr 27,2016 | Reply

  3. Thank you for sharing this nice post.
    Foued

    Comment by fouedgray — May 1, 2016 @ 6:34 pm BST May 1,2016 | Reply

  4. Yeah, with incremental stats, I’ve often seen WRI$_OPTSTAT_SYNOPSIS$ become a “monster” in SYSAUX – especially for sites where they’ve opted for things like daily partitions kept “forever”.

    Comment by connormcdonald — May 2, 2016 @ 7:22 am BST May 2,2016 | Reply

    • Connor,

      True, and especially catastrophic with hugely denormalised fact tables with a very large number of columns; and the time to derive global stats from the synopses could become enormous.

      12.2 promises to be much better, with much smaller synopses – though that does mean the accuracy might drop off by a couple of percent, and everyone who’s been accumulating incrementals for the last three years may have to regather for the entire table once.

      Comment by Jonathan Lewis — May 2, 2016 @ 12:13 pm BST May 2,2016 | Reply

  5. Hi Jonathan,

    The rownum-based purge size limitation only applies to automated stats history purging. It isn’t used when you call DBMS_STATS.PURGE_STATS. But as Dom noted in his linked articles, the “rownum < NVL(:2, rownum)" predicate used when the limitation is active doesn't actually work efficiently. Bug 18608261 addresses this problem with stats purging by changing the predicates.

    The preferred and supported method for shrinking the space of the stats history tables in SYSAUX is to use DBMS_STATS.PURGE_STATS(DBMS_STATS.PURGE_ALL). This is far safer and more efficient than repeatedly purging a few days or hours of stats history and then moving the tables and rebuilding their indexes. It's disadvantage is that you lose the entire stats history, but that rebuilds quickly.

    NB: Using PURGE_ALL does not require sign-off from Oracle Support.

    Mark Jefferys
    Oracle Support

    Comment by Mark Jefferys — May 6, 2016 @ 6:50 pm BST May 6,2016 | Reply

    • Mark,

      Thanks for that explanation, and nice to know that there’s a bug fix in place. (For those who have no access to MoS the bug is reported fixed in 12.2 but there are patches available for some platforms for 11.2.0.4 and 12.1.0.2).

      I paused briefly at the “purge_all” to consider the possible need for a mechanism that keeps one generation of stats history (i.e. most recent, a bit like the way rman can be configure) in case a problem with refreshed stats starts to show up after a pattern of “gather, purge, stats-related problem finally appears three days later”. It seems a little unlikely, but that’s just an argument about whether or not to pay insurance premiums.

      I have to say it struck as odd a few years ago when the whole stats history mechanism was introduced that the stats history defaulted to 31 days while the AWR defaulted to just 7 – to my mind the balance was the wrong way round ;)

      Comment by Jonathan Lewis — May 7, 2016 @ 9:46 am BST May 7,2016 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.