Oracle Scratchpad

September 16, 2016

Stats collection time

Filed under: Oracle,Statistics,Troubleshooting — Jonathan Lewis @ 12:58 pm GMT Sep 16,2016

Yesterday I posted a note about querying dba_optstat_operations to get a quick report of how long calls to dbms_stats had been taking but said I had another script that helped to fill some of the gaps it left. One of my readers points out fairly promptely that 12c enhances the feature considerably, with a view dba_optstat_operation_tasks that (for example) lists all the tables processed during a single call to gather_schema_stats.

Well, I wrote my script years (if not decades) before 12c came out, so I’m going to publish it anyway.

It definitely falls into the quick and dirty category. It’s not trying to be clever, makes a critical assumption about the recent past, and isn’t a generally useful tool, but I’ve found it a quick way to highlight a possible problem with expensive stats collection routines (especially those written by 3rd parties who have disabled Oracle’s automatic job.)

The concept is very simple: list all the tables in the order they were last analyzed and assume that for any “obviously continuous” stretch of time the start of stats collections for one table coincided with the end of stats collection for the previous one. A little fiddling with the lag() function then allows you to report the time spent on gathering the stats.

rem
rem     stats_check.sql
rem
rem     Quick and dirty to get an idea of
rem     where the time went on the most
rem     recent stats collection job.
rem
rem     Weakness: It assumes that the start
rem     of stats collection of a table is the
rem     end of stats collection for the previous
rem     table.  This may be far from true.
rem

define m_schema = '&1'

set linesize 144
set pagesize 60
set trimspool on

column a_date format a20

spool stats_check

select
        table_name,
        blocks, num_rows, sample_size,
        round(100 * sample_size/nullif(num_rows,0),2)   sample_pct,
        to_char(last_analyzed,'dd-mon-yyyy hh24:mi:ss') a_date,
        to_char(
                round(1440 *
                        (
                        last_analyzed -
                        lag(last_analyzed, 1) over (order by last_analyzed)
                        ), 2
                ),'9,999.00'
        )                        a_time
from
        dba_tables
where
        owner = upper('&m_schema')
order by
        last_analyzed
;

spool off

It’s a very simple script – one detail that’s quite useful is the presence of the data about table size and sample size. But that’s really only the starting point. If you’re gathering stats on a table then most of the time might be spent on gathering stats for histograms or indexes, so there’s plenty of scope to add bits to the script to report number of histograms per table, number of indexes per table, and so on. Personally I tend to do “incremental” trouble-shooting and keep my scripts (and their run time) short, so I’d only worry about specific details if I saw a headline figure that seemed worth a little more effort.

Here’s some (camouflaged) output – which happens to highlight one of the weaknesses of the approach:

TABLE_NAME              BLOCKS   NUM_ROWS SAMPLE_SIZE SAMPLE_PCT A_DATE                  A_TIME
--------------------- --------  --------- ----------- ---------- --------------------  --------
TABLE_AAA                    0          0           0            14-jul-2016 01:27:55       .00
TABLE_BBB              6320666  540356865   108071373         20 14-jul-2016 02:30:45     62.83
TABLE_CCC              9587372  930770535   186154107         20 14-jul-2016 03:36:13     65.47
TABLE_DDD              2272319  104698080    20939616         20 14-jul-2016 04:19:45     43.53
TABLE_EEE              4413696  258746720    51749344         20 14-jul-2016 04:49:13     29.47
TABLE_FFF                    0          0           0            14-jul-2016 05:02:28     13.25
TABLE_GGG              4377656  258740995    51748199         20 14-jul-2016 05:02:28       .00
TABLE_HHH                    0          0           0            14-jul-2016 05:02:28       .00
TABLE_III                   60         52          52        100 14-jul-2016 05:02:29       .02
TABLE_JJJ                   60        170         170        100 14-jul-2016 05:02:30       .02
TABLE_KKK                   60        100         100        100 14-jul-2016 05:02:30       .00
TABLE_LLL                   60       5548        5021       90.5 14-jul-2016 05:02:31       .02
TABLE_MMM                 4660     686575      137315         20 14-jul-2016 05:02:32       .02
...
TABLE_TTT                    0          0           0            14-jul-2016 05:02:38       .00
TABLE_UUU                   60        659         659        100 14-jul-2016 05:02:38       .00
TABLE_VVV               325324   18618685     3723737         20 14-jul-2016 05:04:14      1.60
TABLE_WWW                   60       1106        1106        100 14-jul-2016 05:06:01      1.78
TABLE_XXX                24001    2534810      506962         20 14-jul-2016 05:06:13       .20
TABLE_YYY                    0          0           0            14-jul-2016 05:06:20       .12
TABLE_ZZZ                    0          0           0            14-jul-2016 05:06:20       .00

You’ll notice that the zero row FFF table is reported to have taken 13.25 minutes for stats gathering, while the 4.3M row GGG table table took no time at all. That’s the problem of sorting by the last_analyzed time when it’s accurate only to the second and some tables take less than a second to gather stats. Clearly these two time estimates are the wrong way round. (WWW and XXX look fairly suspect too, but their timestamps are different so there must be some other explanation – but the time is too small to worry about at the moment.)

Tables BBB and CCC are also an example of how variable the results are when you compare cost with time. Although the table and sample had twice the rows table CCC has fewer histograms and indexes. The same variation shows up clearly between EEE and GGG (after allowing for the order error), and slightly less immediately between DDD and all the other large tables.

That brings me to one more threat of mis-interpretation (and a reason why I’m not terribly keen on publishing scripts that I’ve hacked together). If you see Oracle claiming (through this script) that it took no more than a couple of minutes to gather stats on a 5 billion row table it may be true – but maybe that stats collection was only related to gathering stats for a single, fairly new, partition in a partitioned table. There are, no doubt, several boundary cases that a simple script like this will have overlooked – and when I run it I’ll be at a client site with knowledge of the client database and I’ll spot the anomalies and omissions.

Update

One simple enhancement to the script above – indexes take time to analyze as well, and if you query only for tables the time spent analyzing the indexes on a table will be attributed to the NEXT table in the output. All we have to do is change the reference to dba_tables to be an inline view referencing dba_tables and dba_indexes in a UNION ALL:


select
        table_name, index_name,
        blocks, num_rows, sample_size,
        round(100 * sample_size/nullif(num_rows,0),2)   sample_pct,
        to_char(last_analyzed,'dd-mon-yyyy hh24:mi:ss') a_date,
        to_char(
                round(1440 *
                        (
                        last_analyzed -
                        lag(last_analyzed, 1) over (order by last_analyzed)
                        ), 2
                ),'9,999.99'
        )                        a_time
from
        (
        select
                owner, table_name, null as index_name,
                blocks, num_rows, sample_size, last_analyzed
        from
                dba_tables
        union all
        select
                table_owner, table_name, index_name,
                leaf_blocks, num_rows, sample_size, last_analyzed
        from
                dba_indexes
        )
where
        owner = upper('&m_schema')
order by
        last_analyzed
;

You’ll notice that I’ve left the ‘&schema’ outside the view so that I only have to supply a value once; the optimizer is smart enough to push this predicate inside the union all view so that the plan shows the two separate subqueries being operated only for the given schema.

Footnote

A reader has emailed me to point out that one of the cases where this script won’t help (or, at the least, would produce highly misleading results) when the CONCURRENT preference has been set to true, allowing multiple jobs run to gather stats on difference objects at the same.

 

1 Comment »

  1. […] Lewis (@JLOracle) recently published a short post about Stats Collection Time, talking about the table dba_optstat_operation (and dba_optstat_operation_tasks ), which reminded […]

    Pingback by Stats Collection Time Anomaly | Neil Chandler's DBA Blog — September 16, 2016 @ 6:17 pm GMT Sep 16,2016 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.