Oracle Scratchpad

February 20, 2010

Time on Stats

Filed under: Infrastructure,Statistics — Jonathan Lewis @ 6:35 pm GMT Feb 20,2010

In a blog that’s now three years old I pointed out the issue of the automatic stats collection job and the way it could affect performance by changing object-level statistics behind your back. There’s another reason it could have an annoying side effect, though – it actually takes time to collect statistics.

The automatic job is, by default, allowed to run from 10:00 pm to 6:00 am on weekdays and for even longer over the weekend – and the time it needs to complete can vary dramatically from day to day. So it’s nice to have a little script to cross-check when it started and how long it ran. All you have to do is query the view dba_scheduler_job_run_details:

column job_name format a20
column status format a12
column actual_start_date format a36
column run_duration format a14

select
        job_name, status, actual_start_date, run_duration
from
        dba_scheduler_job_run_details
where
        job_name = 'GATHER_STATS_JOB'
order by
        actual_start_date
;

Depending on how many home-grown scheduler jobs you have created, how frequently they run, and the length of the retention period (the default is 7 days), this query can be quite slow because of a predicate on one of the underlying tables that really needs a function-based index, and the complete absence of indexes on the other table – as you can see from my sample execution plan:

| Id  | Operation           | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT    |                            |  5440 |   382K|  3506   (1)| 00:01:46 |
|   1 |  SORT ORDER BY      |                            |  5440 |   382K|  3506   (1)| 00:01:46 |
|*  2 |   HASH JOIN         |                            |  5440 |   382K|  3505   (1)| 00:01:46 |
|*  3 |    TABLE ACCESS FULL| SCHEDULER$_EVENT_LOG       |  5411 |   211K|  1525   (1)| 00:00:47 |
|   4 |    TABLE ACCESS FULL| SCHEDULER$_JOB_RUN_DETAILS |   108K|  3381K|  1979   (1)| 00:01:00 |

Predicate Information (identified by operation id):
   2 - access("J"."LOG_ID"="E"."LOG_ID")
   3 - filter(DECODE(INSTR("E"."NAME",'"'),0,"E"."NAME",SUBSTR("E"."NAME",1,INSTR("E"."NAM
              E",'"')-1))='GATHER_STATS_JOB' AND "E"."TYPE#"=66)

As a guideline – if you create a job that runs every 10 seconds, then you’re going to end up with about 60,000 rows in each of the tables in a week; at once per minute you get about 10,000 rows per week. And both tables will be subject to a tablescan when you run this query.
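
A quick sanity check of those numbers – a back-of-the-envelope query that runs against dual:

select
        7 * 24 * 60 * 60 / 10   rows_per_week_at_10_seconds,
        7 * 24 * 60             rows_per_week_at_1_minute
from
        dual
;

This gives 60,480 and 10,080 respectively – hence "about 60,000" and "about 10,000".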

Here’s a recent output showing the performance threat:

JOB_NAME             STATUS       ACTUAL_START_DATE                RUN_DURATION
-------------------- ------------ -------------------------------- --------------
GATHER_STATS_JOB     SUCCEEDED    21-JAN-10 +00:00 +000 02:26:12
GATHER_STATS_JOB     SUCCEEDED    22-JAN-10 +00:00 +000 02:37:48
GATHER_STATS_JOB     SUCCEEDED    23-JAN-10 +00:00 +000 01:23:25
GATHER_STATS_JOB     SUCCEEDED    25-JAN-10 +00:00 +000 05:06:38
GATHER_STATS_JOB     SUCCEEDED    26-JAN-10 +00:00 +000 00:11:09
GATHER_STATS_JOB     SUCCEEDED    27-JAN-10 +00:00 +000 00:09:54
GATHER_STATS_JOB     SUCCEEDED    28-JAN-10 +00:00 +000 03:15:44

Notice how the time to collect statistics varied over the week, with a minimum of 10 minutes and a maximum of five hours. Just think what impact this could have on a big overnight batch: one day the stats collection is nearly invisible, another day there’s a huge extra job hammering away for five hours as the batch runs.


The indexes that don’t exist on my system, but possibly ought to, look something like the following (which obviously should not be created on a production system without approval from Oracle Support – especially since the tables are in the SYS schema). The function-based index matches the filter predicate shown at operation 3 of the plan:

create index jpl_fbi on scheduler$_event_log (
		decode(instr(name,'"'), 0, name, substr(name, 1, instr(name,'"') - 1)),
		type#
);

create index jpl_i1 on sys.scheduler$_job_run_details(log_id);


  1. […] Time on Stats […]

    Pingback by Simple scripts « Oracle Scratchpad — February 20, 2010 @ 6:38 pm GMT Feb 20,2010 | Reply

  2. Hi Jonathan,

    If you only want to know about the automatic stats collection job running, you can look at DBA_OPTSTAT_OPERATIONS, which itself looks at sys.WRI$_OPTSTAT_OPR. There is no index on this underlying table, but as it only holds information about statistics gathering operations that covered schemas, system, dictionary or everything (the automatic stats job) it contains relatively few records.

    If you have a system with very large tables (hundreds of GB or even TB) it is very important to check how long the automatic stats job ran for. Under V10 at least, it can get “stuck”. It spends so long gathering stats on the very large table or tables that it never finishes, then tries to gather stats on the same table(s) the next night and fails again… and again…until the weekend when it gets more time to run.

    If things get really bad, it will fail over the weekend too and, in effect, your automatic stats gathering stops. You are not warned about this by the system, you have to go and check!


    Comment by mwidlake — February 20, 2010 @ 7:40 pm GMT Feb 20,2010 | Reply

    • Martin,
      Good idea, thanks for that.
      Something like:

      	select
      		operation,
      		start_time,
      		end_time - start_time
      	from
      		dba_optstat_operations
      	where
      		operation = 'gather_database_stats(auto)'
      	order by
      		start_time desc
      	;

      (Formatting left as exercise).

      The only drawback is that this doesn’t give you a success/fail indicator – which the scheduler does.

      I’ve seen a similar, but possibly funnier, example of your “stuck” statistics. A big batch load to an enormous table caused Oracle to fail to collect stats every night for a week – finally succeeding on the big table at 8:00 pm on the Sunday night … just in time for the table to be dumped and reloaded.

      I see that you’ve got a mini-series on your blog about stats collection silently failing.

      Comment by Jonathan Lewis — February 20, 2010 @ 8:33 pm GMT Feb 20,2010 | Reply

  3. Hi Jonathan,

    I had a related problem last week: a gather_schema_stats job ran for two days in my sqlplus session (oracle version ). It was a request from the client to collect stats for a schema at 10am on Monday morning, as they had loaded 8m rows into a table which already held 10m rows, making 18m rows. The table is not partitioned.

    From dba_scheduler_job_run_details the automatic stats collection job status was ‘STOPPED’ for the Monday and Tuesday night runs, as the weekday window is from 10:00 pm to 6:00 am.

    On Wednesday afternoon I cancelled my sqlplus session. I was using gather_schema_stats with estimate_percent as auto sample size, and method_opt column size also auto. Then I tried the same with degree set to 7 (the server has 8 physical CPUs) – still no luck, it was taking hours. There is a problem SQL reported in Metalink for running gather_schema_stats with the auto options, and I found the same SQL in my AWR report.

    Finally I tried with sample size 1 and column size 1 (i.e. without histograms) in sqlplus; the job finished within a few minutes, and the automatic stats collection job was successful from that night onwards. I didn’t change any of the auto option values using the SET_PARAM procedure.
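
    A sketch of that kind of call (the schema name here is hypothetical; the parameter values are the ones described above):

    begin
            dbms_stats.gather_schema_stats(
                    ownname          => 'test_user',        -- hypothetical schema name
                    estimate_percent => 1,
                    method_opt       => 'for all columns size 1',
                    degree           => 7
            );
    end;
    /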


    Comment by Vineeth Kalanthodi — February 21, 2010 @ 11:27 am GMT Feb 21,2010 | Reply

  4. That’s an interesting predicate:


    I wonder why it is not written as: "E"."NAME" LIKE 'GATHER_STATS_JOB%'

    Not strictly the same, but at least an index wouldn’t have to be function-based to be useful.

    Comment by David Aldridge — February 22, 2010 @ 7:31 am GMT Feb 22,2010 | Reply

  5. David,

    I supplied the predicate:

    job_name = 'GATHER_STATS_JOB'

    The view definition for column job_name is:

    decode(instr(e.name, '"'), 0, e.name, substr(e.name, 1, instr(e.name, '"') - 1))


    One day the optimizer may be able to handle the type of inference that you are suggesting – but it can’t do so at present.

    Comment by Jonathan Lewis — February 22, 2010 @ 2:43 pm GMT Feb 22,2010 | Reply

  6. Although it is true that gathering statistics causes a (varying, depending on the amount of change in the analyzed tables) load on the database, there is no sensible alternative – or is there?
    All we can do is determine whether there is a critical task running in the time window where the statistics gathering job runs – and then modify that time window accordingly. That has got easier in 11g, where a different scheduler window is used for the statistics gathering task on each weekday.

    Comment by Uwe Hesse — February 22, 2010 @ 5:38 pm GMT Feb 22,2010 | Reply

    • Hi Uwe,

      A problem I have seen a couple of times, and that I know others have too, is that the default stats gathering window falls at the same time as the batch processing window for many systems. A lot of data warehouses have their busiest processing time overnight, preparing data in time for the users to look at it in the morning. Changing the scheduler window to the early evening, so that the stats gathering usually finishes before midnight, can help.

      Another alternative, which is very manual and a bit dirty, is to check for very large objects that have reached 8 or 9% changed and do a manual gather on the table and its indexes before the auto job picks them up. You can pick low sample sizes and parallel processing to get it done quickly. I did that “by hand” for a few weeks once :-)

      Comment by mwidlake — February 22, 2010 @ 6:07 pm GMT Feb 22,2010 | Reply

      • Martin,

        Timing can be critical – and performance “tuning” can sometimes be as simple as rescheduling.

        Of course, there are times when collecting stats about what the data looked like yesterday just before loading today’s data can lead to odd execution plans in the morning ;)

        Comment by Jonathan Lewis — February 23, 2010 @ 9:27 am GMT Feb 23,2010 | Reply

    • Uwe,

      It is certainly necessary to ensure that the statistics you have are a reasonable model of the data – but that doesn’t mean you have to allow the automatic stats collection job to run.

      And if you do have the automatic job running, it’s probably a good idea to change the method_opt to avoid collecting histograms – and collecting histograms is (a) the most expensive bit and (b) the bit most liable to be sensitive to change.
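
      In 10g, for example, something like the following would change the default (set_param is the 10g interface; in 11g it is superseded by set_global_prefs):

      begin
      	dbms_stats.set_param('method_opt', 'for all columns size 1');
      end;
      /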

      My chapter in the second Oak Table compendium is all about statistics – and ends with suggestions about the need (on larger, more complex systems) to devise a simple alternative to the automatic stats collection job.

      Comment by Jonathan Lewis — February 23, 2010 @ 9:24 am GMT Feb 23,2010 | Reply

  7. Hi Jonathan,

    I know that this is an old post, but I recently hit the “stuck auto stats job” problem in and it took me a long time to figure out why it was stuck, so I thought this might be useful for some other poor soul dealing with this horrible black-box job!

    In a large DW environment, the automatic job just kept being stopped at the end of the window, with JOB_INFO showing REASON=”Stop job called because associated window was closed”, as shown by querying DBA_AUTOTASK_JOB_HISTORY.

    I could not find out what the job had been doing by querying DBA_TABLES and DBA_TAB_PARTITIONS for LAST_ANALYZED values between the maintenance window start and end times, because not much was shown as having been analyzed within the window. I could, however, see from querying DBA_OPTSTAT_OPERATIONS that there was always a RESTORE_TABLE_STATS operation coinciding with the end of the window. At least this gave me the clue as to which large table (with several years’ worth of daily partitions going back to 2011) the job was getting stuck on.

    But I knew that every partition and subpartition of that table already had current statistics gathered on it. I double-checked this by also running DBMS_STATS.GATHER_DATABASE_STATS(LIST AUTO) [ using something akin to your script in ] and afterwards GATHER AUTO, which only took a few minutes to run successfully because so few stats were stale or empty.
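
    For anyone who wants to repeat that check, a sketch based on the documented dbms_stats interface (the objlist parameter returns the list of objects that the auto gather would process):

    declare
            m_list  dbms_stats.objecttab;
    begin
            dbms_stats.gather_database_stats(
                    options => 'LIST AUTO',
                    objlist => m_list
            );
            for i in 1 .. m_list.count loop
                    dbms_output.put_line(m_list(i).ownname || '.' || m_list(i).objname);
            end loop;
    end;
    /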

    Finally, I decided to run the auto job manually using EXEC DBMS_AUTO_TASK_IMMEDIATE.GATHER_OPTIMIZER_STATS, while at the same time disabling the automatic task from running in its maintenance windows for the next few days, and to look at what was being analysed using the LAST_ANALYZED query. This revealed that the job was re-analysing partitions that had not been analysed for over a year, even though those partitions were not stale, as their contents had not changed (and they were not being picked up by the DBMS_STATS.GATHER_DATABASE_STATS(GATHER AUTO) call).

    The conclusion that I draw is that the auto stats job has some additional rules regarding what needs to be analysed over and above what is in DBMS_STATS.GATHER_DATABASE_STATS. The documentation and Metalink only mentions that this job additionally prioritises the order of gathering objects, but I am now suspicious that it also has some time-based rule regardless of whether monitoring shows the object as stale or not. Have you ever come across this or do you infer some different cause that I may have missed?



    Comment by Tony — January 15, 2015 @ 4:04 pm GMT Jan 15,2015 | Reply

    • Tony,

      I haven’t heard of this problem before – and it seems likely that someone else would have seen it, so perhaps a search on MoS might turn something up.

      If I had to guess my first choice would be an error in the way the partitions are identified or selected, so I’d try running the job again with tracing enabled (with bind capture as well) to see if I could spot the error. A possible area of interest would be the flush from memory to dba_tab_modifications (mon_mod$ etc.) because maybe that goes wrong and then triggers the selection of the wrong partitions – and then the problem destroys the evidence after collecting the replacement stats, of course.

      Comment by Jonathan Lewis — January 16, 2015 @ 5:52 pm GMT Jan 16,2015 | Reply

    • Hello Tony,

      I seem to have hit the same issue as you, on 11203: after an app release the gather never completed successfully, with the same error message: REASON=”Stop job called because associated window was closed”.

      I have tried DBMS_AUTO_TASK_IMMEDIATE.GATHER_OPTIMIZER_STATS to see if there is any chance of finishing the gathering; I will update later.

      Comment by rickyzhu — July 6, 2016 @ 5:21 am GMT Jul 6,2016 | Reply

  8. […] it had been around for several years, though before I wrote little script (possibly prompted by a comment from Martin Widlake) that I’ve used occasionally since to see what’s been going on in the past, how […]

    Pingback by Stats time | Oracle Scratchpad — September 15, 2016 @ 9:41 am GMT Sep 15,2016 | Reply

  9. […] why it’s useful to check these tables as Johnathans note and @MDWidlakes’s comment here should give you all you need to […]

    Pingback by Stats Collection Time Anomaly | Neil Chandler's DBA Blog — September 16, 2016 @ 6:17 pm GMT Sep 16,2016 | Reply
