Oracle Scratchpad

February 20, 2010

Time on Stats

Filed under: Infrastructure,Statistics — Jonathan Lewis @ 6:35 pm BST Feb 20,2010

In a blog post that’s now three years old I pointed out the issue of the automatic stats collection job and the way it could affect performance by changing object-level statistics behind your back. There’s another way it can have an annoying side effect, though – it actually takes time to collect statistics.

The automatic job is, by default, allowed to run from 10:00 pm to 6:00 am on weekdays and for even longer over the weekend – and the time it needs to complete can vary dramatically from day to day. So it’s nice to have a little script to cross-check when it started and how long it ran. All you have to do is query the view dba_scheduler_job_run_details:

column job_name format a20
column status format a12
column actual_start_date format a36
column run_duration format a14

select
        job_name, status, actual_start_date, run_duration
from
        dba_scheduler_job_run_details
where
        job_name = 'GATHER_STATS_JOB'
order by
        actual_start_date
;

Depending on how many home-grown scheduler jobs you have created, how frequently they run, and the length of the retention period (the default is 7 days), this query can be quite slow because of a predicate on one of the underlying tables that really needs a function-based index, and the complete absence of indexes on the other table – as you can see from my sample execution plan:

--------------------------------------------------------------------------------------------------
| Id  | Operation           | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                            |  5440 |   382K|  3506   (1)| 00:01:46 |
|   1 |  SORT ORDER BY      |                            |  5440 |   382K|  3506   (1)| 00:01:46 |
|*  2 |   HASH JOIN         |                            |  5440 |   382K|  3505   (1)| 00:01:46 |
|*  3 |    TABLE ACCESS FULL| SCHEDULER$_EVENT_LOG       |  5411 |   211K|  1525   (1)| 00:00:47 |
|   4 |    TABLE ACCESS FULL| SCHEDULER$_JOB_RUN_DETAILS |   108K|  3381K|  1979   (1)| 00:01:00 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("J"."LOG_ID"="E"."LOG_ID")
   3 - filter(DECODE(INSTR("E"."NAME",'"'),0,"E"."NAME",SUBSTR("E"."NAME",1,INSTR("E"."NAM
              E",'"')-1))='GATHER_STATS_JOB' AND "E"."TYPE#"=66)

As a guideline – if you create a job that runs every 10 seconds, then you’re going to end up with about 60,000 rows in each of the tables in a week; at once per minute you get about 10,000 rows per week. And both tables will be subject to a tablescan when you run this query.
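If the log tables have grown large, the scheduler’s own purge routine can trim them back. A minimal sketch – the 7-day figure simply matches the default retention mentioned above, and you’d want to check the dbms_scheduler documentation for your version before running this with DBA privileges:

```sql
-- Trim the scheduler log tables back to 7 days of history.
-- (Sketch only: 7 is just the default retention period.)
begin
        dbms_scheduler.purge_log(
                log_history => 7,
                which_log   => 'JOB_AND_WINDOW_LOG'
        );
end;
/
```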

Here’s a recent output showing the performance threat:

JOB_NAME             STATUS       ACTUAL_START_DATE                RUN_DURATION
-------------------- ------------ -------------------------------- --------------
GATHER_STATS_JOB     SUCCEEDED    21-JAN-10 22.00.03.629836 +00:00 +000 02:26:12
GATHER_STATS_JOB     SUCCEEDED    22-JAN-10 22.00.05.236441 +00:00 +000 02:37:48
GATHER_STATS_JOB     SUCCEEDED    23-JAN-10 06.00.11.181440 +00:00 +000 01:23:25
GATHER_STATS_JOB     SUCCEEDED    25-JAN-10 22.00.02.629410 +00:00 +000 05:06:38
GATHER_STATS_JOB     SUCCEEDED    26-JAN-10 22.00.01.182149 +00:00 +000 00:11:09
GATHER_STATS_JOB     SUCCEEDED    27-JAN-10 22.00.04.178122 +00:00 +000 00:09:54
GATHER_STATS_JOB     SUCCEEDED    28-JAN-10 22.00.01.500210 +00:00 +000 03:15:44

Notice how the time to collect statistics varied over the week, with a minimum of 10 minutes and a maximum of five hours. Just think what impact this could have on a big overnight batch: one day the stats collection is nearly invisible, another day there’s a huge extra job hammering away for five hours as the batch runs.

Footnote:

The indexes that don’t exist on my 10.2.0.3, and possibly ought to exist, look something like the following (which obviously should not be created on a production system without approval from Oracle Support – especially since the tables are in the SYS schema):

create index jpl_fbi on scheduler$_event_log (
	decode(instr(name,'"'),
		0, name,
		   substr(name,1,instr(name,'"')-1)
	)
)
;

create index jpl_i1 on sys.scheduler$_job_run_details(log_id)
;

10 Comments »

  1. [...] Time on Stats [...]

    Pingback by Simple scripts « Oracle Scratchpad — February 20, 2010 @ 6:38 pm BST Feb 20,2010 | Reply

  2. Hi Jonathan,

    If you only want to know about the automatic stats collection job running, you can look at DBA_OPTSTAT_OPERATIONS, which itself looks at sys.WRI$_OPTSTAT_OPR. There is no index on this underlying table, but as it only holds information about statistics-gathering operations that covered schemas, the system, the dictionary, or everything (the automatic stats job), it contains relatively few records.

    If you have a system with very large tables (hundreds of GB or even TB) it is very important to check how long the automatic stats job ran for. Under V10 at least, it can get “stuck”. It spends so long gathering stats on the very large table or tables that it never finishes, then tries to gather stats on the same table(s) the next night and fails again… and again…until the weekend when it gets more time to run.

    If things get really bad, it will fail over the weekend too and, in effect, your automatic stats gathering stops. You are not warned about this by the system, you have to go and check!

    Martin

    Comment by mwidlake — February 20, 2010 @ 7:40 pm BST Feb 20,2010 | Reply

    • Martin,
      Good idea, thanks for that.
      Something like:

      select 
      	operation, 
      	start_time, 
      	end_time - start_time
      from 
      	dba_optstat_operations
      where
      	operation = 'gather_database_stats(auto)'
      order by 
      	start_time desc
      ;

      (Formatting left as exercise).

      The only drawback is that this doesn’t give you a success/fail indicator – which the scheduler does.

      I’ve seen a similar, but possibly funnier, example of your “stuck” statistics. A big batch load into an enormous table caused Oracle to fail to collect stats every night for a week – finally succeeding on the big table at 8:00 pm on Sunday night … just in time for the table to be dumped and reloaded.

      I see that you’ve got a mini-series on your blog about stats collection silently failing.

      Comment by Jonathan Lewis — February 20, 2010 @ 8:33 pm BST Feb 20,2010 | Reply

  3. Hi Jonathan,

    I had a related problem last week: a gather_schema_stats job ran for two days in my SQL*Plus session (Oracle version 10.2.0.4). It was a request from the client to collect stats for a schema at 10 am on Monday morning, as they had loaded 8m rows into a table which already had 10m rows, taking it to 18m rows. The table is not partitioned.

    From dba_scheduler_job_run_details the automatic stats collection job status was ‘STOPPED’ for the Monday and Tuesday night runs, as the weekday window is from 10 pm to 6 am.

    On Wednesday afternoon I cancelled my SQL*Plus session. I had been using gather_schema_stats with estimate_percent set to auto sample size and method_opt column size also set to auto. Then I tried the same with degree set to 7 (the server has 8 physical CPUs) – still no luck, it was taking hours. There is a problem SQL reported in Metalink for running gather_schema_stats with the auto options, and I found the same SQL in my AWR report.

    Finally I tried with sample size 1 and column size 1 (without histograms) in SQL*Plus; the job finished within a few minutes, and the automatic stats collection job was successful from that night onwards. I didn’t change any of the auto option values using the SET_PARAM procedure.
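    For reference, the kind of call Vineeth describes might look something like the following – a sketch, not the exact command used, and the schema name is hypothetical:

```sql
-- Explicit 1% sample, no histograms -- much cheaper than the AUTO options.
-- Schema name (APP_OWNER) is hypothetical.
begin
        dbms_stats.gather_schema_stats(
                ownname          => 'APP_OWNER',
                estimate_percent => 1,
                method_opt       => 'for all columns size 1',
                degree           => 7,
                cascade          => true
        );
end;
/
```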

    Regards,
    Vineeth

    Comment by Vineeth Kalanthodi — February 21, 2010 @ 11:27 am BST Feb 21,2010 | Reply

  4. That’s an interesting predicate:

    DECODE(INSTR("E"."NAME",'"'),0,"E"."NAME",SUBSTR("E"."NAME",1,INSTR("E"."NAME",'"')-1))='GATHER_STATS_JOB'

    I wonder why it is not written as: "E"."NAME" LIKE 'GATHER_STATS_JOB%'

    Not strictly the same, but at least an index wouldn’t have to be function-based to be useful.

    Comment by David Aldridge — February 22, 2010 @ 7:31 am BST Feb 22,2010 | Reply

  5. David,

    I supplied the predicate:

    job_name = 'GATHER_STATS_JOB'

    The view definition for column job_name is:

    DECODE(INSTR("E"."NAME",'"'),0,"E"."NAME",SUBSTR("E"."NAME",1,INSTR("E"."NAME",'"')-1))

    One day the optimizer may be able to handle the type of inference that you are suggesting – but it can’t do so at present.

    Comment by Jonathan Lewis — February 22, 2010 @ 2:43 pm BST Feb 22,2010 | Reply

  6. Although it is true that gathering statistics causes a load on the database (varying with the amount of change in the analyzed tables), is there any sensible alternative?
    All we can do is determine whether a critical task runs in the time window where the statistics gathering job is running – and then modify that time window accordingly. That has got easier in 11g, where a different scheduler window is used for the statistics gathering task on each weekday.
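    In 10g the equivalent adjustment means altering the default maintenance windows. A hedged sketch – the window name is the 10g default and the calendar string is illustrative only; check dba_scheduler_windows on your own system first:

```sql
-- Move the 10g weeknight maintenance window to start at 7:00 pm
-- instead of 10:00 pm. (Illustrative values only.)
begin
        dbms_scheduler.set_attribute(
                name      => 'SYS.WEEKNIGHT_WINDOW',
                attribute => 'repeat_interval',
                value     => 'freq=daily;byday=MON,TUE,WED,THU,FRI;byhour=19;byminute=0;bysecond=0'
        );
end;
/
```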

    Comment by Uwe Hesse — February 22, 2010 @ 5:38 pm BST Feb 22,2010 | Reply

    • Hi Uwe,

      A problem I have seen a couple of times, and that I know others have too, is that the default stats gathering window falls at the same time as the batch processing window for many systems. A lot of data warehouses have their busiest processing time overnight, preparing data in time for the users to look at it in the morning. Changing the scheduler window to the early evening, so the stats gathering usually finishes before midnight, can help.

      Another alternative, which is very manual and a bit dirty, is to check for very large objects that have reached 8 or 9% changed and do a manual gather on the table and its indexes before the auto job picks them up. You can pick a low sample size and parallel processing to get it done quickly. I did that “by hand” for a few weeks once :-)

      Comment by mwidlake — February 22, 2010 @ 6:07 pm BST Feb 22,2010 | Reply

      • Martin,

        Timing can be critical – and performance “tuning” can sometimes be as simple as rescheduling.

        Of course, there are times when collecting stats about what the data looked like yesterday just before loading today’s data can lead to odd execution plans in the morning ;)

        Comment by Jonathan Lewis — February 23, 2010 @ 9:27 am BST Feb 23,2010 | Reply

    • Uwe,

      It is certainly necessary to ensure that the statistics you have are a reasonable model of the data – but that doesn’t mean you have to allow the automatic stats collection job to run.

      And if you do have the automatic job running, it’s probably a good idea to change the method_opt to avoid collecting histograms – and collecting histograms is (a) the most expensive bit and (b) the bit most liable to be sensitive to change.

      My chapter in the second Oak Table compendium is all about statistics – and ends with suggestions about the need (on larger, more complex systems) to devise a simple alternative to the automatic stats collection job.
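      Under 10g that change can be made persistent with dbms_stats.set_param – a minimal sketch (11g replaces this call with dbms_stats.set_global_prefs):

```sql
-- Stop the automatic job from collecting histograms (10g syntax;
-- 11g uses dbms_stats.set_global_prefs instead).
begin
        dbms_stats.set_param('METHOD_OPT', 'for all columns size 1');
end;
/
```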

      Comment by Jonathan Lewis — February 23, 2010 @ 9:24 am BST Feb 23,2010 | Reply

