Oracle Scratchpad

June 15, 2008

Event snapshots

Filed under: Oracle,Performance,Troubleshooting — Jonathan Lewis @ 8:13 pm BST Jun 15,2008

[Back to part 2]

Some time ago I wrote a note about using dbms_lock() to slow down your SQL, and followed up with an example of using this type of code to help with trouble-shooting.

One of the most useful views I’ve used this trick on is v$event_histogram – the view which gives you a more detailed breakdown of where the time is going on each of the events reported by v$system_event. As with my example of faking v$session_wait_history, the idea is simple – create a “union all” view with a time-wasting query in the middle.

First we create a simple function that waits for a number of seconds specified by the first input, then returns the second (numeric) input:

create or replace function jpl_numeric_wait (
	i_seconds in number,
	i_value in number default 0
) return number
deterministic
as
begin
	dbms_lock.sleep(i_seconds);
	return i_value;
end;
/

 Then we query v$event_histogram twice – with a delay in the middle:

define m_event = 'db file sequential read'
define m_pause = 60

column tstamp new_value m_tstamp

select to_char(sysdate, 'dd_hh24_mi') tstamp from dual;

spool snap_&m_tstamp

select
	event, wait_time_milli, sum(wait_count)
from (
	select
		event, wait_time_milli, -1 * wait_count  wait_count
	from	v$event_histogram
	where	event = '&m_event'
	union all
	select
		null, jpl_numeric_wait( &m_pause ,-1), null
	from	dual
	union all
	select
		event, wait_time_milli, wait_count
	from	v$event_histogram
	where	event = '&m_event'
	)
where
	wait_time_milli != -1
group by
	event, wait_time_milli
order by
	event, wait_time_milli
;

prompt Output file: snap_&m_tstamp..lst

spool off

You need only create the function once, of course, and you will have to have the privilege to execute the dbms_lock package to do so. The rest of the code can be used in a script which can be modified to pick you favourite events, set the snapshot time, and so on.

Here’s a sample of output from a slightly stressed client system – using the event and time period of the sample code:

EVENT                          WAIT_TIME_MILLI SUM(WAIT_COUNT)
------------------------------ --------------- ---------------
db file sequential read                      1           43430
db file sequential read                      2            6558
db file sequential read                      4            3738
db file sequential read                      8           12241
db file sequential read                     16            4388
db file sequential read                     32             214
db file sequential read                     64             125
db file sequential read                    128              18
db file sequential read                    256               5
db file sequential read                    512               1
db file sequential read                   1024               0
db file sequential read                   2048               0
db file sequential read                   4096               0
db file sequential read                   8192               0

14 rows selected.

Output file: snap_13_14_09.lst

 As you can see, this system has done about 70,000 single block read requests in 60 seconds – and luckily most of them (about 50,000 – the ones that took less than 2 milliseconds) came out of a SAN cache, or event (for those less than 1 millisecond) a file-system cache.

The spike of 12,200 reads in the 4 – 8 millisecond region is a good response time, with a trailer of 4,400 out into the 8 – 16 millsecond range that’s a little suspect but allowable for a reasonably loaded system.

But the tail is very worrying – there’s one read that took more than a quarter of a second, and could have been anything up to half a second; there’s 149 reads in total that took more than 32 milliseconds to complete. It’s only a small percentage, but it’s an indication that this is an I/O subsystem that’s on the fringes of catastrophe.  (If you know Cary Millsap’s work, you’ll recognise the knee of the response-time curve in these numbers).

It’s a useful little script: average wait times can often look reasonable when there’s really a catastrophe waiting in the wings.

[Back to part 2]

7 Comments »

  1. Nice one.

    Times ago, I was thinking about redefining some Oracle V$iews, or introduce some new ones, so I can have history in them.

    They would be great for historic-events-analysis.

    Comment by lascoltodelvenerdi — June 16, 2008 @ 8:43 am BST Jun 16,2008 | Reply

  2. [...] Event Histograms – 2 Filed under: Statspack, Troubleshooting — Jonathan Lewis @ 8:38 pm UTC Jan 31,2007 [Forward to part 3] [...]

    Pingback by Event Histograms - 2 « Oracle Scratchpad — June 16, 2008 @ 9:11 am BST Jun 16,2008 | Reply

  3. I tried this out on 11g and used the new pivot clause to produce a cross-tab from multiple collection samples.


    create table evh (
    snap_time date,
    event varchar2(30),
    wait_time_milli number,
    wait_time_count number
    );

    insert into evh
    select sysdate snap_time, iv.event event, iv.wait_time_milli, iv.wait_time_count
    from (
    select
    event, wait_time_milli, sum(wait_count) wait_time_count
    from (
    select
    event, wait_time_milli, -1 * wait_count wait_count
    from v$event_histogram
    where event = '&m_event'
    union all
    select
    null, jpl_numeric_wait( &m_pause ,-1), null
    from dual
    union all
    select
    event, wait_time_milli, wait_count
    from v$event_histogram
    where event = '&m_event'
    )
    where
    wait_time_milli != -1
    group by
    event, wait_time_milli
    order by
    event, wait_time_milli
    ) iv;

    with data as (
    select to_char(snap_time,'dd-mon-yy hh24:mi:ss'), event, wait_time_milli, wait_time_count
    from evh
    )
    select *
    from data
    pivot (
    max(wait_time_count)
    for wait_time_milli
    in (1,2,4,8,16,32,64,128,256,512,1024,2048 )
    )
    order by 2,1

    Comment by Jeroen — June 16, 2008 @ 10:56 am BST Jun 16,2008 | Reply

  4. Jeroen,
    Nice enhancement; it’s prompted me to write a version of the original code for 11g.


    column event format a32
    define m_event='db file%'
    define m_pause=20

    with subq as (
            select
                    event, wait_time_milli, sum(wait_count) wait_count
            from (
                    select
                            event, wait_time_milli, -1 * wait_count  wait_count
                    from    v$event_histogram
                    where   event like '&m_event'
                    union all
                    select
                            null, jpl_numeric_wait( &m_pause ,-1), null
                    from    dual
                    union all
                    select
                            event, wait_time_milli, wait_count
                    from    v$event_histogram
                    where   event like  '&m_event'
                    )
            where
                    wait_time_milli != -1
            group by
                    event, wait_time_milli
            order by
                    event, wait_time_milli
            )
    select
            *
    from
            subq
            pivot (
                   max(wait_count)
                   for wait_time_milli in (1,2,4,8,16,32)
            )
    order by
            event
    ;

    EVENT                        1      2      4      8     16     32
    ---------------------- ------- ------ ------ ------ ------ ------
    db file parallel read        0      0      0      0      0      0
    db file parallel write      18      0      0      0      0      0
    db file scattered read       1      1      0      0      0      0
    db file sequential read   1379      5      4     11     33     34
    db file single write         0

    I’ve just used the pivot() function here on my origrinal query, but changed the equality to a like. This has lined up a few similar events in a nice tabular form.

    If you want to get clever, you could even change the like to some variant of a regexp() function and pick out (say) all file reads and writes.

    Update: Note that the pivot() clause is limited to keep the output readable on WordPress.

    Comment by Jonathan Lewis — June 17, 2008 @ 8:08 pm BST Jun 17,2008 | Reply

  5. [...] want to run this for too long at a time ( you could implement a short wait in there with a nice trick by Jonathan Lewis [...]

    Pingback by Advanced Oracle Troubleshooting Guide, Part 7: Sampling latch holder statistics using LatchProf « Tanel Poder’s blog: Core IT for geeks and pros — July 8, 2008 @ 11:56 am BST Jul 8,2008 | Reply

  6. [...] (approximately) v$session_wait_history in older versions of Oracle. Rapid snapshots of the event histogram view. Comments [...]

    Pingback by Slow Code « Oracle Scratchpad — December 30, 2009 @ 3:11 pm BST Dec 30,2009 | Reply

  7. […] redundant – in line (or “with”) pl/sql functions. I’ve described the use of “slowdown” functions in the past, but with 12c I don’t need to create a separate row_wait function – I can […]

    Pingback by 12c Funny | Oracle Scratchpad — June 30, 2013 @ 7:11 pm BST Jun 30,2013 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,910 other followers