One of the most useful views I’ve used this trick on is v$event_histogram – the view which gives you a more detailed breakdown of where the time is going on each of the events reported by v$system_event. As with my example of faking v$session_wait_history, the idea is simple – create a “union all” view with a time-wasting query in the middle.
First we create a simple function that waits for a number of seconds specified by the first input, then returns the second (numeric) input:
create or replace function jpl_numeric_wait (
        i_seconds       in number,
        i_value         in number default 0
)
return number
deterministic
as
begin
        dbms_lock.sleep(i_seconds);     -- pause for i_seconds seconds
        return i_value;                 -- then hand back the caller's value
end;
/
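As a quick check that the function behaves – assuming you have execute privilege on dbms_lock – a call like the following (purely illustrative values) should take about two seconds to come back with the value 99:

select jpl_numeric_wait(2, 99) from dual;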
Then we query v$event_histogram twice – with a delay in the middle. Note that the counts from the first query are negated, so summing across the two snapshots leaves us with just the activity that took place during the pause:
define m_event = 'db file sequential read'
define m_pause = 60

column tstamp new_value m_tstamp
select to_char(sysdate, 'dd_hh24_mi') tstamp from dual;

spool snap_&m_tstamp

select
        event, wait_time_milli, sum(wait_count)
from    (
        select  /* starting snapshot - counts negated */
                event, wait_time_milli, -1 * wait_count wait_count
        from    v$event_histogram
        where   event = '&m_event'
        union all
        select  /* the pause - puts -1 into the wait_time_milli column */
                null, jpl_numeric_wait( &m_pause, -1 ), null
        from    dual
        union all
        select  /* ending snapshot */
                event, wait_time_milli, wait_count
        from    v$event_histogram
        where   event = '&m_event'
        )
where
        wait_time_milli != -1
group by
        event, wait_time_milli
order by
        event, wait_time_milli
;

prompt Output file: snap_&m_tstamp..lst

spool off
You need only create the function once, of course, and you will need the privilege to execute the dbms_lock package to do so. The rest of the code can be used in a script which can be modified to pick your favourite events, set the snapshot time, and so on.
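If your account doesn't already have that privilege a DBA will have to supply it; the grant would look something like the following (where test_user is just a placeholder, of course):

grant execute on sys.dbms_lock to test_user;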
Here’s a sample of output from a slightly stressed client system – using the event and time period of the sample code:
EVENT                          WAIT_TIME_MILLI SUM(WAIT_COUNT)
------------------------------ --------------- ---------------
db file sequential read                      1           43430
db file sequential read                      2            6558
db file sequential read                      4            3738
db file sequential read                      8           12241
db file sequential read                     16            4388
db file sequential read                     32             214
db file sequential read                     64             125
db file sequential read                    128              18
db file sequential read                    256               5
db file sequential read                    512               1
db file sequential read                   1024               0
db file sequential read                   2048               0
db file sequential read                   4096               0
db file sequential read                   8192               0

14 rows selected.

Output file: snap_13_14_09.lst
As you can see, this system has done about 70,000 single block read requests in 60 seconds – and luckily most of them (about 50,000 – the ones that took less than 2 milliseconds) came out of a SAN cache, or even (for those less than 1 millisecond) a file-system cache.
The spike of 12,200 reads in the 4 – 8 millisecond region is a good response time, with a trailer of 4,400 out into the 8 – 16 millisecond range that's a little suspect but allowable for a reasonably loaded system.
But the tail is very worrying – there's one read that took more than a quarter of a second and could have been anything up to half a second, and there are 149 reads in total that took more than 32 milliseconds to complete. It's only a small percentage, but it's an indication that this is an I/O subsystem on the fringes of catastrophe. (If you know Cary Millsap's work, you'll recognise the knee of the response-time curve in these numbers.)
It’s a useful little script: average wait times can often look reasonable when there’s really a catastrophe waiting in the wings.
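To see just how deceptive the average can be, here's a sketch of a query that estimates the average wait time from the histogram by taking the midpoint of each bucket. It's only an approximation – each bucket covers the range from half its wait_time_milli up to wait_time_milli (except the first, which starts at zero) – and run directly against v$event_histogram it reports the figures since instance startup rather than the delta:

select
        sum(
                wait_count *
                case
                        when wait_time_milli = 1 then 0.5
                        else 0.75 * wait_time_milli
                end
        ) / sum(wait_count)             avg_wait_milli
from
        v$event_histogram
where
        event = 'db file sequential read'
;

Applied to the 60-second figures above, this midpoint arithmetic gives roughly 183,000 milliseconds of waiting spread over 70,718 reads – an average of about 2.6 milliseconds, which looks perfectly healthy while the tail of the histogram is telling a much nastier story.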