Oracle Scratchpad

January 18, 2009

Analysing Statspack (11)

Filed under: Statspack — Jonathan Lewis @ 11:38 am BST Jan 18,2009

[Further Reading on Statspack]

Here’s an interesting example of a Statspack report that appeared recently on the OTN Database General Forum. It’s from a user who wants to know why an update to seg$ appears in second place in the “SQL ordered by …” sections of the report. I’ve set the link to open in a new window so that you can read the report and my comments at the same time.

We start by noting that the report is from 8.1.7.0, with a duration of 42.70 minutes (2,562 seconds).  A quick glance at the “Load Profile” shows an interesting 19,396 blocks read per second (so we might want to check exactly what type of block reads those are and where they come from). The “Top 5” section reports 4,790,767 db file sequential reads in 22.74 seconds – which is 5 microseconds per read … which doesn’t seem to be realistic, even for sustained reads from the file system cache.

A quick check of the “init.ora Parameters” at the end of the file shows that timed_statistics is set to false – so any timing information we get is suspect.
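
If the instance is still up, it’s worth switching timing on before taking any more snapshots. A minimal sketch – the parameter is dynamic, so no restart is needed, and an alter session will do the same job for a single session:

--  check the current setting, then enable timing instance-wide
select name, value from v$parameter where name = 'timed_statistics';

alter system set timed_statistics = true;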

Reading the rest of the “Event” listing, we also notice that we did 7,832,262 db file scattered reads in zero seconds (technically less than 1/100 second).

Interestingly, we see (in the “idle” events towards the end of the “Events” list) that we’ve spent 2,524.33 seconds in 605,000 waits for SQL*Net message from client, at an average of a little over 4 milliseconds per round trip.

From observations so far we can say with some confidence that there was probably one session with timed statistics enabled, and that all the time shown in the report came from that one session (2,524.33 + 22.74 comes very close to the snapshot time of 2,562 seconds). This one session did roughly 605,000 round trips – which accounts for a very large fraction of the 258.4 “user calls” per second in the Load Profile, and since we also had 244 Executes per second in the load profile, perhaps this one session was busy doing single row inserts or single row updates/deletes [see notes 2/3] for the duration.
[Ed: the OP has now added the entire statspack report to the thread, with the comment that the session that took the snapshots and created the report was the one which had enabled timed_statistics at the session level. With 20/20 hindsight, this is another "obvious" interpretation of the timing information - you take a snapshot, wait 42 minutes, then take another snapshot. The 42 minutes of wait time comes (mostly) from one very long wait, not from 600,000 short ones. The 600,000 waits (250 per second) are really coming from a number of active sessions - and we can see lots of SQL statements in the full report ("SQL ordered by Execution") which are very short, very simple, and suggestive of highly repetitious code.]
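
If you can still get at the live instance you don’t have to guess which sessions are producing the round trips – a quick sketch against the session-level wait figures:

--  time_waited is in centiseconds, and is only meaningful for sessions
--  that are running with timed_statistics enabled
select  sid, total_waits, time_waited
from    v$session_event
where   event = 'SQL*Net message from client'
order by
        total_waits desc
;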

Having dismissed the times as unreliable, let’s follow up the original question about seg$ (the table of segments). Search down to the extract from “SQL ordered by Gets”.

We’ve been given two statements, one executed 43 times at 927,000 buffer gets per execution, and the seg$ update, executed 1,353 times at 13,800 buffer gets per execution. The thing about seg$ though, is that it’s in a cluster with uet$ (the used extents), and there’s a cluster index on (tablespace number, file number, block number) that you would generally expect Oracle to use to get to the single segment that matched – and getting to a single segment by index shouldn’t really take 13,809 block visits unless something a little unusual has happened. Maybe Oracle isn’t using that index !
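
You can check what the cluster looks like from the dictionary – a sketch, assuming the usual names from sql.bsq (the cluster is c_file#_block#):

select  cluster_name, key_size, avg_blocks_per_key
from    dba_clusters
where   owner = 'SYS'
and     cluster_name = 'C_FILE#_BLOCK#'
;

--  which tables are in the cluster, and how do their columns map
--  onto the cluster key ?
select  table_name, tab_column_name, clu_column_name
from    dba_clu_columns
where   owner = 'SYS'
and     cluster_name = 'C_FILE#_BLOCK#'
order by
        table_name
;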

Note that the critical statement took 18,684,805 gets in total – how many of those turned into disk I/Os ?  We don’t have the complete statspack report here, but we do have the section for “Buffer Pool Statistics”, and that says we totalled 47,803,200 buffer gets and 47,757,603 physical reads – just 45,597 blocks that weren’t already cached.
[Ed: another follow-on from the full report - the buffer gets reported by the "Buffer Pool Statistics" are nowhere near the session logical I/Os reported by the "Instance Statistics". So we need to look at the "SQL ordered by Reads" to get a better picture of what's going on - and that part of the report shows that the seg$ query isn't doing the 13,800 physical reads per execution that I thought it was.]

So our top two queries were both doing physical reads nearly 100% of the time – and we saw in the “Top 5” that we recorded only 4.7M db file sequential read waits - so most of the blocks read by these two statements must have come from tablescans.

The first query is in end-user space, and we just have to do the normal tuning things to see if we can make it behave better. (Check the SQL, check the indexing, check the statistics). The seg$ query is in sys-recursive space, and that makes things a little nastier.

We may be able to explain why the seg$ update is in trouble – there seem to be about 13,800 blocks in the cluster and (with the cluster definition and the user’s block size of 4KB) we could get about 17 sets of segment information per block, for a total of roughly 240,000 segments. But when we look at the “Rollback segment storage” section of the report we see that the rollback segments vary from 4MB to 838MB in size with some interesting variation in size and optimal. It’s possible that in the past the system has generated some rollback segments with a huge number of extents, exploding the storage cluster to a huge size, and giving Oracle the impression that the average number of cluster blocks per cluster key is very high (hence making a tablescan more efficient than an access by cluster key).
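
On a live system the rollback segment figures are easy to pull – a sketch (the writes, wraps, shrinks and extends are cumulative since instance startup):

select  n.name, s.extents, s.rssize, s.optsize, s.hwmsize,
        s.writes, s.wraps, s.shrinks, s.extends
from    v$rollname n, v$rollstat s
where   s.usn = n.usn
order by
        s.hwmsize desc
;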

Charles Hooper (in one of the follow-up posts) has pointed out some anomalies in the sort statistics and parameters, suggesting that the number of calls to the update  (1,353) might be related to the way in which the user’s temporary tablespace is configured – and that seems a reasonable possibility to look into.

The problem may be even simpler than that, though. I’ve mentioned the large number of executes in the “Load Profile” – it might simply be that the big process that does lots of calls causes data and index objects to grow very regularly – those updates to seg$ could simply be due to extent allocation taking place (Notice the “Enqueue activity” shows HW (high water mark) and ST (space transaction) enqueues and waits, and that a lot of our waits are on segment header blocks and bitmap blocks – and those are the LMT bitmaps, not bitmap indexes). Maybe we have (and have had historically) some very large, rapidly growing objects in a locally managed tablespace with a very small extent size – that would explain the large size of the cluster, and the large “average data blocks per key” that is making the optimizer choose a tablescan on that cluster.
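
A quick dictionary query shows which tablespaces are locally managed and what their extent sizing looks like – a sketch, nothing more (for a uniform locally managed tablespace the initial/next figures show the uniform extent size):

select  tablespace_name, extent_management, allocation_type,
        initial_extent, next_extent, min_extlen
from    dba_tablespaces
;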

If this rapid assessment is correct, we can take three steps to address the problem:

a) make the first big query more efficient – this reduction in resource usage might make the seg$ query run faster – even though it would still be horribly inefficient.

b) make the seg$ query use an indexed access path – perhaps by hacking the statistics on the index cluster to make the index very desirable. (Note –  setting the optimizer_mode to first_rows, or fiddling with other optimizer parameters may have the same effect – but that’s a large-scale change for a very local issue). It would be worth checking why the optimizer is going cost-based, and what statistics it’s using to generate this plan, though – 8i shouldn’t collect stats on the dictionary tables, so the plan could be coming from the default “guestimate” statistics, and a proper collection (check with support) of statistics on this cluster may resolve the issue (see the sketch after point (c) below).  Watch out, though, as you could self-deadlock while collecting stats on this object if you have to allocate a temporary extent while collecting statistics on the extents table !

c) check to see if there are any objects with a large number of  very small extents where the number of extents grows fairly rapidly, and do something to limit the allocation rate.  (e.g. rebuild the object in a tablespace with a larger uniform extent size).
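
For step (b), the first thing to check is what the optimizer currently knows about the cluster index – a sketch only, assuming the usual dictionary names, and remembering that nothing should be changed on SYS-owned objects without talking to Oracle support first:

select  index_name, blevel, leaf_blocks, distinct_keys,
        avg_leaf_blocks_per_key, avg_data_blocks_per_key,
        clustering_factor, last_analyzed
from    dba_indexes
where   owner = 'SYS'
and     index_name = 'I_FILE#_BLOCK#'
;

--  if (and only if) support agrees that collecting statistics on the
--  cluster is appropriate, the statement would be something like:
--      analyze cluster sys.c_file#_block# compute statistics;
--  remembering the self-deadlock warning in point (b) above.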

Unfortunately, if seg$ has got into a mess with a huge amount of nearly empty space below the cluster high water mark, the one thing you can’t do is clean up the mess that is in seg$ – unless you create a whole new database.

Updated 22nd Jan – a few days after the complete statspack report had been loaded onto OTN

My original comments were strongly biased by the fact that the “Buffer Pool Statistics” showed nearly 100% of buffer gets turned into physical reads. There are two obvious details that should have warned me that the buffer pool statistics were wrong:

    The “Load Profile” showed 70,000 logical reads per second against 19,000 physical reads per second
    The two SQL statements supplied in the extract totalled 57M buffer gets when the “Buffer Pool Statistics” reported 47M (unfortunately in my first read-through I added 39M to 18M and managed to get 47M – which matched so well I forgot to check my arithmetic).

Before addressing the original question, though, it’s worth noting that the full statspack report shows that we have a number of queries that don’t run very often (between 5 and 50 times in the interval) but do a lot of physical reads, and some of them look as if they could easily be made more efficient. Perhaps there are some missing indexes, or maybe there are some type conversions, or character set conversions that are making the existing indexes irrelevant. If there are performance problems, then maybe this is where we should be looking.
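
On the live system much the same picture can be pulled straight from the shared pool – a sketch, with an arbitrary cut-off on disk reads (anything that has already been aged out of the shared pool won’t appear, of course):

select  executions, disk_reads, buffer_gets,
        substr(sql_text,1,60)   sql_text
from    v$sql
where   disk_reads > 100000
order by
        disk_reads desc
;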

If we just want to look at the original question of why the seg$ update appears where it does, my earlier answer is basically unchanged; however I will point out that the optimizer is not doing the tablescan on the seg$ table that I originally proposed.

Note also that the third SQL statement in the “SQL ordered by Gets” is this:


select
	file#,block#,length
from
	uet$
where
	ts#=:1
and	segfile#=:2
and	segblock#=:3
and	ext#=:4

This statement accounts for the same number of executions (1,353) and buffer gets (18.7M) as the update to seg$ – so it’s doing the same work.

If you then look at the “Instance Activity” statistics you will see 37.7M “cluster key scan block gets”. This figure is consistent with the action of the two SQL statements. The queries against seg$ and uet$ are accessing the cluster c_file#_block#, based on the cluster key of (tablespace number, file number, block number).
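
The equivalent figures are visible in v$sysstat on a live system – a sketch (these are totals since instance startup, not the statspack deltas):

select  name, value
from    v$sysstat
where   name in (
                'cluster key scans',
                'cluster key scan block gets'
        )
;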

According to the notes in sql.bsq, and given the blocksize of this database, this could mean there are some objects in this database that run to many tens of thousands of (dictionary-managed) extents !
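
A couple of quick checks (a sketch, assuming the usual dictionary names) would show how big the cluster has become and how many used extents the dictionary is carrying:

--  how big has the c_file#_block# cluster grown ?
select  segment_name, segment_type, blocks, extents
from    dba_segments
where   owner = 'SYS'
and     segment_name = 'C_FILE#_BLOCK#'
;

--  how many used (dictionary-managed) extents are currently recorded ?
select  count(*)  used_extents
from    sys.uet$
;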

Something very nasty has happened here – and it may be very nasty to clean the mess up because when you try to move or drop the object, smon could spend (literally) hours cleaning up the freed extents – making the database virtually unusable as it does so.

These two statements (and a couple of others against fet$ – which is in a different cluster) are executed when you add an extent to a segment – so it would be a good thing to find that segment and do something about its storage definitions. (I don’t think it’s a temporary segment, as the “Enqueue Statistics” don’t show any ‘TT’ enqueues – and I think that this version of Oracle would be acquiring a TT lock whenever it extended the temporary segment.)
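
A sketch of the sort of query that would identify the culprit(s) – the threshold is an arbitrary guess, and on a database with a bloated dictionary the query itself may take a while to run:

select  owner, segment_name, segment_type, tablespace_name,
        extents, next_extent, pct_increase
from    dba_segments
where   extents > 5000
order by
        extents desc
;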

There are some clues in the report that might point towards the nasty object(s):

  • There is some LOB activity in the database – note the statistics about direct path lob reads and writes. Maybe there are a lot of LOB inserts, and updates on LOBs which are small enough to stay in row, but long enough to create havoc (there are lots of table fetch continued row, which could be chained rows due to LOBs that get close to the 3960 byte limit for in-line storage) – see the query sketch after this list.
  • There have been 2,845 leaf node splits in 42 minutes – this isn’t inherently wrong, but it is worth looking at: are there some indexes with a lot of columns in their definitions that grow even more rapidly than the table they index ? (Be careful about building these, the impact of dropping the old index could be dire).
  • HW and ST enqueues. The highwater mark jumps every five blocks: given the ratio of ST to HW enqueues (roughly 1 to 2), we might have a very large, and growing, object with an extent size of about 32KB.
  • Rollback segment 24 suffered 2MB of writes, but 171 wraps – this means the average extent size (of the extents used in the 42 minutes) is 12KB. (There were no shrinks and extends in the period, so this rollback segment is not contributing to the seg$/uet$ activity, but the anomaly shows at least one sizing error in the database).
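
A sketch of a query to pick up these figures from the live instance – the exact names of the LOB-related statistics vary a little with version, hence the like clause:

select  name, value
from    v$sysstat
where   name in (
                'leaf node splits',
                'table fetch continued row'
        )
or      name like 'direct path%(lob)'
;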

Bottom line – I think that something very nasty has appeared in this database. Find it. If you’re lucky, you’re looking at a relic of the past that no longer exists. If you’re unlucky you may have an object that is in such a nasty state that any attempt to get rid of it may lock your database up for a few hours. If my memory of temporary segments and the TT lock is wrong, it’s even possible that stopping and restarting the database could cause a clean-up of a massive temporary segment that ties up smon (and the database) for hours. The worst case scenario is that you may have to plan for a complete rebuild of the database to avoid a future denial of service.

[Further Reading on Statspack]

Footnote: Any advice about reading statspack reports is almost always relevant when reading AWR reports (and vice versa).

4 Comments »

  1. Two questions:

    1: You said: HW and ST enqueues. The highwater mark jumps every five blocks: given the ratio of ST to HW enqueues (roughly 1 to 2), we might have a very large, and growing, object with an extent size of about 32KB.
    Q1: ST/HW = 1/2, so the size of one extent equals the size allocated by two HWM jumps, and one HWM jump is of size 5*4K = 20K, so two HWM jumps come to 40K. Looks like the extent is of size 40K.

    2:This one session did roughly 605,000 roundtrips – which accounts for a very large fraction of the 258.4 “user calls” in the Load Profile, and since we also had 244 Executes in the load profile, perhaps this one session was busy doing single row inserts or single row updates for the duration.
    Q2A: Do you mean one roundtrip is one “user call”? In my memory, a user call is one of parse, fetch, exec or commit.
    Q2B: How are you sure it was doing ***single*** row inserts or ***single*** row updates? How are you sure it was not doing a delete operation?

    Thanks,
    Daniel

    Comment by Daniel — July 28, 2009 @ 3:51 am BST Jul 28,2009 | Reply

    • Daniel,

      1: I can’t remember now why I chose to say 32KB rather than 40KB – but I’d guess it’s because it looks as if the extent size has been set deliberately low and 32KB is a more likely value for a computer-literate person to choose than 40KB (powers of two are prettier). Your comment does raise the question though: if the block size is 4KB and the extent size = 32KB (which is 8 blocks), does Oracle alternate bumping the high water mark by 5 then 3, or does it always bump by 5 even if it means pausing to add an extent for the final 2 blocks, or is there a special boundary case when the number of blocks left in the extent after a bump is less than 5 ? (Easy to test a few cases, of course, but probably not worth the effort any more).

      2A: Yes, one round-trip is one user call, and a user call would be a parse, execute, or fetch. (Possibly a commit would be counted as an execute – it might be worth checking that – bearing in mind there are two code paths for the commit.) Just to avoid any risk of confusion, it’s worth mentioning that the converse is not true: a parse, execute or fetch need not be a user call, as it may have occurred within a pl/sql block which has been instigated by a user’s “execute” call.
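
      A rough way to check (a sketch, run from a quiet session with access to the v$ views) is to watch how these statistics move as you issue a parse, an execute, a fetch and a commit:

      select  sn.name, ms.value
      from    v$statname sn, v$mystat ms
      where   ms.statistic# = sn.statistic#
      and     sn.name in (
                      'user calls',
                      'user commits',
                      'parse count (total)',
                      'execute count',
                      'SQL*Net roundtrips to/from client'
              )
      ;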

      2B: Bear in mind my comment starts with “perhaps” – I was only making a tentative suggestion. In fact it was an oversight on my part (now corrected) that made me omit deletes. (I note that you’ve highlighted “single” in both cases – the reason why I suggested single row, in case you were wondering, is that the redo size per execution was very small.)

      Comment by Jonathan Lewis — August 6, 2009 @ 3:34 pm BST Aug 6,2009 | Reply

  2. Jonathan,

    Your thoughts about index leaf node splits? And how to capture
    indexes with a lot of columns in their definitions that grow even more rapidly than the table they index?

    Thanks,

    Lu

    Comment by Lou — March 23, 2011 @ 3:30 am BST Mar 23,2011 | Reply

    • Lou,

      If you have reasonably accurate statistics on the indexes I have some code that reports indexes that are larger than expected. Since it’s fairly cheap to run I’d be inclined to use it to discover problem indexes on a system I didn’t know, then think about why those indexes (a) exist and (b) are badly behaved.

      I’ve been to many sites which have redundant indexes on tables because there has been no strategic analysis of how the table has to be queried – and sometimes this has resulted in many similar indexes with large numbers of columns.

      Comment by Jonathan Lewis — March 26, 2011 @ 8:13 pm BST Mar 26,2011 | Reply

