Here’s a little oddity that I came across at a client site recently.
The client called me in because they were having problems with Oracle error “ORA-01555: snapshot too old” appearing in a particular task after a few thousand seconds (typically 5,000 to 6,000) even though they had set the undo_retention to 14,400 seconds and had a huge undo tablespace running with autoextend enabled on the data files.
Someone posted a very pertinent question about blocks splits and index rebuilds on the OTN Database forum yesterday covering a detail of the cost/benefit equation that I don’t think I’ve mentioned before.
Since I answered the question on the forum I’ve posted a link here to make it available to a wider audience.
(This is a strategy I may adopt more frequently in the future – there’s a lot of useful material of mine all over the internet, and I really ought to make sure I don’t spend time repeating myself when simple pointer would do).
[Updated May 2009: The forum item has since gone missing - so maybe linking to notes I've written in public groups is not so smart after all]. Fortunately I happen to have been saving copies of the thread as it progressed, and the little point that I wanted to highlight was just the following:
“When you create or rebuild an index Oracle does not honour the pctfree setting in the branch blocks. Since the branch blocks are effectively 100% packed, the very first leaf block split in each branch block (except the last one) will almost certainly result in a branch block split. This means a single row insert into an 8KB block could result in 16KB+ of redo for the leaf block plus 16KB+ for the branch block split.”
[Further reading on rebuilding indexes]
I see that Tom Kyte has found a nasty little bug waiting to trap a few unlucky people as they patch to 10.2.0.4, or upgrade to 11g.
Here’s a surprising anomaly that showed up in a question on the OTN forums a little while ago. Consider a simple query that uses a hash join between two tables.
pq_distribute(t2 hash hash)
from t1, t2
where t2.n1 = t1.n1
and t2.small_vc = t1.small_vc
When it runs serially the join completes in memory and the only I/O you see comes from the two tablescans. When the query runs parallel something causes a spill to the temporary tablespace.
[Forward to Part 2]
Towards the end of April, I published a note about manual optimisation, and mentioned in one of the comments (#19) that as part of the discussion of the (slightly suspect) mechanism I had introduced I would eventually get around to talking about sorted hash clusters. So I’ve finally managed to make a start.
I’ve probably got a couple of comments about 10053 trace files lurking somewhere on this blog and on my website – and when I mention the 10053 I usually remember to say that it’s a last resort that I only use when I think there may be a bug that needs to be nailed.
So here’s a bit of a 10053 – which I only looked at because I thought it was going to show me a bug.
The worst type of Oracle bug is the one that seems to appear randomly and can’t be reproduced on demand. (Such as when Oracle support says “please send us a reproducible test case”).
Here’s one such (probable) bug that showed up at a client site that was reporting performance problems with a query that, on random days, chose a bad execution plan. The client was running one of the earlier versions of 9.2, and using the following call to dbms_stats to collect fresh table stats for each table in turn every night.
If you ever thought you could understand what’s printed in a trace file, you might like to read this comment from a recent SR:
The trace file is reporting the steps that are needed to troubleshoot an issue. The comments tags and entries in the files may not be consistent in every scenario and their meaning can change between tasks and versions. As they are not designed or intended for direct interpretation by customers, no rationale or consistency is guaranteed.
Further, no official documentation is provided for the raw trace files. The only behaviour that is considered ‘expected’ behaviour is that which is contained in the documentation and manuals.
Of course, this does make life a little confusing – is a note that’s been published on Metalink (such as 39817.1 and 338137.1) not official after all ?
Experiences like this one [Ed: Nov 2008 - the blog has become private since I wrote this note]are always worth reading about to remind yourself what you can do with the dbms_stats package when it’s really necessary.
And while I’m pointing to other URLs, here’s another one worth knowing about – event “Cursor: pin S wait on X”. It’s not surprising to see this wait event occasionally in a busy 10g system, but if you’re losing a significant amount of time, it could be a bug.
A couple of weeks ago, having just arrived home from Spain, I got a phone call in Heathrow airport from an old client that I hadn’t seen for about 18 months. They had a problem – as far as they could tell a business critical query had suddenly stopped using an index and was doing a tablescan instead. At peak load times the query was taking between 15 and 50 seconds to complete when it used to take less than 2 seconds. Did I have any time to come and help?
A recent thread on the Oracle newsgroup comp.databases.oracle.server started with the following tkprof summary from a trace file: (more…)
Or as Doug Burns put it on his blog: how can I tell the actual DOP used for my parallel query”
As Doug points out, you first have to ask “When are you asking the question”. Immediately after you have run the query you could check v$pq_tqstat to see what happened, but apart from that you have very little hope of finding out in retrospect what happened in a specific query.
A few days ago, I described a change to Statspack in 10g that allowed you to see how much time each snapshot took. In that article I suggested copying the code back to 9i; however this is a fiddly little task, and you may prefer to avoid it.
Here’s an alternative – actually a strategy with a much wider use – which does the same, but gives you even better information. Use a “before logoff” trigger to record the information you need. There are lots of ways you could create a suitable trigger; here’s an example – with just a couple of naughty features that you might want to change:
Following yesterday’s note on SQL Profiles, someone asked how I detect that an opt_estimate hint had been used – with specific reference to the “index_scan” option. The reason for this particular choice is that other opt_estimate hints have a highly visible impact in the 10053 trace files, but this one doesn’t.
A little while ago I wrote a note about an anomaly that someone was seeing with the index_ss() hint. Since then, they forwarded me a 10053 trace file of the question.
I’ve posted brief analysis of it on my website, just to give you an idea of the sorts of things you can look out for when you’re trying to work out why Oracle seems to be ignoring a hint.
[Further reading on "ignoring hints"]