The worst type of Oracle bug is the one that seems to appear randomly and can’t be reproduced on demand (especially when Oracle support says “please send us a reproducible test case”).
Here’s one such (probable) bug that showed up at a client site that was reporting performance problems with a query that, on random days, chose a bad execution plan. The client was running one of the earlier versions of 9.2, and using the following call to dbms_stats to collect fresh table stats for each table in turn every night.
If you ever thought you could understand what’s printed in a trace file, you might like to read this comment from a recent SR:
The trace file is reporting the steps that are needed to troubleshoot an issue. The comments tags and entries in the files may not be consistent in every scenario and their meaning can change between tasks and versions. As they are not designed or intended for direct interpretation by customers, no rationale or consistency is guaranteed.
Further, no official documentation is provided for the raw trace files. The only behaviour that is considered ‘expected’ behaviour is that which is contained in the documentation and manuals.
Of course, this does make life a little confusing – is a note that’s been published on Metalink (such as 39817.1 and 338137.1) not official after all?
Experiences like this one [Ed: Nov 2008 – the blog has become private since I wrote this note] are always worth reading about to remind yourself what you can do with the dbms_stats package when it’s really necessary.
And while I’m pointing to other URLs, here’s another one worth knowing about – event “Cursor: pin S wait on X”. It’s not surprising to see this wait event occasionally in a busy 10g system, but if you’re losing a significant amount of time, it could be a bug.
A couple of weeks ago, having just arrived home from Spain, I got a phone call in Heathrow airport from an old client that I hadn’t seen for about 18 months. They had a problem – as far as they could tell a business critical query had suddenly stopped using an index and was doing a tablescan instead. At peak load times the query was taking between 15 and 50 seconds to complete when it used to take less than 2 seconds. Did I have any time to come and help?
A recent thread on the Oracle newsgroup comp.databases.oracle.server started with the following tkprof summary from a trace file: (more…)
Or as Doug Burns put it on his blog: “how can I tell the actual DOP used for my parallel query”
As Doug points out, you first have to ask “When are you asking the question”. Immediately after you have run the query you could check v$pq_tqstat to see what happened, but apart from that you have very little hope of finding out in retrospect what happened in a specific query.
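As a rough sketch of the "immediately after" check, a query along the following lines can be run against v$pq_tqstat. It assumes you are still in the same session that ran the parallel query, since the view only reports on the most recent parallel statement of the current session; the choice of columns and ordering here is just one convenient layout, not a prescribed method.

```sql
-- Run in the SAME session, straight after the parallel query completes.
-- Each row is one process (slave or query coordinator) on one table queue.
select
        dfo_number,     -- data flow operation within the statement
        tq_id,          -- table queue number within the DFO
        server_type,    -- Producer, Consumer or Ranger
        process,        -- e.g. P000, P001, ... or QC
        num_rows,       -- rows passed through this table queue by the process
        bytes           -- bytes passed through this table queue by the process
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, process
;
```

Counting the distinct slave processes reported as producers (or consumers) for a given table queue gives an indication of the degree of parallelism that was actually used at that step.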
A few days ago, I described a change to Statspack in 10g that allowed you to see how much time each snapshot took. In that article I suggested copying the code back to 9i; however this is a fiddly little task, and you may prefer to avoid it.
Here’s an alternative – actually a strategy with a much wider use – which does the same, but gives you even better information. Use a “before logoff” trigger to record the information you need. There are lots of ways you could create a suitable trigger; here’s an example – with just a couple of naughty features that you might want to change:
Following yesterday’s note on SQL Profiles, someone asked how I detect that an opt_estimate hint had been used – with specific reference to the “index_scan” option. The reason for this particular choice is that other opt_estimate hints have a highly visible impact in the 10053 trace files, but this one doesn’t.
A little while ago I wrote a note about an anomaly that someone was seeing with the index_ss() hint. Since then, they have forwarded me a 10053 trace file for the query in question.
I’ve posted a brief analysis of it on my website, just to give you an idea of the sorts of things you can look out for when you’re trying to work out why Oracle seems to be ignoring a hint.
[Further reading on “ignoring hints”]
If you ever enable event 10132, don’t be surprised to find some of the resulting execution plans looking a little messy, for example (from a 9i trace file): (more…)
Just a little follow-up on my earlier note on buffer sorts. The following is an extract from a tkprof output showing the rowsource operation for a query. Note especially the rows information. (more…)
If you look at a 10053 trace file, you will find references to Resc and Resp (sometimes rsc and rsp) lurking in the background whenever the trace file mentions Cost. When you get to the 10gR2 trace file, you see these described in the “Legend” section as: (more…)
I have made a few comments in previous articles about the use of bind variables and some of the peripheral details that can introduce surprises; and in the article on superfluous updates I made a throwaway comment about getting multiple child cursors for a single statement if you had columns of varchar2() or nvarchar2() defined to be longer than 32 bytes. It’s worth expanding on this point.
In an earlier article I mentioned the buffer sort in a footnote; I thought I would expand a little more on what I think it does and why it appears as a buffer sort in an execution plan rather than the more traditional sort (join).
Consider the trivial script:
I’ve just received an email from someone who had been on my “masterclass”, and he had a question about one of the comments I had made, namely: “bind variable peeking is always done, even if histograms are not generated”.
They quoted a reference to an Oracle white paper which said, on page 15: