It looks like Laimutis Nedzinskas has spent some time producing a list of new views, and changes to existing views, for trouble-shooting specialists. The lists cover the dynamic performance views and the AWR history views, and the differences between 10.2 and 11.2. The lists include, where relevant, convenient hyperlinks to the view definitions in the online copy of the 11g Server Reference manual.
I haven’t checked the lists in detail, but I can see myself taking advantage of his efforts from time to time:
You might think from the title that this little note is going to be about the index hash join – you would be half right, it’s also about how the optimizer seems to make a complete hash of dealing with index hash joins.
Let’s set up a simple data set and a couple of indexes so that we can take a closer look:
Here’s a set of Instance Activity stats I’ve never seen before, and I’d rather never see again. From an active standby running 220.127.116.11 on AIX:
This one’s so odd I nearly posted it as a “Quiz Night” – but decided that it would be friendlier simply to demonstrate it. Here’s a simple script to create a couple of identical tables. It’s using my standard environment but, apart from fiddling with optimizer settings, I doubt if there’s any reason why you need to worry too much about getting the environment exactly right.
I’ve pointed out in the past how bad the Instance Efficiency ratios are in highlighting a performance problem. Here’s a recent example from OTN repeating the point. The question, paraphrased, was:
After going through AWR reports (Instance Efficiency Percentages) I observed they have low Execute to Parse % but high Soft Parse %.
Please share if you had faced such issue and any suggestions to solve this
Time for another of those little surprises that catch you out after the upgrade.
Take a look at this “Top N” from a standard AWR report, from an instance running 18.104.22.168
Top 5 Timed Foreground Events
wait % DB
Event Waits Time(s) (ms) time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
direct path read 3,464,056 6,593 2 33.5 User I/O
DB CPU 3,503 17.8
db flash cache single block ph 2,293,604 3,008 1 15.3 User I/O
db file sequential read 200,779 2,294 11 11.6 User I/O
enq: TC - contention 82 1,571 19158 8.0 Other
Have you ever heard the suggestion that if you see time lost on event write complete waits you need to get some faster discs.
So what’s the next move when you’ve got 96GB of flash cache plugged into your server (check the parameters below) and see time lost on event write complete waits: flash cache ?
Here’s one I keep forgetting – and spending 15 minutes trying to think of the answer before getting to the “deja vu” point again. I’ve finally decided that I’ve got to write the answer down because that will save me about 14 minutes the next time I forget.
Q. In a Statspack or AWR report there is a section titles “Segments by Row Lock Waits”. Why could an index be subject to a Row Lock Wait ?
A. Try inserting into a table from two different sessions (without committing) two rows with the same primary key. The second insert will wait on event enq: TX – row lock contention, and show up in v$lock with a lock request for a TX lock in mode 4. When you issue a commit or rollback on the first session, and the second statement errors or completes (depending on whether you commit or rollback the first session) it will increase the value for row lock waits in v$segstat (and v$segment_statistics) for the index by 1.
There are variations on the theme, of course, but the key feature is uniqueness with one session waiting for another session to commit or rollback on a conflicting value. This includes cases of foreign key constraint checking such as inserting a child for a parent that has been deleted but not committed (and there’s an interesting anomaly with that scenario which – in 10g, at least – reports more row lock waits on the parent PK than you might expect.)
Here’s a deadlock graph the appeared on Oracle-L and OTN a couple of days ago.
Resource Name process session holds waits process session holds waits
TX-001a0002-0002a0fe 196 197 X 166 1835 S
TM-0000c800-00000000 166 1835 SX 196 197 SX SSX
It’s a little unusual because instead of the common TX mode 6 (eXclusive) crossover we have one TX and one TM lock, the TX wait is for mode 4 (S) and the TM wait is for a conversion from 3 (SX) to 5 (SSX).
How do you trouble-shoot a problem ? It’s not an easy question to answer when posed in this generic fashion; but perhaps it’s possible to help people trouble-shoot by doing some examples in front of them. (This is why I’ve got so many statspack/AWR examples – just reading a collection of different problems helps you to get into the right mental habit.)
So here’s a problem someone sent me yesterday. Since it only took a few seconds to read, and included a complete build for a test case, with results, and since it clearly displayed an Oracle bug, I took a look at it. (I’ve trimmed the test a little bit, there were a few more queries leading up to the error):
Many questions I see on the internet could be resolved by the simple expedient of looking at the information that’s in front of you; and many of the questions that clutter up forums and newsgroup would simply cease to appear if more people adopted this approach. Here’s an example of what I mean:
According to Metalink note 34559.1 the p1 value is the absolute file number. According to dba_data_files all my file numbers are one, two or three digits, but when I “select distinct p1 from v$session_wait” I am seeing a file number (p1) of 1413697536. Why is this appearing ?
No doubt someone reading this post will look at that number and immediately recognise it as coming from an SQL*Net message wait on a tcp connection (1413697536 = 0×54435000 -> 0×54 0×43 0×50 -> T C P) and may wonder why the Metalink note didn’t explain that p1 was the absolute file number only when the wait was related to some sort of file I/O. This was the thought that crossed my mind, so I logged on to MOS to see what the note said (because sometimes I fill in the feedback form to correct the errors or ask for a note to be withdrawn) and this is what I found:
Title of note: “db file sequential read” Reference Note
First line of note: This is a reference note for the wait event “db file sequential read” which …
I find that I can’t conjure up any circumstances where a person could know enough to query v$session_wait and dba_data_files, and yet not notice the significance of the title and first line of the metalink note.
Footnote: According to a story I heard recently, there is at least one site where the management has blocked access to any internet sites that cause too much traffic on the company network. Unfortunately the IT department seemed to spend a lot of time reading the Oracle online documentation. So whenever I see the expression RTFM I now interpret it as “read the firewalled manuals”.
I’ve said in the past that one of the best new features, in my view, in 11g was the appearance of proper virtual columns; and I’ve also been very keen on the new “approximate NDV” that makes it viable to collect stats with the “auto_sample_size”.
Who’d have guessed that if you put them both together, then ran a parallel stats collection it would break :(
The bug number Karen quotes (10013177.8) doesn’t (appear to) mention extended stats – but since virtual columns, function-based indexes, and extended stats share a number of implementation details I’d guess that they might be affected as well.
It’s about time I wrote a sequel to Mything in Action – and funnily enough it’s also about bitmap indexes. It starts with a note on the OTN database forum that prompted me to run up a quick test to examine something that turned out to be a limitation in the optimizer. The problem was that the optimizer didn’t do a “bitmap and” between two indexes when it was obviously a reasonable – possibly even good – idea. Here’s some sample code:
Here’s a quick tutorial in hinting, promped by a question on the OTN database forum.
The OP has a hash semi-join and Oracle appears to be ignoring a hint to use a nested loop:
So you’ve decided you want to audit a particular table in your database, and think that Oracle’s built in audit command will do what you want. You discover two options that seem to be relevant:
audit all on t1 by access;
audit all on t1 by session;
To check the audit state of anything in your schema you can then run a simple query – with a few SQL*Plus formatting commands – to see something like the following: