Here’s one I keep forgetting – and spending 15 minutes trying to think of the answer before getting to the “deja vu” point again. I’ve finally decided that I’ve got to write the answer down because that will save me about 14 minutes the next time I forget.
Q. In a Statspack or AWR report there is a section titles “Segments by Row Lock Waits”. Why could an index be subject to a Row Lock Wait ?
A. Try inserting into a table from two different sessions (without committing) two rows with the same primary key. The second insert will wait on event enq: TX – row lock contention, and show up in v$lock with a lock request for a TX lock in mode 4. When you issue a commit or rollback on the first session, and the second statement errors or completes (depending on whether you commit or rollback the first session) it will increase the value for row lock waits in v$segstat (and v$segment_statistics) for the index by 1.
There are variations on the theme, of course, but the key feature is uniqueness with one session waiting for another session to commit or rollback on a conflicting value. This includes cases of foreign key constraint checking such as inserting a child for a parent that has been deleted but not committed (and there’s an interesting anomaly with that scenario which – in 10g, at least – reports more row lock waits on the parent PK than you might expect.)
Here’s a deadlock graph the appeared on Oracle-L and OTN a couple of days ago.
Resource Name process session holds waits process session holds waits
TX-001a0002-0002a0fe 196 197 X 166 1835 S
TM-0000c800-00000000 166 1835 SX 196 197 SX SSX
It’s a little unusual because instead of the common TX mode 6 (eXclusive) crossover we have one TX and one TM lock, the TX wait is for mode 4 (S) and the TM wait is for a conversion from 3 (SX) to 5 (SSX).
How do you trouble-shoot a problem ? It’s not an easy question to answer when posed in this generic fashion; but perhaps it’s possible to help people trouble-shoot by doing some examples in front of them. (This is why I’ve got so many statspack/AWR examples – just reading a collection of different problems helps you to get into the right mental habit.)
So here’s a problem someone sent me yesterday. Since it only took a few seconds to read, and included a complete build for a test case, with results, and since it clearly displayed an Oracle bug, I took a look at it. (I’ve trimmed the test a little bit, there were a few more queries leading up to the error):
Many questions I see on the internet could be resolved by the simple expedient of looking at the information that’s in front of you; and many of the questions that clutter up forums and newsgroup would simply cease to appear if more people adopted this approach. Here’s an example of what I mean:
According to Metalink note 34559.1 the p1 value is the absolute file number. According to dba_data_files all my file numbers are one, two or three digits, but when I “select distinct p1 from v$session_wait” I am seeing a file number (p1) of 1413697536. Why is this appearing ?
No doubt someone reading this post will look at that number and immediately recognise it as coming from an SQL*Net message wait on a tcp connection (1413697536 = 0×54435000 -> 0×54 0×43 0×50 -> T C P) and may wonder why the Metalink note didn’t explain that p1 was the absolute file number only when the wait was related to some sort of file I/O. This was the thought that crossed my mind, so I logged on to MOS to see what the note said (because sometimes I fill in the feedback form to correct the errors or ask for a note to be withdrawn) and this is what I found:
Title of note: “db file sequential read” Reference Note
First line of note: This is a reference note for the wait event “db file sequential read” which …
I find that I can’t conjure up any circumstances where a person could know enough to query v$session_wait and dba_data_files, and yet not notice the significance of the title and first line of the metalink note.
Footnote: According to a story I heard recently, there is at least one site where the management has blocked access to any internet sites that cause too much traffic on the company network. Unfortunately the IT department seemed to spend a lot of time reading the Oracle online documentation. So whenever I see the expression RTFM I now interpret it as “read the firewalled manuals”.
I’ve said in the past that one of the best new features, in my view, in 11g was the appearance of proper virtual columns; and I’ve also been very keen on the new “approximate NDV” that makes it viable to collect stats with the “auto_sample_size”.
Who’d have guessed that if you put them both together, then ran a parallel stats collection it would break :(
The bug number Karen quotes (10013177.8) doesn’t (appear to) mention extended stats – but since virtual columns, function-based indexes, and extended stats share a number of implementation details I’d guess that they might be affected as well.
It’s about time I wrote a sequel to Mything in Action – and funnily enough it’s also about bitmap indexes. It starts with a note on the OTN database forum that prompted me to run up a quick test to examine something that turned out to be a limitation in the optimizer. The problem was that the optimizer didn’t do a “bitmap and” between two indexes when it was obviously a reasonable – possibly even good – idea. Here’s some sample code:
Here’s a quick tutorial in hinting, promped by a question on the OTN database forum.
The OP has a hash semi-join and Oracle appears to be ignoring a hint to use a nested loop:
So you’ve decided you want to audit a particular table in your database, and think that Oracle’s built in audit command will do what you want. You discover two options that seem to be relevant:
audit all on t1 by access;
audit all on t1 by session;
To check the audit state of anything in your schema you can then run a simple query – with a few SQL*Plus formatting commands – to see something like the following:
Here’s an example of how the passing of time can allow a problem to creep up on you.
A recent client had a pair of logon/logoff database triggers to capture some information that wasn’t available in the normal audit trail, and they had been using these triggers successfully for many years, but one day they realised that the amount of redo generated per hour had become rather large, and had actually been quite bad and getting worse over the last few months for no apparent reason. (You’ve heard this one before … “honest, guv, nothing has changed”).
Here’s one of those funny little details that can cause confusion:
SQL> select * from user_audit_object;
no rows selected
SQL> audit select on indjoin by session whenever successful;
4 indjoin ij
6 id between 100 and 200
7 and val between 50 and 150
1 row selected.
SQL> select * from user_audit_object where obj_name = 'INDJOIN';
no rows selected
An odd little discovery today. The peek command of oradebug is fairly well documented on the internet, and I’ve used it from time to time to find the contents of memory. Earlier on today, though, I was running 188.8.131.52 and it didn’t do what I was expecting. I was trying to dump a log buffer block from memory so after logging on as sys and finding the location for the block I duly typed:
Here’s a little gem I hadn’t come across before (because I hadn’t read the upgrade manuals). Try running the following pl/sql block in 9i, and then 10g (or later):
v1 := 256*256*256*256;
In 9i the result is 4294967296; but for later versions the result is:
ERROR at line 1:
ORA-01426: numeric overflow
ORA-06512: at line 4
It’s not a bug, it’s expected behaviour. The expression consists of integers only, so Oracle uses INTEGER arithmetic that limits the result to roughly 9 significant figures. If you want the block to work in newer versions of Oracle you have to add a decimal point to (at least) one of the operands to make Oracle use NUMBER arithmetic that takes it up to roughly 38 significant figures.
There’s an interesting question on the OTN database forum at present – why does an update of 300,000 rows take a billion buffer visits. (There are 25 indexes on the table – so you might point a finger at that initially, but only one of the indexes is going to be changed by the update so that should only account for around an extra 10 gets per row in a clean environment.)
For those looking to the next upgrade – here’s an early warning from Oracle:
ANNOUNCEMENT: Deprecating the cursor_sharing = ‘SIMILAR’ setting (Doc ID 1169017.1)
“We recommend that customers discontinue setting cursor_sharing = SIMILAR due to the many problematic situations customers have experienced using it. The ability to set this will be removed in version 12 of the Oracle Database (the settings of EXACT and FORCE will remain available). Instead, we recommend the use of Adaptive Cursor Sharing in 11g.”
The rest of the note contains some interesting information about the behaviour and side effects of this option – which may also help you debug some library cache issues if you’re currently running with this value set in 11g.
Here’s a nice example on the OTN database forum of Dom Brooks looking at the evidence.
- The query is slow – what does the trace say.
- There’s “row source execution” line that says we get 71,288 rows before doing a hash unique drops it to 3,429 rows.
- There’s a statement (upper case, bind variables as :Bn) in the trace file that has been executed 71,288 times
- A very large fraction of the trace file time is in the secondary statement
- There’s a user-defined function call in the original select list, before a ‘select distinct’.
Conclusion: the code should probably do a “distinct” in an inline view before calling the function, reducing the number of calls to the function from 71,288 to 3,429.
Footnote: There may be other efficiency steps to consider – I’m always a little suspicious of a query that uses “distinct”: possibly it’s hiding an error in logic, possibly it should be rewritten with an existence subquery somewhere, but sometimes it really is the best strategy. There’s are some unusual statistics names coming from autotrace in the OP’s system – I wonder if he’s installed one of Tanel Poder’s special library hacks.