I wrote a note for my website some time ago about the dangers of using scripts that ‘looked cool’ by virtue of their reference to v$ and x$ objects. So I was quite amused to get the following script emailed to me:
select distinct ws.osuser locked, ws.process, hs.osuser holding, hs.process, obj$.name, sw.seconds_in_wait time from v$session ws, v$session hs, v$lock wl, v$lock hl, obj$, v$session_wait sw where ws.sid = sw.sid and ws.row_wait_obj# = obj$.obj# and ws.row_wait_obj# > 0 and sw.seconds_in_wait > 0 and wl.sid = ws.sid and wl.id1 = hl.id1 and wl.id2 = hl.id2 and hl.block = 1 and wl.request != 0 and hl.sid = hs.sid ;
With the script came the request for assistance in tuning it because: “at the client site they’ve managed to use OEM tuning advisor on 10.2.0.2 to get the time down from 40 seconds to eight seconds, but on my system I don’t get the same execution plan and it behaves quite reasonably”.
The functional description of the statement was that it would show: “who was locking what and who was waiting for this lock”, with the comment that the statement had originally been acquired from the Internet.
So here’s the generic advice for tuning any SQL:
- understand what the code is supposed to achieve
- understand the data
- determine the rational order of work
- discover if Oracle can follow that order automatically
- make Oracle follow that order (if necessary)
Of course, there are numerous subtle points behind this stark list of actions – but it does give you the broad outline of a sensible strategy. But there’s a rule zero for any SQL you find on the Internet: don’t use it until you’ve worked out what it’s doing.
If you apply these rules to this particular statement you can immediately start to pick up all sorts of details.
First – have a look at v$session: in 10g most of the information from v$session_wait is built into v$session (the underlying x$ksusecst is part of the same C-structure as x$ksuse) so there’s a completely redundant join in the query.
Second – what’s that distinct doing? There are times when you look at the intent of a query and know that the distinct is only there because the original author hadn’t quite got the code right and needed to get rid of unexpected duplicates. That leads us to the point that in the self-join of v$lock, the join on lock type is missing – in this case it’s not likely to make a difference, but it is a mistake.
Third – v$lock is a very messy object. Its definition in 10.2.0.3 includes a 10 table union all followed by a join to x$ksuse (v$session again) and x$ksqrs: so it might be better if we avoided that self-join – and we could even try to get clever with the direct link to v$session ! In fact, if this is a piece of code that’s supposed to run as efficiently as possible when we see a problem, we might even take the whole thing down to the x$ objects.
Fourth – dynamic performance views are not guaranteed to give us self-consistent data (it’s in-memory, there’s no consistent read mechanism) – so if we look at v$lock and then track to v$session we might get to v$session at a moment when the lock has disappeared and the object number and seconds in wait have changed, so the tests for seconds_in_wait and row_wait_obj# might lose some useful data – or even give some misleading data. In fact, whatever we do with this query, there is some chance that some of the results will be misleading because of the consistency problem, and that’s a point that we should make clear in our documentation for the script.
Finally – we need to consider the data volumes. I rarely have more than 20 sessions on my laptop, but some of my clients have several thousand connected sessions with a few hundred active. Does this tell me anything about a general-purpose execution path; and since I’m querying something as variable as the dynamic performance views, should I stack my query with hints ?
The purpose of the query is to find out how locks are causing problems, so we probably ought to start the query at v$lock – to find out if there are any locks to worry about, then add value through v$session and obj$.
Since v$lock is such a messy view, and we’re going to join it to itself, it would probably be a good idea to take advantage of subquery factoring to materialize the smallest result set before joining to the rest of the views. We might start with something like the following:
with locks as ( select /*+ materialize */ sid, type, id1, id2, block, request, ctime from v$lock where block = 1 -- blockers or request != 0 -- waiters ) select /*+ qb_name(inline) */ wss.sid waiting_session, wss.process waiting_process, wss.osuser waiting_user, bss.sid blocker_session, bss.process blocker_process, bss.osuser blocker_user, blk.ctime blocker_hold_time, wlk.ctime waiting_time, blk.type blocker_type, blk.id1 blocker_id1, blk.id2 blocker_id2, wss.row_wait_obj# waiting_obj from locks blk, locks wlk, v$session bss, v$session wss where blk.block = 1 and wlk.request != 0 and wlk.type = blk.type and wlk.id1 = blk.id1 and wlk.id2 = blk.id2 and bss.sid = blk.sid and wss.sid = wlk.sid )
I’ve included a /*+ materialize */ hint to force Oracle to generate a (GTT) global temporary table to hold the subset of data I need from v$lock. You’ll also notice that I’ve given the query a slightly surprising query block name – which I will explain shortly.
I haven’t bothered to join this view to obj$ (as the original query does) because it’s a waste of resources – remember, this is supposed to be a low-impact query that runs when the system is in trouble! An object isn’t going to change its id, so if this query finds anything interesting, I’ll use the reported object id to get the object name from a follow-up query.
On the the other hand it might not cost much to do the join. But which object id do I want, the one from v$lock (if it’s a lock of type ‘TM’) or the one from v$session (for all other lock types) ? Bear in mind, by the way, that the lack of read-consistency may mean that the object id in v$session could be the wrong one when the lock type shows ‘TM’.
Clearly, I need the following join predicate:
obj$.obj# = decode(blocker_type, 'TM',blocker_id1, waiting_obj )
Unfortunately this has to be turned into an outer join because the object id in v$session will be -1 (which doesn’t exist as an object) if there is no guilty object. And you can’t do outer joins to two different tables, or you get Oracle error:
ORA-01417: a table may be outer joined to at most one other table
And that’s why I had the funny query block name above. I’ve folded that query into an inline view before joining to the obj$ table. Here’s the final query:
select /*+ leading(@sel$03ADD216 blk@inline wlk@inline) use_hash(@sel$03ADD216 wlk@inline) no_swap_join_inputs(@sel$03ADD216 wlk@inline) */ lock_view.*, obj$.name, obj$.subname -- plus a decode() for type ?, plus owner ? from ( with locks as ( select /* materialize */ sid, type, id1, id2, block, request, ctime from v$lock where block = 1 or request != 0 ) select /*+ qb_name(inline) */ wss.sid waiting_session, wss.process waiting_process, wss.osuser waiting_user, bss.sid blocker_session, bss.process blocker_process, bss.osuser blocker_user, blk.ctime blocker_hold_time, wlk.ctime waiting_time, blk.type blocker_type, blk.id1 blocker_id1, blk.id2 blocker_id2, wss.row_wait_obj# waiting_obj from locks blk, locks wlk, v$session bss, v$session wss where blk.block = 1 and wlk.request != 0 and wlk.type = blk.type and wlk.id1 = blk.id1 and wlk.id2 = blk.id2 and bss.sid = blk.sid and wss.sid = wlk.sid ) lock_view, obj$ where obj$.obj#(+) = decode(blocker_type, 'TM',blocker_id1, waiting_obj ) ;
I’ve had to include a couple of hints to get the correct behaviour inside the (non-mergeable) lock_view – without the hints the query crashed with an interesting ORA-00600 error when I tried to generate an execution plan for it.
You’ll notice that I’ve had to use the method I decribed in an earlier post to discover the name of a transformed query block so that I could make sure that the optimizer started by doing a hash join that used the blockers set (which is likely to be the smaller set) as the build table and the waiters set as the probe table.
In theory I should have included a load more hints to make sure that the joins into v$session (or rather x$ksuse and x$ksled) followed the paths I wanted – but I got a bit lazy, so I’ve left that as an exercise to the reader. In my case, the optimizer used a nested loop all the way down – but for a big system it is possible that a nested loop into the blockers followed by a hash join into the waiters might be more appropriate.
The final execution plan – excluding the bit that generates the locks global temporary table:
------------------------------------------------------------------------------- | Id | Operation | Name | Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 00:00:01 | | 1 | NESTED LOOPS OUTER | | 00:00:01 | | 2 | VIEW | | 00:00:01 | ... | 23 | NESTED LOOPS | | 00:00:01 | | 24 | NESTED LOOPS | | 00:00:01 | | 25 | NESTED LOOPS | | 00:00:01 | | 26 | NESTED LOOPS | | 00:00:01 | |* 27 | HASH JOIN | | 00:00:01 | |* 28 | VIEW | | 00:00:01 | | 29 | TABLE ACCESS FULL | SYS_TEMP_0FD9D661A_73691C | 00:00:01 | |* 30 | VIEW | | 00:00:01 | | 31 | TABLE ACCESS FULL | SYS_TEMP_0FD9D661A_73691C | 00:00:01 | |* 32 | FIXED TABLE FIXED INDEX| X$KSUSE (ind:1) | 00:00:01 | |* 33 | FIXED TABLE FIXED INDEX | X$KSLED (ind:2) | 00:00:01 | |* 34 | FIXED TABLE FIXED INDEX | X$KSUSE (ind:1) | 00:00:01 | |* 35 | FIXED TABLE FIXED INDEX | X$KSLED (ind:2) | 00:00:01 | | 36 | TABLE ACCESS BY INDEX ROWID | OBJ$ | 00:00:01 | |* 37 | INDEX UNIQUE SCAN | I_OBJ1 | 00:00:01 | -------------------------------------------------------------------------------
It’s probably about as efficient as it gets.
The only nasty thing about it is that it generates a global temporary table – so if the system is jammed in some way on the temporary tablespace, or temp space allocation lock, this query could just end up joining the jam (in which case you need to inline the locks subquery twice, and specify a /*+ no_merge */ hint each time).
I’m not going to guarantee this query – it’s just something I put together in 20 minutes. I think it’s correct and sensible, but I haven’t tested it properly. I still can’t think why the original query had a distinct in it, and that worries me a bit – have I overlooked something ?