Oracle Scratchpad

OC 4 Locks and Latches

Addenda and Errata for Oracle Core Chapter 4 Locks and Latches




p.82 (see comment below): the offset is 16, not 32. The three offsets are 8, 16, and 24 for owners, waiters, and converters respectively.

p.60 In the Note, there is a reference to “enqueue locks” (with no underscore character) in the first line, and “enqueues locks” in the second. Both of these should read enqueue_locks.
p.73 In the Note: two naming errors in one sentence: “as can be seen in the dynamic performance view v$latch_holder, which is underpinned by the structure x$ksuprlatch”. The correct name for the v$ structure is v$latchholder (no underscore); the correct name for the x$ structure is x$ksuprlat.
p.77 In the Note, fifth line, there is a reference to the parameter enqueues; this should be the hidden parameter _enqueue_locks.
p.78 Second paragraph, third line, makes a reference to “… a column x$ksqlres, which is …”; the correct name for the column is ksqlkres.
p.79 Bullet point 1 should be: “Session 37: Delete the only child of parent 1”
Bullet point 2 should be: “Session 36: Delete the only child of parent 2”
p.79 Figure 4-4: the label on the bottom right-hand rectangle reads “Sid 35 0,5” when it should be “Sid 35 0,3”.
p.80 Three lines from bottom of page: “… session 29 started waiting …” should be “… session 39 started waiting …”
p.82 Finally, we can see that the backward pointer in line 3 and the forward pointer in line 4 are both (21a4cda0) pointing back to the resource address (21A4CD90), although the offset is 32 rather than 8.


p.69 Table 4-2. Statistic Misses – should be “misses”
p.70 Table 4-2. The statistics Sleeps, Immediate_gets, Immediate_misses, and Wait_time are all capitalised when they should not be.
p.74 Last line of last Note: “Parsing and Optimising” should be (American) “Parsing and Optimizing”
p.77 Second paragraph of section Infrastructure, last line: “…the array definition are” needs a colon: “…  the array definition are:”
p.81 Paragraph after first note: “The command is” needs a colon: “The command is:”
p.82 First line of section headed Deadlock: “Looking back as …” should be “Looking back at …”
p.83 Last Note, end of second line: “… will say” needs a colon: “… will say:”
p.91 Second paragraph (style):  “Mutexes are small, and don’t have lists associated with them, we now have information pointing one way, the session knows which mutexes it is holding, but the mutex doesn’t know which sessions are holding it.” This would be better as: “Mutexes are small and don’t have lists associated with them. This means we now have information pointing only one way – the session knows which mutexes it is holding but a mutex doesn’t know which sessions are holding it.”



  1. Sir,
    Thanks for your great book CBO Fundamentals.
    This is the most interesting part for me. Will you show us any methods (dumping, DTrace etc.) by which we can explore latches ourselves?

    Thanks sir.

    Comment by Somu — October 15, 2011 @ 9:13 am BST Oct 15,2011 | Reply

    • Somu,

      The notes I’ve written describe the function and purpose, but don’t go into detail on debugging and tracing. However, I do point readers to the blog where Andrey Nikolaev has published a lot of his work on investigating latches and mutexes. (Andrey was one of my special reviewers, and this was the chapter I asked him to review.)

      Comment by Jonathan Lewis — November 13, 2011 @ 12:17 pm GMT Nov 13,2011 | Reply

  2. Jonathan,

    It seems some words have gone missing near the start of the section “A Graphic Image of v$lock” in chapter 04; I guess you want to say as below.

    1. Session 37: The only child of parent 1. <– should be: "Attempt to delete The only child of parent 1."
    2. Session 36: The only child of parent 2. <– should be: "Attempt to delete The only child of parent 2."
    3. Session 39: Attempt to lock the child table in exclusive mode (and start to
    4. Session 37: Attempt to delete parent 1 (and start to wait due to missing FK
    5. Session 35: Attempt to delete the only child of parent 3 (and start to wait).

    Comment by Sid — December 4, 2011 @ 6:59 am GMT Dec 4,2011 | Reply

    • Sid,

      Thanks for that – the missing word in both cases is simply “Delete”.

      If it weren’t so irritating this pair of errors would be an amusing example of how it’s possible to explain something perfectly and still have someone else misunderstand completely: in the proof copy the original text said “Update the only …” which was an error, so I highlighted the word “Update” and replaced it with “Delete” (which is exactly how the editor was supposed to be used, and how I had used it for several other corrections to the chapter). Unfortunately the next person to do anything with the proofs decided that this meant I wanted the word “update” to be deleted :(

      Comment by Jonathan Lewis — December 4, 2011 @ 9:05 am GMT Dec 4,2011 | Reply

  3. Jonathan,

    On page 77 in the Note section the book states, “… if you query the dynamic performance view v$lock, you are querying the structure defined by the parameter enqueues.” Is it possible that the parameter is actually named ENQUEUE_RESOURCES, rather than ENQUEUES? If so, the Oracle Database documentation states that the ENQUEUE_RESOURCES parameter is obsolete as of 10.2 (ref: ). The hidden parameter _ENQUEUE_RESOURCES exists in (defaulted to a value of 1308 in one database instance), and I assume that the same hidden parameter exists in 10.2.0.x.

    Comment by Charles Hooper — December 11, 2011 @ 3:29 am GMT Dec 11,2011 | Reply

    • Charles,

      Thanks for highlighting the error.

      It’s actually _enqueue_locks, rather than _enqueue_resources. The latter is used to represent things that can be locked; the former represents the action of acquiring a lock on a resource. Now added to the Errata list.

      Comment by Jonathan Lewis — December 11, 2011 @ 2:13 pm GMT Dec 11,2011 | Reply

      • Thanks for the correction.

        I think that I might have found another minor issue. On page 78 it appears that a $ character slipped into a column name. The sentence is found at the end of the paragraph that follows the first code section:
        “… exposed indirectly through v$lock through the sid (session id), and a column x$ksqlres, which is the address of the resource it’s locking, exposed indirectly through the type, id1, and id2.”

        Comment by Charles Hooper — December 11, 2011 @ 10:24 pm GMT Dec 11,2011 | Reply

  4. Hi Jonathan,

    I bought a softcopy of this book. Just wondering: how would these corrected errors be presented for the softcopy version? As an additional softcopy of errata that we need to refer to, or will the corrections be merged into the original softcopy?

    Comment by Andy — December 12, 2011 @ 10:00 am GMT Dec 12,2011 | Reply

    • Andy,

      I don’t know what APress does about the errors that I’ve confirmed. I’ve forwarded your question to my editor, and sent them the URL of your comment above. I’ll keep you informed of any response.

      Comment by Jonathan Lewis — December 14, 2011 @ 1:57 pm GMT Dec 14,2011 | Reply

      • Thank you very much Jonathan. You can post APress’s response here instead. I think there are other softcopy owners who might be interested to know that too. Besides, I read your blogs every week. Won’t miss anything :)

        Comment by Andy — December 15, 2011 @ 5:27 pm GMT Dec 15,2011 | Reply

      • Andy,

        The response from APress is that the errata will be posted onto the Apress website at some time in the future, but I think this probably means a table of errors, rather than as some sort of eBook document. I’ve sent a follow-up post pointing out the difficulty of “writing corrections in the margin” when the book isn’t paper, and asked if they have any ideas about how eBook owners can get the corrections into the softcopy.

        Comment by Jonathan Lewis — December 23, 2011 @ 8:13 am GMT Dec 23,2011 | Reply

        • Thanks for the update, Jonathan.
          It would be nice if the corrections could be merged into a fresh eBook. Hope Apress can keep up with the way people use eBooks these days :) Not many people enjoy having a notebook/tablet to read it while having a printed copy on hand to cross-reference it. Doing that would be very… ‘backward’, if not awkward ;)


          Comment by Andy — December 27, 2011 @ 6:38 am GMT Dec 27,2011

        • Andy,

          I’ve sent APress a follow-up note about the difficulty of writing corrections in the margin of a book if it’s an eBook, suggesting that it would be nice if they could come up with an idea for electronic corrections.

          I believe Apple and Amazon have some sort of “global comments” feature that allows you to see comments and annotations that other people have made on the book you are reading – it would be a nice addition if they had a special subsetting mechanism that restricted the visible comments to corrections by the author.

          Comment by Jonathan Lewis — December 28, 2011 @ 8:14 pm GMT Dec 28,2011

  5. Hi. Jonathan.
    Finally, we can see that the backward pointer in line 3 and the forward pointer in line 4 are both (21a4cda0) pointing back to the resource address (21A4CD90), although the offset is 32 rather than 8.

    I think that “offset is 32 rather than 8” is not correct, because 21A4CDA0 – 21A4CD90 = 16 (decimal).

    Comment by sean_kim — October 6, 2012 @ 2:16 pm BST Oct 6,2012 | Reply

  6. Hi Jonathan,

    In the description of how an exclusive latch conceptually works on p. 67, I see the following potential for two processes/sessions 1 and 2 to acquire the latch simultaneously, as follows:

    Session 1: Set register X to point at latch address A
    Session 1: If value at address A is zero set it to 0xff -> check succeeds, value is set to 0xff by session 1
    Session 2: Set register X to point at latch address A
    Session 2: If value at address A is zero set it to 0xff -> check fails, session 2 does not modify the memory location A, the value has already been set to 0xff by session 1
    Session 2: If the value at address A is set to 0xff then you “own” the latch -> session 2 sees the value 0xff set by session 1 at address A and therefore concludes it owns the latch
    Session 1: If the value at address A is set to 0xff then you “own” the latch -> session 1 sees the value 0xff it previously set at address A and therefore concludes it owns the latch

    Now both session 1 and 2 believe they own the latch and can safely access the protected structure(s).

    I believe the logic would work by modifying the pseudo code on p. 67 as follows:

    Set register X to point at latch address A
    Set value at address A to 0xff and return previous value of memory location A into register Y ***
    If register Y contains 0 the current process set the value 0xff at address A and therefore “owns” the latch
    If register Y contains 0xff, the value has been written at address A by another process and the latch is therefore currently held by another process – go back to the top and try again for a couple of thousand attempts.

    As in your book, I used (***) to mark the line that has to be atomic. This is what the test-and-set instruction does on most CPUs according to my understanding.
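Martin’s corrected pseudocode maps directly onto C11 atomics: `atomic_flag_test_and_set()` performs the “write the busy value and return the previous value” step as one indivisible instruction. A minimal sketch of the idea (names are illustrative, not Oracle’s code):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Exclusive latch sketch: atomic_flag_test_and_set() atomically sets
 * the flag and returns its PREVIOUS value - the single atomic step
 * marked *** in the pseudocode above. */
static bool latch_try_get(atomic_flag *l) {
    return !atomic_flag_test_and_set(l);  /* true => previous value was clear */
}

static void latch_release(atomic_flag *l) {
    atomic_flag_clear(l);
}

int demo(void) {
    static atomic_flag l = ATOMIC_FLAG_INIT;
    int first  = latch_try_get(&l);   /* 1: old value was clear, we own it  */
    int second = latch_try_get(&l);   /* 0: already held - would spin/sleep */
    latch_release(&l);
    int third  = latch_try_get(&l);   /* 1: free again after the release    */
    return first * 100 + second * 10 + third;
}
```

Because the set and the read of the old value happen as one atomic operation, the two-session race described above cannot occur: exactly one caller sees the “previous value was clear” result.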

    kind regards

    Comment by Martin Maletinsky — December 29, 2012 @ 5:38 pm GMT Dec 29,2012 | Reply

    • Martin,

      Thanks for the email.

      The point about the atomicity of the call “if value at address A is zero set it to 0xff” pre-empts the possibility of the collision occurring. If session 1 is executing the command, session 2 cannot even read the memory location until the call by session 1 is complete and the value has been changed.

      I avoided adding in the specific detail about swapping out the old value since I wasn’t sure that Oracle used “test and set” on every port, and I wasn’t sure that that was how “test and set” was implemented on all chips. (The swap, of course, is a crucial part of the “compare and swap” that is commonly used for mutexes.)
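The “compare and swap” mentioned here can be sketched with C11’s `atomic_compare_exchange_strong()`. This sketch also illustrates a later remark in the comments, that an exclusively held mutex has room to record its holder’s session id in the mutex word itself. The scheme below (a free word holds 0, the exclusive getter swaps in its SID) is illustrative, not Oracle’s code:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Mutex sketch: a free mutex word holds 0; an exclusive getter tries
 * to swap in its own session id.  The CAS succeeds only if the word
 * was still 0 at the instant of the swap. */
static bool mutex_try_get_excl(atomic_uint *word, unsigned sid) {
    unsigned expected = 0;                 /* what "free" looks like */
    /* Atomically: if (*word == expected) { *word = sid; return true; }
     * otherwise load the current value into 'expected' and return false. */
    return atomic_compare_exchange_strong(word, &expected, sid);
}

int demo(void) {
    static atomic_uint word = 0;
    int got37 = mutex_try_get_excl(&word, 37);   /* session 37 wins   */
    int got36 = mutex_try_get_excl(&word, 36);   /* session 36 loses  */
    unsigned holder = atomic_load(&word);        /* word names holder */
    return got37 * 100 + got36 * 10 + (holder == 37);
}
```

The losing session can read the current value out of `expected` and knows immediately who holds the mutex, which is the informational bonus of a swap over a simple test-and-set.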

      Comment by Jonathan Lewis — December 31, 2012 @ 10:57 am GMT Dec 31,2012 | Reply

      • Hello Jonathan,

        Thank you for your reply. I believe I initially misunderstood line 3 of the pseudo code which says “If the value at address A is set to 0xff then you “own” the latch”.

        I first understood it as “read the memory location A and if it contains value 0xff then you “own” the latch”, whereas now I believe it says “if the check in the line before succeeded and you therefore wrote 0xff at memory location A, you now “own” the latch”.

        thanks for the clarification
        kind regards

        Comment by Martin Maletinsky — December 31, 2012 @ 1:27 pm GMT Dec 31,2012 | Reply

        • Martin,

          You’re right – and I should have realised the risk of ambiguity when I wrote it: that’s the trouble with knowing what you mean to say; sometimes you don’t notice that you haven’t said it clearly enough.

          Comment by Jonathan Lewis — December 31, 2012 @ 2:24 pm GMT Dec 31,2012

  7. Hello Jonathan,

    I have two questions concerning the following statement on page 69: “The process actually starts by assuming that there are no competing readers, though, and its first action is to attempt to go straight to the holding value”.

    1) Do I understand correctly that this comment applies exclusively to the case where the latch is not held by any readers at the moment the process starts its attempt to acquire the latch (otherwise the reader count visible in the latch value indicates that readers are currently holding the latch and the process cannot go straight to the holding value, since that would overwrite the readers count).
    I.e. does the statement mean that when the process finds the latch value to be 0x00000000 the process will attempt to set it to bitand(0x20000000, process ID) straightaway rather than trying to set it to 0x40000000 first?

    2) Do you have any tests to verify this behaviour or is the statement based on internal knowledge of the Oracle code?

    thanks a lot
    kind regards

    Comment by Martin Maletinsky — December 29, 2012 @ 6:30 pm GMT Dec 29,2012 | Reply

    • Martin,

      On page 68, there’s a note thanking Andrey Nikolaev for reviewing this chapter. He supplied a number of corrections and enhancements to the detail, and I think this was one of them. Neither of us has access to the internal Oracle code, but he is very proficient, and very thorough, in his use of various tracing tools, and if I recall correctly this is a detail he was able to infer by using Dtrace on Solaris and examining the list of Oracle library calls.

      Andrey’s blog is listed in the list of blogs to the right of the page – and I see that he’s recently published an 82-slide PowerPoint on the topic that he presented at the Russian Oracle User Group three weeks ago, along with a number of test scripts. You’ll probably find the collection very interesting. You’ll also be interested in the July 2012 post that precedes that one as it describes a number of changes that appeared in a patch to

      Comment by Jonathan Lewis — December 31, 2012 @ 11:04 am GMT Dec 31,2012 | Reply

  8. Hello Jonathan,

    I have two questions related to different strategies which Oracle adopts for enqueues / enqueue resources on one side and for KGL locks and KGL pins / library cache entries on the other side.

    From what I understood, KGL locks and KGL pins are mechanisms very similar to enqueues; the only conceptual difference is that they are attached to library cache objects rather than to enqueue resources (elements of x$ksqrs) and that, unlike enqueue resources, library cache objects only have waiters and owners queues but no converters queue. On page 89 you mention that the similarity is so close that the corresponding pictures would look the same (hence you only drew fig 4-5, which shows enqueues / enqueue resources).

    On page 90 you describe the memory allocation for KGL locks and KGL pins and how it caused fragmentation of the SGA in previous Oracle releases. Why is memory for KGL locks and KGL pins dynamically allocated and released on demand rather than using a (pre-allocated) array similar to x$ksqeq (and other enqueue structures)? Is there some additional difference between KGL locks and KGL pins which I missed that would justify the different strategies for memory allocation?

    Is there any obvious reason why Oracle Corp. replaced the library cache latches by mutexes (as described in “Mutexes, Part 2” on page 90) but not the enqueue hash chain latches? What difference between the enqueue resources and library cache entries might explain this different approach? Is there maybe a different pattern of usage or a significant difference in quantities that could explain this difference?

    thanks a lot
    kind regards

    Comment by Martin Maletinsky — January 6, 2013 @ 7:03 pm GMT Jan 6,2013 | Reply

    • Martin,
      Thanks for your questions, I appreciate the trouble you’ve taken to work through the book so thoroughly – even though the resulting questions do force me to take some time to re-think some of my comments (and possibly correct some errors).

      i) I think there are probably many questions of this type where the answer is, in part at least, simply “history”. Unless my memory is wrong, the enqueue stuff appeared in version 6, but the major coding effort for sharing cursors – hence the shared pool etc. – appeared in version 7. Given the difference in timing it’s easy to imagine a different group working on the various different problems with the shared pool and coming up with the generic strategy for allocating memory and simply applying it to the KGL pins and locks; there’s also the possibility that someone looked at the enqueue strategy, saw that it made the DBA responsible for defining the number of KGL pins and KGL locks (particularly) at database startup time, and saw this as a difficult problem.

      ii) I have no good ideas about this one. I could guess three possibilities: (a) the question of scale is completely different, (b) don’t change code that isn’t causing a threat, (c) you can’t do everything at once. In a similar vein, I have wondered if there are any plans to change the “cache buffers (LRU) chains” latches to mutexes.

      Comment by Jonathan Lewis — January 12, 2013 @ 9:12 am GMT Jan 12,2013 | Reply

  9. Hello Jonathan,

    I have two questions related to the difference between latches and mutexes. On page 90 you write that the mutex has a “very short code path”, just before mentioning that there is one mutex per hash bucket while one latch was used to protect many hash buckets. On page 112 you write “a latch is a fairly large structure with a long code path, so we’d like to keep the number of latches to a minimum”.

    i) If I correctly understand “long code path” as “many CPU instructions are required to request, acquire and release a latch”, I don’t see how the length of the code path is related to the decision to use the respective serialization mechanism to protect a single item (mutex) or a set of items (latch). I’d even expect the opposite to be true: you are more inclined to avoid misses on latches (because of the cost you have to pay for every retry) and would therefore rather use one latch per item.
    I understand from your statement on page 112 that a latch is large, i.e. needs more memory than a mutex (around 7 times as much according to Tom Kyte, “Oracle Database Architecture”, p. 240). To me this seems to be a valid reason to be more generous when using mutexes compared to when using latches.
    Did I misunderstand you and you aren’t saying the length of the code path is the reason to use one mutex per item vs. one latch for several items, or is there a rationale for this claim which I missed?

    ii) Mutexes are less instrumented than latches (there aren’t that many statistics on the use of mutexes throughout the runtime of the instance as there are on the use of latches, e.g. V$LATCH_… views). Does the shorter code path entirely result from this fact or are there other major differences between mutexes and latches that significantly reduce the code path of mutexes?

    Thank you
    kind regards

    Comment by Martin Maletinsky — January 6, 2013 @ 8:17 pm GMT Jan 6,2013 | Reply

    • Martin,

      i) I think what you’re asking (rhetorically) and answering is the question: “why invent mutexes, why not simply put a latch on every single item?”, and then arguing the case that the size of the memory structure would be the most significant reason for creating a new mechanism – and I agree; in fact I thought I’d made that point somewhere in the book, but perhaps I hadn’t. The thought behind the length of the code-path is that the bulk of the code exists to populate the parts of the latch structure that aren’t the “latch” itself. I’ll have to check what Tom says about the “7 times” – the size of the latch structure has varied dramatically over versions, and I’d have put the size difference closer to a factor of 25.

      ii) I think a significant part of the code path is about instrumentation (and, in fact, somewhere in 10g or 11g, Oracle took out some of the code to count the number of sleep cycles a latch had been through); but there are also parts of the code that populate the structures with details of who is using the latch and from where in their code path. This secondary code can be removed for mutexes because a held mutex implicitly identifies the holder, and the state object for the holder holds the information for why and where it is being held.

      Comment by Jonathan Lewis — January 12, 2013 @ 10:13 am GMT Jan 12,2013 | Reply

      • Hello Jonathan,

        Thanks a lot for your answers and for the time you take to share your knowledge about Oracle. I have a follow-up question regarding your response (ii). What do you mean by “…a held mutex implicitly identifies the holder”? In the book you write (p. 91) “the session knows which mutexes it is holding, but the mutex doesn’t know which sessions are holding it”. Is that what you refer to when writing that the mutex implicitly identifies the holder?

        thank you
        kind regards

        Comment by Martin Maletinsky — January 17, 2013 @ 10:07 pm GMT Jan 17,2013 | Reply

        • Martin,

          My brain produced a technical glitch there – for some reason I was thinking only of a mutex being held exclusive. Since only one process can hold exclusive there’s enough space in the mutex itself to identify the SID of the holder.

          Comment by Jonathan Lewis — January 18, 2013 @ 9:37 am GMT Jan 18,2013

      • Hello Jonathan,

        Just one more word regarding the size of the latch structure. I noticed that on page 66 you write there are between 100 and 200 bytes of infrastructure and instrumentation (in addition to the memory location used for the state and session information). If this really is the entire size of a latch, it would mean that a mutex would only use between 4 and 8 bytes of memory (assuming the factor 25 you mention)?
        Also this would mean that the size of the latch structure is similar to the size of the buffer header (page 96). If this is the case, it seems very little to have only one latch per 32 hash buckets. With the number of hash buckets being twice the number of buffers, Oracle would therefore only allow 1/16 of the memory it uses for the buffer headers to be used for the latches protecting the hash chains – given the importance of concurrency I believe this is very little.

        Do you agree with these considerations or did I misunderstand some point?

        thank you
        kind regards

        Comment by Martin Maletinsky — January 23, 2013 @ 8:39 pm GMT Jan 23,2013 | Reply

        • Martin,

          I don’t think you’ve misunderstood any of the points. But I think there’s always the need to remember history. Some of the things that might look unbalanced in terms of (say) CPU or memory utilisation may have seemed much more sensible 15 or 20 years ago; but the cost of change may override any consideration of changing to a better strategy now.

          Having said that, I have only come across a couple of occasions where the number of cache buffers chains latches really did need to be made much larger (factor of 2 or 4) than their default – so the current choice doesn’t seem to have too much of a negative impact.

          Comment by Jonathan Lewis — January 28, 2013 @ 6:13 pm GMT Jan 28,2013

  10. Jonathan,

    at the bottom of page 79 you say:
    “… only then can you attach yourself to the owners queue – provided the lock mode you want is compatible with the modes held by everyone currently on the owners queue”

    Could you elaborate on what exactly “lock compatibility” means?

    Comment by Wojciech — January 20, 2013 @ 2:16 pm GMT Jan 20,2013 | Reply

    • Wojciech,

      The lock mode can be anything from level 0 to level 6 (see ), and there are rules about when it is legal for different sessions to lock the same object in different modes.

      For example, if you are locking a table in mode 3 (row-exclusive, probably updating a few rows) I cannot lock the same table in mode 6 (exclusive) until you have committed or rolled back. In the book example I would get to the head of the waiters queue, and have to wait for you (and any others updating the table) to get off the owners queue before I could join the owners queue.
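The rule in this example can be sketched against the commonly documented compatibility matrix for Oracle table lock modes (0 none, 1 null, 2 row-share, 3 row-exclusive, 4 share, 5 share-row-exclusive, 6 exclusive). The code and names below are illustrative, not Oracle’s:

```c
#include <stdbool.h>

/* Commonly documented compatibility matrix: compat[held][requested].
 * Modes: 0 none, 1 null, 2 row-share (RS), 3 row-exclusive (RX),
 * 4 share (S), 5 share-row-exclusive (SRX), 6 exclusive (X). */
static const bool compat[7][7] = {
    /* requested:    0  1  2  3  4  5  6      held: */
    /* 0 none  */  { 1, 1, 1, 1, 1, 1, 1 },
    /* 1 null  */  { 1, 1, 1, 1, 1, 1, 1 },
    /* 2 RS    */  { 1, 1, 1, 1, 1, 1, 0 },
    /* 3 RX    */  { 1, 1, 1, 1, 0, 0, 0 },
    /* 4 S     */  { 1, 1, 1, 0, 1, 0, 0 },
    /* 5 SRX   */  { 1, 1, 1, 0, 0, 0, 0 },
    /* 6 X     */  { 1, 1, 0, 0, 0, 0, 0 },
};

/* You may join the owners queue only if your requested mode is
 * compatible with EVERY mode currently held on the owners queue. */
static bool can_join_owners(int requested, const int *held, int n) {
    for (int i = 0; i < n; i++)
        if (!compat[held[i]][requested])
            return false;
    return true;
}

int demo(void) {
    int owners[] = { 3, 3 };                /* two sessions updating rows (RX) */
    int a = can_join_owners(3, owners, 2);  /* another RX: may join            */
    int b = can_join_owners(6, owners, 2);  /* exclusive: must wait            */
    return a * 10 + b;
}
```

This matches the book’s example: a mode 6 request is incompatible with the mode 3 holders, so the requester sits on the waiters queue until the owners queue empties.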

      Comment by Jonathan Lewis — January 22, 2013 @ 12:14 pm GMT Jan 22,2013 | Reply

  11. Hi Jonathan,

    First let me congratulate you on this wonderful book. As many have pointed out, it requires several rounds of reading to get the internal workings and mechanisms clear. But it is worth the effort.

    On the Exclusive and Shared Latch concept, as I understand it, as the exclusive get is compatible with shared, the latch can be obtained in an exclusive mode. However, the work to be done in exclusive mode will only be initiated once the reader count has dropped to ZERO. In this case, while a process (a session) has obtained a latch or mutex in exclusive mode, it will wait until the count drops to ZERO. The session waiting for the count to drop does not, however, wait on a “Library Cache Mutex” wait. But, in the same scenario, if there is another session that requires the latch to be obtained in exclusive mode, since there is already a session that has set the write bit and is waiting for the reader count to drop to zero, the new session has to wait on “Library Cache Mutex”. Also, the duration of this wait will become longer if there are many readers holding the latch in shared mode. Am I correct?


    Comment by Vicky — April 3, 2013 @ 2:01 pm BST Apr 3,2013 | Reply

  12. Hello Jonathan,

    I just finished reading chapter 7 which motivated me to revisit mutexes in chapter 4 (and to have a look at my previous questions / your answers relating to mutexes). In doing so, a new question regarding mutexes crossed my mind. On page 91 you write “Mutexes are small, and don’t have lists associated with them, we now have information pointing one way, the session knows which mutexes it is holding, but the mutex doesn’t know which sessions are holding it.”.
    My question is, what happens when a session is holding mutexes and terminates unexpectedly (e.g. somebody killing the OS process with kill -9 on Unix)? There must be a way to notice that the mutex is being held by a no longer existing session; otherwise e.g. a library cache hash chain might remain inaccessible to all other sessions throughout the lifetime of the instance, which I believe would not be acceptable.
    So the question is: where is the information stored that points from the session to the mutexes it holds? From my understanding of the Oracle memory architecture, if it was in the PGA, it would not be accessible to other processes of the instance and therefore the information would be lost once the session’s server process terminates. So I believe it must be in the SGA (or are there other, less known memory areas that are shared memory accessible to all processes of the instance)? Can you confirm (or rectify) this assumption and, if it is in the SGA, do you know in which of the SGA components this information is stored?

    Thank you
    kind regards

    Comment by Martin Maletinsky — May 1, 2013 @ 4:45 pm BST May 1,2013 | Reply

  13. Hi Jonathan

    A small query. If a transaction hits a block which does not have a free ITL slot to offer and gets blocked, what would be the enqueue resource for this transaction’s TX lock? Keeping in mind that this blocked transaction is going to hit a not-locked-in-any-sense row in the block.
    I guess logically it must not be the transaction id of any of the active transactions.

    Vineet Ranjan
    Adelaide, Australia

    Comment by Vineet Ranjan — June 30, 2016 @ 5:07 am BST Jun 30,2016 | Reply

  14. Hello Jonathan,

    I read chapter 4 again after a while and I came across a question I didn’t think of previously – as I did not read the entire book this time (but only parts of chapter 4), please excuse me if the question is answered elsewhere in the book, and please just give me a short hint where to find the answer in that case.

    As explained on page 71 ff. if a session unsuccessfully attempts to get a latch it links itself to a latch wait list and goes to sleep until woken. As the latch wait list is a linked list (I suppose) there needs to be yet another synchronization mechanism to avoid corruption of this latch wait list by concurrent access from different sessions.
    – What is this synchronization mechanism?
    – Are there any statistics and/or wait events reporting concurrent access to the latch wait list and delays which may result from this?

    thank you very much
    kind regards

    Comment by Martin Maletinsky — September 3, 2019 @ 10:15 am BST Sep 3,2019 | Reply

  15. Martin,

    There are two mechanisms, depending on the version of Oracle (and one of them disappeared by 10gR2, if not earlier).

    If you check v$latch you’ll find that there are two latches called “post/wait queue” and “latch wait list” – the latter was the one that disappeared by 10g.

    There are a lot of child latches named “post/wait queue”, though it’s not one “post/wait queue” child for each different parent latch; there’s probably a hashing algorithm that picks a “post/wait queue” child latch based on the name of the latch that a session has to wait for.

    If you want the most detailed and up-to-date information that’s available on the web it’s probably in a series of posts and pdfs on Andrey Nikolaev’s blog:

    Comment by Jonathan Lewis — September 3, 2019 @ 12:13 pm BST Sep 3,2019 | Reply

    • Hello Jonathan,

      Thank you very much for your answer.
      Do I assume correctly, that the post/wait queue latch is not shareable, i.e. can only be taken by one session at a time (is there any way to see if a given latch is shareable or not)?
      How does Oracle protect the latch wait list of the post/wait queue latch? Do the various post/wait queue latch children mutually cover each other with deadlocks being prevented by one session only holding at most one post/wait queue latch at a time?

      thank you
      kind regards

      Comment by Martin Maletinsky — September 3, 2019 @ 12:53 pm BST Sep 3,2019 | Reply

  16. Hello Jonathan,

    Once again I am re-reading chapter 4 (and I discover new thoughts each time I do this).

    This time it is two observations I didn’t think of before and which I would like to share (and to ask if I understood that correctly and if you made any further investigations into that).

    (i) Based on the description on page 68 it seems possible to me that under heavy load even readers could block readers, i.e. that you might observe sleeps on latches even if they are solely requested in shared mode.
    In my understanding this could happen due to the usage of the compare and swap operation as described on top of page 68
    Set flag F to zero
    Set register X to point to latch address L
    Set register Y to hold the current value stored at L *
    Set register Z to hold a new value you want to see at L
    If “value in Y” = “value in L” then set L to “value in Z” and set flag F to 1 ***

    If between the lines * and *** another reader acquired the latch (and thus incremented the value stored in L by 1) the check in line *** will fail (as the value in L is now different) and thus the latch acquisition fails due to another reader. If this happens repeatedly it could eventually lead to a sleep. This is probably the last scenario mentioned in table 4-3?
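Scenario (i) can be sketched with C11 atomics: if another reader increments the latch word between the snapshot (*) and the compare-and-swap (***), the CAS fails and the get must be retried, so readers really can collide with readers. The code below forces that interleaving in a single thread to show the mechanism; it is illustrative, not Oracle’s code:

```c
#include <stdatomic.h>

/* Shared-latch reader get sketch: snapshot the latch word, compute
 * snapshot+1, and CAS it in.  The CAS (***) fails if the word changed
 * after the snapshot (*) - e.g. because another reader got in first. */
typedef atomic_uint latch_word_t;

/* One CAS attempt using an already-taken snapshot; returns 1 on success. */
static int reader_cas_attempt(latch_word_t *word, unsigned snapshot) {
    unsigned expected = snapshot;
    return atomic_compare_exchange_strong(word, &expected, snapshot + 1);
}

int demo(void) {
    static latch_word_t word = 0;
    unsigned snap = atomic_load(&word);  /* line *  : snapshot = 0          */
    atomic_fetch_add(&word, 1u);         /* another reader slips in first   */
    int first = reader_cas_attempt(&word, snap);   /* *** fails: word is 1  */
    snap = atomic_load(&word);           /* retry with a fresh snapshot     */
    int second = reader_cas_attempt(&word, snap);  /* succeeds: word -> 2   */
    return first * 10 + second + (atomic_load(&word) == 2) * 100;
}
```

Repeated failures of this kind under heavy concurrent read traffic are what could eventually turn into a sleep, as suggested above.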

    (ii) As described on page 72 processes waiting for a latch are woken up one by one when a holder releases the latch. This can (again in a scenario where you have heavy load on one latch) serialize access to a resource protected by the latch (even between processes whose utilization of the resource would be compatible) in that you might have only readers in the wait list and still they will acquire the latch (and get access to the protected resource) only one by one due to the wakeup mechanism that wakes up only the one process at the top of the list.
    I can imagine that this is a reasonable trade-off as otherwise you might run into congestion when waking up multiple processes simultaneously, who would then immediately compete for the latch, possibly causing the problem described in (i).

    thank you
    kind regards

    Comment by Martin Maletinsky — May 2, 2020 @ 12:49 pm BST May 2,2020 | Reply

    • Martin,

      Your suggestion is correct. Oracle has made all sorts of changes over the years to minimise “wasted” CPU and unnecessary sleeps on latches – including the introduction of mutexes, of course, (rapidly followed by variations in the back-off strategies for mutexes).

      The key problem is heavy load. Historically (ca. 8i) people made a big fuss about sleeps on latches when in fact they should have started worrying before that, i.e. when “spin gets” was high, because that was an indicator that they were contending for latches and spending more CPU doing the job than they would want to; in the early days it wasn’t unusual to see cases where “the performance problem” was (indirectly) a latch problem that no-one could see because “there were no latch sleeps”.

      (Actually, in the old days there were lots of “indirect” CPU problems because of the “buffer cache hit ratio > 99%” mantra that left people burning up CPU on buffer visits which started with latch gets.)

      Jonathan Lewis

      Comment by Jonathan Lewis — May 20, 2020 @ 11:23 am BST May 20,2020 | Reply

