Oracle Scratchpad

January 16, 2009

Concurrency

Filed under: Infrastructure — Jonathan Lewis @ 7:00 pm GMT Jan 16,2009

When you get into big, busy, systems one of the final barriers you have to overcome is the concurrency issue; and after you’ve designed and fiddled and tweaked everything else it’s latch acquisition that is often the final barrier to extreme levels of concurrency.

If two people want to execute the same SQL statement there’s some interference in the library cache as they both hit the relevant “library cache” latch; if they both want to view the same data block there’s some interference in the buffer cache as they both hit the relevant “cache buffers chains” latch.

In 10.2, Oracle introduced their “mutex” mechanism to the library cache to reduce the contention and allow increased concurrency in the library cache.

In 11g, Oracle has introduced a new “consistent get” mechanism which reduces contention in the buffer cache.

If you check the statistics relating to logical I/O, you will find that there is a new “fastpath” mechanism for consistent gets. Here, for example, is a subset of the stats for a query that used to show roughly 100,000 “consistent gets” and 200,000 “buffer is pinned count” in 10g:


consistent gets                           2,886
consistent gets from cache                2,886
consistent gets from cache (fastpath)     2,301
consistent gets - examination               306
buffer is pinned count                    1,421

Most significantly, the dramatic drop in the number of consistent gets resulted in a drop from 200,000 to about 4,000 for the cache buffers chains latch, with a noticeable drop in the CPU usage.

I haven’t tried to figure out what mechanisms have been changed – possibly every buffer header has it’s own mutex; maybe buffers are copied to the session’s PGA if the circumstances seem appropriate.

For extreme systems this enhancement is worth testing carefully, and could be a good reason for thinking seriously about upgrading to 11g.

For more examples of the change, take a look at Alex Fatuklin’s blog.

9 Comments »

  1. Btw, in 11g most of the library cache latches are replaced with KGX mutexes as well…

    http://blog.tanelpoder.com/2008/08/03/library-cache-latches-gone-in-oracle-11g/

    Comment by Tanel Poder — January 18, 2009 @ 4:29 pm GMT Jan 18,2009 | Reply

  2. Hi Jonathan-

    Very insightful and this is a hot Oracle topic that I am writing about in my upcoming Oracle 11g book. I am looking for information on latches that are unique to RAC environments with Oracle 11g. Now the question for you is how does one solve latch contention issues in a busy production Oracle database? Most of the notes on Metalink point to tuning the shared pool or SQL to resolve most latch contention issues. Curious to find out what your research has been with most latch contention problems especially for 11G.

    Cheers,
    Ben

    Comment by Ben Prusinski — January 22, 2009 @ 9:19 pm GMT Jan 22,2009 | Reply

    • Ben,

      Generically, latch contention appears when processes do a very large number of very small jobs – so the generic solution to latch contention is to eliminate the “busy” code. Inevitably this means cut down on optimisation, execution, and buffer gets for most systems. If reduction isn’t possible, the increasing the number of latches to spread the load is the other alternative. After that there are only a few oddities which often can only be fixed by Oracle changing their code. (Look at the introduction of the touch count on the buffer cache as a perfect example of this, and the increasing use of mutexes in 11g).

      I haven’t yet come across any production issues where latching was the significant problem in 11g – although Alex Fatuklin has also pointed out in another of his articles (possibly on the Pythian blog) that there is a single latch protecting the result cache and that’s obviously a bad idea – expect it to change in a future release of 11g.

      Comment by Jonathan Lewis — January 30, 2009 @ 6:20 pm GMT Jan 30,2009 | Reply

  3. […] Infrastructure, Troubleshooting, Tuning — Jonathan Lewis @ 10:25 pm UTC Feb 1,2009 I wrote a little note recently about concurrency, thinking particularly about latching. I see that Chen Shapira has published an interesting note on […]

    Pingback by Concurrency 2 « Oracle Scratchpad — February 1, 2009 @ 10:26 pm GMT Feb 1,2009 | Reply

  4. Hi Jonathan,

    From your book, Practical Oracle8i page 26, I was bewildered with number of logical I/O of “update a row” operation especially “Index” section. How did you come across with “3+3” ? (Although you described on second paragraph with 4 for index entry change, plus 4 for where the changed entry should go).

    Secondly, will this mutex change the logical I/O estimation ?

    Comment by Bundit — June 25, 2009 @ 8:51 am BST Jun 25,2009 | Reply

    • Bundit,

      Thanks for the note. The error is in the “four plus four” in the earlier paragraph, which should be “three plus three”. The idea is that to “update” an index entry, you find the entry for the old value to mark it as deleted then find the place to put the entry for the new value . Each search goes: “root, branch, leaf”.

      Regarding mutexes – the code keeps changing, and there are different codepaths for accessing data blocks, and different pinning strategies, so even when I wrote this chapter it was just a simple approximation.

      Since I haven’t checked why the massive change in “gets” appeared in my example (and the mutex idea was only a “maybe”) I wouldn’t like to guess how much impact the change in code might have in the general case and what a more refined estimate would look.

      Comment by Jonathan Lewis — June 26, 2009 @ 7:19 am BST Jun 26,2009 | Reply

  5. [blockquote]I haven’t tried to figure out what mechanisms have been changed – possibly every buffer header has it’s own mutex; maybe buffers are copied to the session’s PGA if the circumstances seem appropriate.[/blockquote]
    Jonathan,

    do you happen to find out any details on this topic? It seems like Oracle does some optimizations in this area “silently” without externalizing it in session statistics counters, and it’s not helpful at all.

    Comment by Timur Akhmadeev — November 26, 2010 @ 6:37 pm GMT Nov 26,2010 | Reply

    • Timur,

      Nearly missed this one. Sorry, no extra information yet – this type of investigation usually goes on a todo list and only gets unearthed when I see a possibly related problem at a customer site.

      Comment by Jonathan Lewis — November 27, 2010 @ 7:05 pm GMT Nov 27,2010 | Reply

  6. […] that this optimization has already been mentioned several times, but there was always some confusion so far whether this optimization was related to another […]

    Pingback by Logical I/O Evolution « Ukrainian Oracle User Group — August 13, 2011 @ 1:28 am BST Aug 13,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a reply to Jonathan Lewis Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.