Oracle Scratchpad

May 10, 2008

CPU usage

Filed under: Performance, Troubleshooting — Jonathan Lewis @ 10:32 am BST

I made a few comments recently on a post in the Oracle Forum that raised the issue of CPU usage and how busy a CPU can get.

Shortly afterwards my blog dashboard showed a couple of incoming references from a blog entry that Chen Shapira had made about my comments. Her blog had received a couple of follow-up comments (from Nuno Souto, among others) that were worth collating, and also prompted me to demonstrate how nasty things can get if you start to work very hard in a very small area of the database.

If you follow the blog trail, you’ll see that I’ve mentioned a query that jumped from 15 seconds of CPU usage to 45 seconds – purely because of the effect of spin_gets on latches.

If you want to try the same test on different platforms and different versions of Oracle, the code to build the test case is shown below. (It’s based on a simple script I wrote many years ago to demonstrate how pointless it was to depend on the buffer cache hit ratio as a source of meaningful information – subsequently hijacked by Connor McDonald for his ‘pick a hit ratio’ procedure).

rem
rem     Script:         kill_cpu.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2001
rem

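rem     Note: the hidden parameter below switches the session back to
rem     the pre-9i connect by mechanism; the test depends on the very
rem     high volume of buffer visits that the old mechanism generates.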
alter session set "_old_connect_by_enabled" = true;

drop table kill_cpu_&1;

create table kill_cpu_&1(n primary key)
organization index
as
select
        rownum n
from
        all_objects
where
        rownum <= 25 -- > Adjust number (slightly) to suit
;

pause Press return

set timing on

spool latch_test_&1

select
        count(*) X
from
        kill_cpu_&1
connect by
        n > prior n
start with
        n = 1
;

set timing off
spool off

A couple of notes:

The code expects an input parameter, and uses that parameter as part of the table name it creates and queries and as part of the name of an output file. This means you can run multiple copies of the job against the ‘same’ table, although each copy will drop and recreate the table created by the previous copy.

To ensure that no query starts until all the copies of the script have dropped and recreated their tables, I have a ‘synchronisation’ step in the code.  The code above shows a simple “pause” – I’ll be posting a note some time in the future about the method I usually use.
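As an illustration (the user name and password here are invented), two copies using separate tables could be started from two operating system sessions with:

sqlplus test_user/test_pwd @kill_cpu 1
sqlplus test_user/test_pwd @kill_cpu 2

Start both copies with the same number and they will query the same table – bearing in mind that the second copy to start will drop and recreate the table that the first copy built, which is why the synchronisation step matters.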

Something else I’ve omitted from the sample is the code I use to take snapshots of various dynamic performance views to see where the work goes. Looking at v$session_event, v$mystat (your session’s slice of v$sesstat), and v$latch is very informative. (In 10g, watch out for variations in results that depend on whether or not your table happens to collide with the “simulator” latches).
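As a minimal sketch of the snapshot idea (not the code I actually use – the table name my_stats_before is invented for this example):

rem     Capture the session statistics before the test query:

create table my_stats_before as
select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   ms.statistic# = sn.statistic#
;

rem     ... run the kill_cpu query, then report the changes:

select  sn.name, ms.value - sb.value delta
from    v$mystat ms, v$statname sn, my_stats_before sb
where   ms.statistic# = sn.statistic#
and     sn.name = sb.name
and     ms.value > sb.value
order by
        delta
;

drop table my_stats_before;

Remember that v$latch is instance-wide rather than session-level, so the latch figures need before/after snapshots taken while nothing else is running.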

Warning – on my laptop with a Turion dual-core CPU running at 1.6GHz, the query (with rownum <= 25) ran for 91 seconds under 10.2.0.3. The run time will double for every row you add to the index organized table (IOT) that I’ve used to hold the driving data: the connect by condition turns every strictly increasing sequence starting at 1 into a separate path, so the count returned is 2^(N-1) for N rows.
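You can check the growth rate cheaply with a small table (the name kill_cpu_demo is invented for this example):

rem     With 4 rows the count is 8 (i.e. 2^3);
rem     each extra row doubles the figure.

create table kill_cpu_demo(n primary key)
organization index
as
select rownum n from all_objects where rownum <= 4;

select count(*) X
from kill_cpu_demo
connect by n > prior n
start with n = 1;

drop table kill_cpu_demo;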

As a flavour of what you might see, here are a few results from my (dual-core) laptop; the first set compares a few versions of Oracle, the second set compares a few scenarios in the same version:

  • 8.1.7.4: Single run 59.35 CPU seconds
  • 9.2.0.8: Single run 64.56 CPU seconds
  • 10.2.0.3: Single run 91.5 CPU seconds

Now trying to cause collisions (using 9.2.0.8, which was the worst case by a significant margin):

  • Single run: 64.56 CPU seconds
  • Two concurrent runs, using different tables: 70.5 CPU seconds each
  • Two concurrent runs against the same table: 177.8 CPU seconds each, plus 7.7 seconds each on “latch free” waits.

The extra CPU time for the last test was basically the consequence of 33 million misses and spin_gets (out of 67M gets) for the cache buffers chains latch. The latch sleep time, by the way, came from 1,500 sleeps (each).
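The figures come from v$latch (or v$latch_children if you want to see the individual child latches); the view is instance-wide, so strictly you need a before/after delta, but the basic query is just:

select  gets, misses, spin_gets, sleeps
from    v$latch
where   name = 'cache buffers chains';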

Moral: Although the test is deliberately designed to exaggerate the issue, latch sleeps aren’t necessarily your biggest performance problem when your system is running with a high degree of concurrency – misses and spin_gets can have a significant impact on your CPU usage, and the performance impact can be exacerbated if your basic CPU load is already high.
Footnote: Here’s a link to an interesting little note from Hemant Chitale about CPU disappearing “outside” Oracle.

Update – June 2011

Following a question on OTN about creating waits for “latch: cache buffers chains”, I’d like to point out that you can use the code, with minor changes, to demonstrate that problem. Simply create a single table (chop the _&1 off the end of the table name) once, and then run several sessions that query the same table. They will all be competing to access one block millions of times.
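As a sketch, the single-table variant looks like this:

rem     One-off setup:

create table kill_cpu(n primary key)
organization index
as
select rownum n from all_objects
where rownum <= 25;

rem     Then, in each of several concurrent sessions:

alter session set "_old_connect_by_enabled" = true;

select count(*) X
from kill_cpu
connect by n > prior n
start with n = 1;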

It’s quite instructive to see the way the number of “session logical reads” and latch collisions vary with versions of Oracle as the internal code changes. It might also be interesting to see how different platforms (or even just differences in CPU speeds) cause changes in results when using the same version of Oracle.

11 Comments »

  1. Hi Jonathan. Why the longer execution times as the release gets more recent? Has the spin_count changed that much or is there another effect at play?

    Comment by Noons — May 12, 2008 @ 9:46 am BST

  2. Noons,

    I’ve no idea. Obviously we can expect the code path to get longer on each release – 9i, for example, adds in the code to update the v$segstat – and I am doing something that is designed to emphasize the overheads on buffer visits. But I saw no reason for the extreme change in 10g – possibly Oracle has implemented a new latch mechanism on this platform at this release.

    In fact, in 11.1.0.6 the time dropped back to 67 CPU seconds. But in this case the statistics had also changed dramatically, so I saw:

    8M latch gets instead of 32M
    8M consistent gets instead of 16M

    and the latter were mainly reported as

    8M consistent gets from cache (fastpath)

    Comment by Jonathan Lewis — May 12, 2008 @ 11:12 am BST

  3. Jonathan,

    Thank you so much for the excellent example and clear explanations.

    Comment by prodlife — May 12, 2008 @ 10:13 pm BST

  4. Very nice article Jonathan. We will be waiting eagerly for the code you use :)

    Meanwhile I feel that the reduced latch gets could be due to the use of mutexes. I’m not sure if some of the database buffer cache latches have been modified to use mutexes (similar to library cache latches and pins).

    Comment by Amit — May 13, 2008 @ 3:47 am BST

  5. >>Two concurrent runs against the same table: 177.8 CPU seconds each, plus 7.7 seconds each on “latch free” waits.

    …I just ran this on a 2s8c Xeon Clovertown box with Linux 2.6.18 and 11.1.0.6.1 and it gets 12.6 seconds with one invocation…2 concurrent same table is 12.68.

    …a lot of what you are seeing is due to the fact that the Turion can spin its way through the test-and-set ops quite fast (because that is all L2 spinning), but the load and store ops required for the latch get/work/release cycle are way off balance, due (most likely) to the poor memory latency of the laptop…just a way off-balance test I think…either that or it is an oddity of the Windows port (which I know little about)…what I’m trying to say is that one really shouldn’t see that many gets spin off to sleep…even on a 2-CPU laptop.

    Comment by kevinclosson — May 14, 2008 @ 12:10 am BST

  6. Kevin,

    Thanks for the feedback – but could I just confirm that you didn’t manage to lose the line:

    alter session set "_old_connect_by_enabled" = true;

    When I take that out, my times on 11.1.0.6 are: 21.8 seconds for a single table, and 21.8 seconds for using the same table in two concurrent executions.

    In my concurrency test with the “old connect by”, 11g went from 67 CPU seconds (single) to 75 CPU seconds (2 x same table).

    The concurrency test I showed (9i) was the worst case one (a comment I’ve just added to the note) – and I picked that one because I’ve also demonstrated it on 9i running on HP-UX 11.00, whereas the 8i test showed latch sleeps. (I never got around to installing 10g on my old HP D370).

    One random thought – is there any chance that the variation across versions might be related to changes in options for latching through “compare and swap” rather than “test and set” ?

    Comment by Jonathan Lewis — May 14, 2008 @ 5:33 am BST

  7. Kevin,
    Trying to eliminate a few options – I ran up the same test on my old RAC stack (using 10.2.0.1 linked in single-instance mode) under the Linux that the Oracle “build your own RAC” article was keen on a couple of years ago. (A clone of RedHat 4.2, I think.)

    Intel 830 (Pentium 4) running at 2.1GHz with 1MB L2 cache I think, and DDR2 memory clocked at 533MHz (in case that’s significant).

    Single run: 68 CPU seconds.
    Two runs against same table: 90 / 92 CPU seconds (7.6M latch misses).
    Running against two tables: 74 / 77 CPU seconds. (No latch anomalies)

    I’m not keen on the variation between concurrent runs – I’d have to guess it was some CPU scheduling issue; and it was pretty consistent.

    I think this absolves the Windows port.

    Comment by Jonathan Lewis — May 14, 2008 @ 9:36 am BST

  8. Amit,

    Nice thought. I don’t know the answer, but I don’t believe that Oracle uses mutexes on buffer headers.

    Comment by Jonathan Lewis — May 15, 2008 @ 6:03 pm BST

  9. Jonathan,

    You are right…I flubbed the alter session set “_old_connect_by_enabled” = true…

    With that in place I see 45.9 with a single stream, and two-streams-same-table comes in at 57.2/57.4.

    As for the variant kslget implementations (e.g., t,t&s versus compare-and-swap) – they really don’t make that much difference. In both cases they are looped and susceptible to external interference, such as interrupts in the control code (loop). These simple spinlocks (latches) make too many presumptions – such as the notion that the HOLDER of the lock is actually on a CPU, or that allowing any interested party to acquire the lock is best for the overall system – but, in the end, they do work pretty well with today’s hardware…I suspect they’ll deserve a closer look as core count goes to 8 and beyond, though.

    Comment by kevinclosson — May 15, 2008 @ 6:46 pm BST

  10. […] (new style)) by executing a long-running, CPU-intensive query (such as the code from my “kill CPU” routine) in one session, then monitoring it from […]

    Pingback by CPU used « Oracle Scratchpad — May 26, 2009 @ 11:31 am BST

  11. […] “First: check the following simple example of how wrong you can be in saying {‘using’ all of your CPU is a good thing} especially in a multi-user, shared […]

    Pingback by Faulty Quotes 6 – CPU Utilization « Charles Hooper’s Oracle Notes — February 5, 2010 @ 6:03 am GMT

