Oracle Scratchpad

October 19, 2007

SSD

Filed under: Performance,Statspack,Troubleshooting — Jonathan Lewis @ 7:53 pm BST Oct 19,2007

Sometimes you think that SSD (solid state disk) is bound to speed up Oracle performance. But it’s surprising how wrong you can be – it’s not guaranteed to give you great performance.

Kevin Closson has written various pieces in the past about the log writer and how it behaves, and how it is sometimes too easy to be distracted by the idea of solid state disc (see for example this piece); but I was still quite astonished when I came across my first really extreme case of a “RAM-only” device not being good enough. Take a look at this extract from a 9.2.0.6 statspack report:

 
Load Profile 
~~~~~~~~~~~~                            Per Second       Per Transaction 
                                   ---------------       --------------- 
                  Redo size:          1,545,217.57             24,437.72 
              Logical reads:            457,631.52              7,237.47 
              Block changes:             42,431.65                671.06 
             Physical reads:              7,096.97                112.24 
            Physical writes:              2,770.65                 43.82   

                                                                   Avg 
                                                     Total Wait   wait    Waits 
Event                               Waits   Timeouts   Time (s)   (ms)     /txn 
---------------------------- ------------ ---------- ---------- ------ -------- 
log file parallel write           203,762          0        716      4      0.9     

So what do you think is at the end of those 3.5 ms log writer writes (716 seconds of wait time spread over 203,762 writes)?

The answer isn’t brown and rusty. It’s a LUN carved out of the SAN cache at the end of a fibre channel link – pure WOM (write-only memory): it’s never destaged and it’s not even being copied to an archive destination – and it’s still 3.5 ms per write. But, as you can see, that’s a busy bit of fibre.

(I believe the site shares batteries with the Energizer Bunny).

26 Comments »

  1. Hi Jonathan,

    True, many people just expect the write cache access time to be negligible, which it is not. Having said that, this is a really bad response time for a “standalone” (isolated) storage array cache. 0.5 ms +/- 30% would be the write cache access time baseline on all the boxes I have worked on (HP/HDS/EMC/SUN STK). Yes indeed, that’s “a bit of a busy fibre”: I guess some other factors such as ISL oversubscription on the SAN side, heavy utilization of the host ports, or synchronous storage array replication would explain the figures.

    Here are some tests I made at a customer site to compare log file write times, starting with a “no replication” baseline and contrasting it with software replication (vxvm or asm) and SRDF (EMC’s DMX replication): http://christianbilien.wordpress.com/2007/06/19/spotlight-on-oracle-replication-options-within-a-san-22/

    Cheers

    Christian

    Comment by christianbilien — October 20, 2007 @ 7:46 am BST Oct 20,2007 | Reply

  2. Christian,

    Good link – and one of the reasons I put your blog on my blogroll.

    Of course you’ve hit the nail on the head – it may be “fast because it’s memory”, but there’s all the infrastructure around as well, including the possible impact of forwarding the write to a remote site.

    Memory is fast, but all the ways you get at it, protect it and share it, give you all sorts of opportunities for slowing things down. The fact that this is memory at the end of a network gives you several places to create queues and contention.

    The fact that the machine is incredibly busy may even mean that the writes are actually much faster than reported – but it takes the log writer a (relatively) long time to notice that the writes are complete.

    Comment by Jonathan Lewis — October 20, 2007 @ 1:48 pm BST Oct 20,2007 | Reply

  3. > … but it takes the log writer a (relatively) long time to notice that the writes are complete.

    In fact that’s what the piece by Kevin Closson might suggest for this case.

    To recap, ‘log file parallel write’ is timed this way (as any event is):
    a) start_time = get current time
    b) issue write call(s) to the redolog file(s) (here, the SSD)
    c) end_time = get current time
    d) wait time = end_time – start_time

    When the system call(s) in b) complete, the process is scheduled for running (i.e. it is put on the ready-to-run queue of the OS scheduler), but it might not get onto the CPU immediately if there are other processes running.

    How much this impacts the measured wait time depends (heavily) on the OS/HW type (and how it is configured), but it could certainly be in the order of milliseconds (so comparable to, or even more than, the actual write time on a “perfect” SSD).

    Simplistic calculation – say each process is almost always able to use its full time slice (quantum), since it performs only a few disk accesses or other system calls (as could be the case here, given the StatsPack fragment). For a quantum of 10 msec and a low CPU utilization U (almost no queueing), the probability that another process is running after b) is U, and the average wait on the ready-to-run queue is (without preemption) 10/2 = 5 msec, hence the contribution to the wait time is U * 5 msec. For U = 20%, that would be 1 msec.

    Of course in real life … it’s more complicated, but this simplistic exercise should capture the order of magnitude (milliseconds) in the “general” case (“general” as in “instance not optimized for the SSD special case”).
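
    Just to make the arithmetic concrete (a trivial check using the same assumed figures – quantum = 10 msec, U = 20%):

    -- expected scheduling delay added to the measured wait time:
    -- P(another process is on the CPU) * average residual wait of half a quantum
    select 0.20 * (10/2) as scheduling_delay_ms from dual;   -- = 1 msec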

    Comment by Alberto Dell'Era — October 20, 2007 @ 10:01 pm BST Oct 20,2007 | Reply

  4. Hi Jonathan,

    I felt like I should comment on this post as it didn’t sound like you were aware that a RamSan is a particular product, not a category label.

    There is a large amount of variability in the cache access time of different arrays and of different solid state disk vendors’ products. The latency of a memory system is largely determined by the cache controller architecture, the IO controller architecture, and the amount of logic that lies between the IO interface and the memory. And, of course, if array-based replication is enabled, then the timing can increase dramatically.

    Memory has nanosecond latency for an access, and no one comes close to that access time in the SSD space or the array cache case, as the other factors in the “storage system” design add most of the delay. There is no good way to predict a system’s memory latency without testing it. Solid State Disks tend to have much lower latencies than array caches, as their engineering decisions are made with latency keenly in mind. For array caches, other factors drive the engineering decisions and a lot of extra logic has to be traversed (cache logic, replication features, etc.) to get to the memory; as long as the cache is quite a bit faster than disk access, it will still make a difference.

    The RamSan is a particular product model from Texas Memory Systems (with whom I am employed), and it is not accurate to consider a LUN carved out of cache as equivalent. For an example of the performance of the RamSan, here is a section of an Oracle 10 statspack report for an insert-and-commit redo log stress test. For ten minutes in this interval, all that was done was insert and commit, insert and commit.


    Load Profile
    ~~~~~~~~~~~~                            Per Second       Per Transaction
                                       ---------------       ---------------
                      Redo size:          5,168,779.51                511.66
                  Logical reads:             36,451.85                  3.61
                  Block changes:             40,490.35                  4.01
                 Physical reads:                  1.02                  0.00
                Physical writes:                194.54                  0.02
                     User calls:                  0.10                  0.00
                         Parses:                  1.37                  0.00
                    Hard parses:                  0.00                  0.00
                          Sorts:                  0.59                  0.00
                         Logons:                  0.03                  0.00
                       Executes:             10,104.30                  1.00
                   Transactions:             10,102.02

                                                                     Avg
                                               %Time  Total Wait    wait     Waits
    Event                               Waits  -outs    Time (s)    (ms)      /txn
    -------------------------- -------------- ------ ----------- ------- ---------
    log file parallel write         4,012,597     .0         429       0       0.7

    Note that the Avg Wait is rounded down; the actual time is 429 s / 4,012,597 waits = 0.107 ms.

    -Jamon Bowen

    Comment by Jamon Bowen — October 24, 2007 @ 6:46 pm BST Oct 24,2007 | Reply

  5. Jamon,

    Thank you for your posting. To avoid any confusion, I’ve changed “RAM SAN” in the original posting to “RAM-only device”. I’ve also re-formatted your sample output to make it more readable.

    Your comments about controllers and logic are particularly pertinent. Many people forget that there are (many) layers of logic involved and that software takes time.

    Out of idle curiosity, I repeated your experiment on my laptop. (Single 7,200 rpm SATA drive, I think; running XP Pro). The results were a little surprising.

    With only a single drive, and since I didn’t want to create large log files, I just ran an “insert/commit” loop 44,000 times, and took snapshots of v$system_event and v$sysstat. The following are the most significant numbers:


    ---------------------------------
    System Waits:-  25-Oct 10:18:03
    Interval:-      19 seconds
    ---------------------------------
    Event                        Waits  Time_outs    Csec   Avg Csec
    -----                        -----  ---------    ----   --------
    log file parallel write     44,001          0   1,051       .024
    log file sync               44,000          0   1,384       .031

    ---------------------------------
    System stats:-  25-Oct 10:18:03
    Interval:-      19 seconds
    ---------------------------------
    Name                               Value
    ----                               -----
    user commits                      44,000

    CPU used by this session             564
    DB time                            1,950

    redo writes                       44,001

    commit immediate performed        44,000
    commit wait performed             44,000

    execute count                     44,007

    It looks to me like Windows isn’t doing quite what we expect – and that might be a problem (and maybe part of the reason why 11g introduced “lost write protection”). The calls to lgwr clearly happened – and the writes were reported. But I don’t think those write times can be real – not when they’re only 2.24 times as slow as your RamSan.
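
    For anyone who wants to repeat the exercise, the figures above are just before/after deltas of queries along these lines (a minimal sketch, not the exact script I used; v$system_event reports its times in centiseconds in this version):

    select event, total_waits, total_timeouts, time_waited
    from   v$system_event
    where  event in ('log file parallel write', 'log file sync');

    select name, value
    from   v$sysstat
    where  name in ('user commits', 'redo writes', 'execute count', 'CPU used by this session');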

    Comment by Jonathan Lewis — October 25, 2007 @ 12:11 pm BST Oct 25,2007 | Reply

  6. Probably your SATA drive has some on-board RAM cache, so that even if Windows honours the “sync” directive (i.e. it doesn’t buffer the write using OS memory), the write gets buffered by the drive cache, and so returns very quickly (a directly-attached SATA drive should have very very low “network” latency).

    Of course if the drive cache is not able to survive a power failure, I would expect a robust OS to know that, and configure the drive accordingly (disabling the cache) for “sync” writes. I wonder (as in “I don’t know”) whether Windows is able to do that, and/or if the drive publishes to the OS its caching capabilities, or publishes that an on-board cache exists at all.

    But that looks like a memory write, especially since a 7,200 rpm disk has a revolution period of about 8 msec, and so an average rotational latency of about 4 msec – which means that a 0.2389 msec average write can’t happen through the disk heads.

    Comment by Alberto Dell'Era — October 25, 2007 @ 7:51 pm BST Oct 25,2007 | Reply

  7. “But that looks like a memory write, especially since a 7,200 rpm disk has a revolution period of about 8 msec, and so an average rotational latency of about 4 msec – which means that a 0.2389 msec average write can’t happen through the disk heads.”

    Why not? Rotational delay happens only when you switch from the current track to another track. So those 4 msec do not happen on every write, but only when you need to switch the read/write head to another track. So, assuming the advertised sustained transfer rate (which includes the rotational and positional latency) of about 50 MB/s for a 7,200 rpm disk (Seagate Momentus 7200), you could write about 12 KB of data during those 0.24 msec.

    However, since we do not know what block sizes lgwr was actually writing to the disk through the Windows calls, or whether the disk driver coalesced multiple sequential IO requests into one, we do not know how realistic 0.24 msec is!
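
    To make that arithmetic explicit (a 50 MB/s sustained rate over a 0.24 msec window):

    -- kilobytes that can stream past the head in 0.24 msec at 50 MB/s
    select round(50 * 1024 * 0.24 / 1000, 1) as kb_in_window from dual;   -- ~12.3 KB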

    Comment by Val Carey — October 26, 2007 @ 2:19 pm BST Oct 26,2007 | Reply

  8. > Rotational delay happens only when you switch
    > from the current track to another track.

    That would be “seek” delay (latency), not rotational.
    Even if the head doesn’t need to move, the disk still needs to wait until the physical sector being written arrives under the head.

    Comment by Alberto Dell'Era — October 26, 2007 @ 3:35 pm BST Oct 26,2007 | Reply

  9. Your mental picture of the disk is too simple. The disk can have more than one platter, in which case one switches heads and the rotational delay occurs; when one moves to another cylinder, both the positional (seek) delay and the rotational delay happen. In the disk performance literature, “head switching” means changing heads within the same cylinder.

    Again, rotational delay does not necessarily take place with every write operation, as you seem to imply in your calculations; it all depends on the IO pattern. If the application and the OS (the driver and such) can feed the data quickly enough, as is often the case with sequential operations, no rotational or positional delay will happen as long as the head stays on the same track, obviously. The SRT (sustained transfer rate, also known as sequential transfer rate) that the manufacturers advertise takes both rotational and positional delays into account. The raw disk bit transfer rate is called the media or internal transfer rate.

    Comment by Val Carey — October 26, 2007 @ 4:26 pm BST Oct 26,2007 | Reply

  10. [...] piece on Jonathan Lewis’s Oracle Scratchpad discusses the performance of solid-state disc in Oracle. “Sometimes you think that SSD (solid state disk) is bound to speed up Oracle performance. [...]

    Pingback by Pythian Group Blog » Blog Archive » Log Buffer #68: a Carnival of the Vanities for DBAs — October 26, 2007 @ 6:19 pm BST Oct 26,2007 | Reply

  11. Val, we are discussing LGWR writes here in the context of Jonathan’s “insert/commit” loop:
    a) server process commits, posts LGWR
    b) LGWR issues a write request for k bytes at position N
    c) write request acknowledged, LGWR posts the server process
    d) server process commits again, posts LGWR
    e) LGWR issues a write request for k2 bytes at position N+k
    f) back to (c) …

    The disk rotates while (c) and (d) are being performed; (c) + (d) cannot take zero time (say they take m milliseconds – probably much less than 8 milliseconds in Jonathan’s scenario, but at best still a fraction of a millisecond), and since bytes N+k-1 and N+k are adjacent on disk with probability close to 100%, the disk needs to wait (8-m) milliseconds before it can start honouring (e), so the rotational delay is going to be (8-m) milliseconds.

    In the marginal case when N+k-1 and N+k are not adjacent, *add* to (8-m) the head seek time (aka “the time it takes the ‘actuator arm’ to move to the other cylinder”) – but only if they don’t happen to be in the same cylinder, since selecting the head that serves another track in the same cylinder is just a matter of switching a couple of transistors. Actually, if the head seek time is less than (8-m) milliseconds, you get it for free, and that should be the case if the OS was clever enough to place the redo log file on contiguous cylinders (so that the head only needs to move by a fraction of a millimetre).

    Comment by Alberto Dell'Era — October 26, 2007 @ 6:57 pm BST Oct 26,2007 | Reply

  12. Alberto, I agree with your general analysis.

    I ran a similar test on my laptop, and my times are 10 times higher than Jonathan’s, but still too low to include the full rotational delay:


    -- loop of 44,000 single-row inserts, each followed by a commit,
    -- so that (virtually) every commit forces a log writer write
    declare
      i integer default 44000;
    begin
      loop
        insert into t1 values(1);
        commit;
        i := i - 1;
        exit when i = 0;
      end loop;
    end;
    /

    -- average_wait is reported in centiseconds (hundredths of a second)
    select event, average_wait
    from   v$system_event
    where  event = 'log file parallel write';

    log file parallel write 0.24

    I agree that the disk must have some write cache on board, just as you mentioned earlier, in which case the time between (c) and (d) is so small that a new request can come in while the old is still being executed from the write cache so that the head stays in the correct position for the next write.

    Comment by Val Carey — October 26, 2007 @ 9:15 pm BST Oct 26,2007 | Reply

  13. Val,

    The lgwr write size is always a multiple of the sector size. In my test case it was one sector (512 bytes).

    Radial delay times – one thought about that is that the typical 4 ms quoted as a standard guideline is the average delay, and is half the traversal time from the inner to the outer cylinder. But I have also seen disk specs quoting a “single step” seek (i.e. to the next cylinder) of 0.4 ms – presumably there is a movement time and a (relatively large) settle time.

    Rotational delay – it’s about 30 years since I wrote a disk driver (for a 5.25″ 360KB floppy drive on an Apple II !) but I remember the trick of using sector interleaves; meaning that sector 2 is a few sectors round from sector 1, sector 3 is a few away from sector 2 and so on. This might help to explain the different write performance of the log writer on different devices. If my sector interleave was large enough, the time between commits might be so short that the next ‘logical’ sector lined up “just in time” for the write, even though a couple of physical sectors had been passed.

    One other detail to consider: in my initial test, I did not duplex my log files. When I created two log files per log group, the log file parallel write time jumped from 0.24 ms to 0.94 ms: still pdq, and (I would say) an indication that caching, rather than lucky synchronisation, is the most likely explanation. I wonder if the disc has a micro-battery (capacitor) device that allows it to survive long enough to empty the cache in the face of a power outage.
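
    (For reference, the duplexing was just the standard multiplexing command, run once per log group – the file name below is only illustrative:)

    -- hypothetical file name; repeat for each log group
    alter database add logfile member 'C:\oracle\oradata\D10G\redo01b.log' to group 1;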

    Comment by Jonathan Lewis — October 26, 2007 @ 9:44 pm BST Oct 26,2007 | Reply

  14. Jonathan,

    Here’s what happens when you disable the on-board cache:

    user commits 132079 (ran 3 times)

    log file parallel write 9.6
    log file sync 5.97

    To disable the cache: Control Panel->System->hardware->Device Manager-> Disk Drives->[your disk]->Properties->Policies->uncheck “Enable write caching on the disk”.

    They do say that enabling the cache can cause data loss if power outage occurs. Alas, apparently no battery backup ;) Interestingly, at least on my laptop, the cache was enabled by default.

    Comment by Val Carey — October 26, 2007 @ 11:11 pm BST Oct 26,2007 | Reply

  15. Val,

    Thanks for that. After disabling the cache, my laptop now averages 11.8 ms per log file parallel write (thanks, in part, to some interference from control file reads and writes).

    Comment by Jonathan Lewis — October 27, 2007 @ 9:34 am BST Oct 27,2007 | Reply

  16. Val, very interesting!

    I’ve found this interesting article (for SQL Server, but SQL Server’s usage of the disks should be very similar to Oracle’s – e.g. WAL (write-ahead logging) is similar to Oracle’s “write datafile blocks only after the redo protecting them has been synced to disk”):

    http://support.microsoft.com/kb/230785

    Especially relevant for this thread:
    “Many disk drives (SCSI and IDE) contain onboard caches of 512 KB, 1 MB, or larger. However, the drive caches usually rely on a capacitor and not a battery-backed solution. These caching mechanisms cannot guarantee writes across a power cycle or similar failure point. They only guarantee the completion of the sector write operations”

    So, I think I’m going to disable Write Caching on all my Windows instances, even if they are just development ones – unless I’m 100% confident that the on-board disk caches are battery-backed.

    Further, it is strange that this potentially fatal issue with write caching is not mentioned in the 10gR2 installation guide (or at least I haven’t been able to find it). The 10gR2 RAC installation guide does mandate disabling write caching:

    http://download.oracle.com/docs/html/B13829_01/toc.htm#BACEACCI

    but only for “shared disks” – I wonder if I’m missing something, since on-board disk caches sit in front of the physical disk, so it can’t be an issue of “one instance might not see bytes written by the other”.

    Comment by Alberto Dell'Era — October 27, 2007 @ 10:49 am BST Oct 27,2007 | Reply

  17. Here’s an interesting side effect. With caching disabled it takes about four minutes for my laptop to hibernate. With caching enabled it takes about 30 seconds.

    Comment by Jonathan Lewis — October 27, 2007 @ 4:28 pm BST Oct 27,2007 | Reply

  18. “So, I think I’m going to disable Write Caching on all my Windows instances, even if they are just development ones – unless I’m 100% confident that the on-board disk caches are battery-backed.”

    That’s a brave decision; I hope your users will have mercy on you after they have experienced potentially severe performance problems ;)

    Comment by Val Carey — October 27, 2007 @ 5:30 pm BST Oct 27,2007 | Reply

  19. “With caching disabled it takes about four minutes for my laptop to hibernate. With caching enabled it takes about 30 seconds.”

    It matches the simple benchmarking I’ve just run. On my laptop, with WCE (write cache enabled) I get 20MB/s throughput for 64K OS-cache-bypass writes; without WCE I get only 3.6 MB/s.

    I wonder how Linux running on a laptop handles WCE.

    Comment by Val Carey — October 27, 2007 @ 6:03 pm BST Oct 27,2007 | Reply

  20. Val, the only users are the dev Oracle instances; they don’t commit frequently, and the buffer cache already makes for a nice safe write-back cache. Those electronic users will suffer a lot if they can’t open their beloved databases due to a “stuck recovery” error (caused by a data block being physically written from the on-board disk cache before the redo protecting it, with a power outage happening in between) or simply a corrupted block of, say, sys.tab$ (caused by the db_block_size being greater than the typical sector size of 512 bytes, which is the only unit for which the disk drive guarantees write atomicity).

    Jonathan, I was surprised initially, since a hibernate dump of the system RAM should provide a continuous stream of sequential bytes, and sector interleave should cater for the short time it should take Windows to initiate the next sequential write. But I have done a bit of research and discovered that XP compresses the writes, so probably the time it takes to compress the next write is enough to let the disk spin past the next interleaved sector.

    Comment by Alberto Dell'Era — October 27, 2007 @ 6:51 pm BST Oct 27,2007 | Reply

  21. Alberto, Good thought – I only got as far as the “surely it’s a stream”.

    Val, I was assuming that the dump would be a screamingly fast sequential dump – so I was expecting the cache to be irrelevant after the first couple of seconds even if it were on. But a small lag for compression would explain why my hibernate results were consistent with your log writer results.

    Given the time I spend getting on and off trains, planes and whatever the third one was, I think I’ll have the cache ON most of the time, and remember to switch it off for critical collision tests.

    Comment by Jonathan Lewis — October 27, 2007 @ 7:10 pm BST Oct 27,2007 | Reply

  22. To add some more statistics regarding RamSan disks – I have had some experience with them.

    On that project we could achieve 0.5 ms writes for LGWR writes of about 13K (if it’s of interest, an 8K block read was about the same, 0.4-0.5 ms). Response time wasn’t impacted by the number of IOs, but had some correlation with IO size. It was an HP-UX PA-RISC platform and we used Oracle 9i with the Veritas storage stack (and it was a couple of years ago as well). We used host-based mirroring across two RamSan boxes, and without it we saw the same response time.

    To be fair, we allocated redo logs from a 4-way striped EMC volume and had about a 0.7 ms write time, but (and here comes a big BUT) we observed some instability, and as far as I can recall it was due to cache saturation in some circumstances.

    In more recent experience, during one RAC load-testing project, I used a RAM disk mounted by the RAC nodes via NFS, and writes and reads of a few kilobytes were stable in the range of 0.3 ms. I was quite pleased with such NFS performance. To be fair, it wasn’t regular 1 Gb Ethernet, but a proprietary high-speed “fabric”, as the vendor calls it.

    I couldn’t get 0.1 ms for LGWR writes, but I think that’s because my writes were relatively large. In Jamon’s somewhat artificial case, the LGWR write size was probably in the range of 512 bytes to 1K. There is no statspack report time-frame, but I assume that each commit caused about half an LGWR write to one LGWR write, since its performance was dazzlingly fast. If Jamon is reading this comment, it would be nice of him to provide the LGWR write size from that case.

    Comment by Alex Gorbachev — October 28, 2007 @ 4:33 am BST Oct 28,2007 | Reply

  23. Hi Jonathan and all,
    the disk caching side of this conversation is something that has bothered me for some time (I run Oracle on Linux).

    I previously thought that the LGWR process would *always* guarantee my redo log changes were written directly to disk once I had committed. In other words, once LGWR had made its I/O request and returned to my session, my changes were safe.

    However, after some investigation, it appears that LGWR writes to disk (under Linux) using an I/O feature called DIRECT IO. After checking with Oracle and Red Hat, I found that DIRECT IO only bypasses the Red Hat O/S I/O caching mechanism. It does not instruct the hardware to bypass any subsequent I/O controller caching or physical disk caching.

    Therefore to be really sure that your changes are actually written to disk, you need to investigate the caching aspect of each hardware component between your server and physical storage!
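
    As a starting point, you can at least check how the instance itself is configured at the filesystem level (a quick sketch – note that this only covers the Oracle/filesystem layer, not the controller or drive caches):

    -- 'directio' or 'setall' means Oracle asks the filesystem for cache-bypassing I/O;
    -- it does nothing about caches in the controller or on the drive itself
    select name, value
    from   v$parameter
    where  name = 'filesystemio_options';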

    Stewart

    Comment by Stewart Forgie — October 29, 2007 @ 4:17 pm BST Oct 29,2007 | Reply

  24. Hi Lewis and Alex,

    I am glad to see that this post has blossomed quite a bit since my post. One quick side note – the disk in your Laptop is battery backed (by the laptop battery :) ).

    I am glad that you found the setting to disable the write cache on the disk as I think it helps explain more clearly that when you are testing memory access on any device there is no easy way to predict what the performance will be without testing.

    A quick note on Alex’s comment on RamSan performance: our write time is the sum of the RamSan access time + HBA driver/ASIC access latency + OS/application access latency. For an HP-UX platform from a few years back, I would guess that a 2 Gbps HBA was used in the server. I end up doing a lot of FC performance testing, and there have been dramatic advances in the access latency of 4 Gbps HBAs vs 2 Gbps HBAs. The 2 Gbps HBAs tended to add 0.2-0.3 ms to an access, while the 4 Gbps HBAs only add 0.03 to 0.04 ms to an access (assuming interrupt coalescing is disabled). Our access time was 0.02 ms with our 2 Gbps controllers and 0.015 ms with our 4 Gbps controllers, plus transfer time (this varies with the block access size and the link bandwidth; for 13K and a 2 Gbps link it would be 13K / ~200,000 KB/s = 0.065 ms). Adding the OS and Oracle overhead, I would expect the 0.5 ms time. There have been a lot of enhancements since then (largely concerning equipment that we do not control) that have allowed much lower write times.

    For the insert and commit loop the LGWR was definitely writing less than 1k per write.

    Here is some more data from the statspack report; I would be happy to send the whole report if you want it (our email format is firstname.lastinitial@ramsan.com). We actually had Linux multipathing running and we were using Linux mirroring on the RamSan volume; I may see what it looks like if we pull these components off and just duplex the logs.

    Also, if you are going to Oracle OpenWorld the week after next, I would be happy to chat there.

    Thanks,
    Jamon Bowen


    WORKLOAD REPOSITORY report for

    DB Name         DB Id    Instance     Inst Num Release     RAC Host
    ------------ ----------- ------------ -------- ----------- --- ------------
    TMS3          2278435435 tms3                1 10.2.0.1.0  NO  amd-b63.texm

                  Snap Id      Snap Time      Sessions Curs/Sess
                --------- ------------------- -------- ---------
    Begin Snap:      2348 23-Oct-07 13:10:50        18       3.5
      End Snap:      2349 23-Oct-07 13:20:50        18       3.5
       Elapsed:               10.00 (mins)
       DB Time:                9.77 (mins)

    Cache Sizes
    ~~~~~~~~~~~                       Begin        End
                                 ---------- ----------
                   Buffer Cache:       392M       392M  Std Block Size:         8K
               Shared Pool Size:       104M       104M      Log Buffer:     6,216K

    Load Profile
    ~~~~~~~~~~~~                            Per Second       Per Transaction
                                       ---------------       ---------------
                      Redo size:          5,168,779.51                511.66
                  Logical reads:             36,451.85                  3.61
                  Block changes:             40,490.35                  4.01
                 Physical reads:                  1.02                  0.00
                Physical writes:                194.54                  0.02
                     User calls:                  0.10                  0.00
                         Parses:                  1.37                  0.00
                    Hard parses:                  0.00                  0.00
                          Sorts:                  0.59                  0.00
                         Logons:                  0.03                  0.00
                       Executes:             10,104.30                  1.00
                   Transactions:             10,102.02

      % Blocks changed per Read:  111.08    Recursive Call %:   100.00
    Rollback per transaction %:    0.00       Rows per Sort:    23.08

    Instance Efficiency Percentages (Target 100%)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                Buffer Nowait %:  100.00       Redo NoWait %:  100.00
                Buffer  Hit   %:  100.00    In-memory Sort %:  100.00
                Library Hit   %:  100.00        Soft Parse %:  100.00
             Execute to Parse %:   99.99         Latch Hit %:   99.97
    Parse CPU to Parse Elapsd %:             % Non-Parse CPU:  100.00

    Shared Pool Statistics        Begin    End
                                  ------  ------
                 Memory Usage %:   94.67   94.67
        % SQL with executions>1:   71.31   72.73
      % Memory for SQL w/exec>1:   61.76   61.91

    Top 5 Timed Events                                     Avg %Total
    ~~~~~~~~~~~~~~~~~~                                    wait   Call
    Event                               Waits  Time (s)   (ms)   Time Wait Class
    ------------------------------ ---------- --------- ------ ------ ----------
    CPU time                                        520          88.7
    log file parallel write         4,012,597       429      0   73.1 System I/O
    log file switch completion            191        41    212    6.9 Configurat
    log file switch (checkpoint in         82        31    383    5.4 Configurat
    LGWR wait for redo copy           291,581         1      0    0.2      Other
              ---------------------------------------------------------

    Statistic                                   Total     per Second   per Trans
    -------------------------------- ---------------- -------------- -----------
    physical write bytes                  956,456,960    1,593,699.2       157.8
    physical write total IO requests        4,021,945        6,701.6         0.7
    physical write total bytes          4,987,928,576    8,311,150.4       822.7
    physical write total multi block        3,423,986        5,705.2         0.6
    physical writes                           116,755          194.5         0.0
    physical writes direct                        415            0.7         0.0
    physical writes direct (lob)                    0            0.0         0.0
    physical writes direct temporary                0            0.0         0.0
    physical writes from cache                116,340          193.9         0.0
    physical writes non checkpoint            114,196          190.3         0.0
    pinned buffers inspected                        0            0.0         0.0
    prefetch warmup blocks aged out                 0            0.0         0.0
    prefetched blocks aged out befor                0            0.0         0.0
    process last non-idle time                      0            0.0         0.0
    recursive calls                        18,194,819       30,317.2         3.0
    recursive cpu usage                        43,813           73.0         0.0
    redo blocks written                     7,726,344       12,874.0         1.3
    redo buffer allocation retries                146            0.2         0.0
    redo entries                            8,606,700       14,340.9         1.4
    redo log space requests                       323            0.5         0.0
    redo log space wait time                    7,366           12.3         0.0
    redo ordering marks                        38,362           63.9         0.0
    redo size                           3,102,037,856    5,168,779.5       511.7
    redo synch time                                 0            0.0         0.0
    redo synch writes                             166            0.3         0.0
    redo wastage                          727,552,932    1,212,287.2       120.0
    redo write time                            46,272           77.1         0.0
    redo writer latching time                     370            0.6         0.0
    redo writes                             4,012,526        6,685.9         0.7
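
    As a rough cross-check of the write size, the statistics above give it directly (assuming the usual 512-byte redo block):

    -- redo blocks written * redo block size / redo writes
    select round(7726344 * 512 / 4012526) as avg_redo_write_bytes from dual;   -- ~986 bytes, i.e. just under 1K per write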

    Comment by Jamon Bowen — October 31, 2007 @ 8:05 pm BST Oct 31,2007 | Reply

  25. Thanks Jamon.
    Your note about 2 Gbps vs 4 Gbps HBAs is interesting.
    I won’t make it to OOW this year as I have another conference almost at the same time so it’s too much for me. I’ll drop you a note.

    Comment by Alex Gorbachev — November 8, 2007 @ 2:38 am BST Nov 8,2007 | Reply

  26. Hi all,

    The subject was discussed a long time ago, but maybe someone will read my post and provide a solution.
    The problem I’m facing is with an Oracle database (10.2.0.4) installed entirely on an SSD, on CentOS 6.
    From time to time, every month or so, the SSD is put into offline mode, no I/O can be done and Oracle hangs. Usually this is resolved with a reboot, but in one case we also had some data corruption.

    Here is a small fragment of the OS logs:

    Feb 16 05:11:21 oraclexx kernel: ata6.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x6 frozen
    Feb 16 05:11:21 oraclexx kernel: ata6.00: failed command: WRITE FPDMA QUEUED
    Feb 16 05:11:21 oraclexx kernel: ata6.00: cmd 61/08:00:38:15:43/00:00:03:00:00/40 tag 0 ncq 4096 out
    Feb 16 05:11:21 oraclexx kernel: res 40/00:00:00:00:00/00:01:00:00:00/a0 Emask 0x4 (timeout)
    Feb 16 05:11:21 oraclexx kernel: ata6.00: status: { DRDY }

    Feb 16 05:11:32 oraclexx kernel: ata6: hard resetting link
    Feb 16 05:11:32 oraclexx kernel: ata6: SATA link down (SStatus 0 SControl 310)
    Feb 16 05:11:32 oraclexx kernel: ata6.00: disabled
    Feb 16 05:11:32 oraclexx kernel: ata6.00: device reported invalid CHS sector 0

    Feb 16 05:11:32 oraclexx kernel: end_request: I/O error, dev sdb, sector 54727992
    Feb 16 05:11:32 oraclexx kernel: Buffer I/O error on device sdb1, logical block 6840743
    Feb 16 05:11:32 oraclexx kernel: lost page write due to I/O error on sdb1
    Feb 16 05:11:32 oraclexx kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE

    Feb 16 05:11:32 oraclexx kernel: end_request: I/O error, dev sdb, sector 34656896
    Feb 16 05:11:32 oraclexx kernel: sd 6:0:0:0: rejecting I/O to offline device
    Feb 16 05:11:32 oraclexx kernel: end_request: I/O error, dev sdb, sector 0
    Feb 16 05:11:32 oraclexx kernel: Buffer I/O error on device sdb1, logical block 4331856
    Feb 16 05:11:32 oraclexx kernel: lost page write due to I/O error on sdb1

    Feb 16 05:11:32 oraclexx kernel: end_request: I/O error, dev sdb, sector 55003384
    Feb 16 05:11:32 oraclexx kernel: ata6: EH complete
    Feb 16 05:11:32 oraclexx kernel: ata6.00: detaching (SCSI 6:0:0:0)
    Feb 16 05:11:32 oraclexx kernel: Aborting journal on device sdb1-8.
    Feb 16 05:11:32 oraclexx kernel: EXT4-fs error (device sdb1): ext4_journal_start_sb:
    Feb 16 05:11:32 oraclexx kernel: JBD2: I/O error detected when updating journal superblock for sdb1-8.
    Feb 16 05:11:32 oraclexx kernel: EXT4-fs error (device sdb1) in ext4_reserve_inode_write: Journal has aborted

    Best regards and hope someone answers soon :)

    Comment by Dan — February 21, 2012 @ 1:25 pm BST Feb 21,2012 | Reply

