Oracle Scratchpad

December 23, 2015


Filed under: Oracle,Philosophy — Jonathan Lewis @ 12:56 pm GMT Dec 23,2015

This post is a 100% copy of a message that Tanel Poder sent to the Oracle-L mailing list in response to a thread about the performance of SSD. It’s not just a good answer to the question, it’s a wonderfully succinct insight into how to think about what you’re really testing and it displays the mind-set that should be adopted by everyone.

If you measure write performance on an idle Exadata machine without any other load going on, you are not comparing flash vs disk, you are comparing flash vs the battery-backed 512MB RAM cache in the “RAID” controllers within each storage cell!

This is how the “disk” that’s supposed to have a couple of milliseconds of average latency (it still rotates and needs to seek + calibrate to next track even in sequential writes) gives you sub-millisecond write latencies… it’s not the disk write, it’s the controller’s RAM write that gets acknowledged.

And now when you run a real workload on the machine (lots of random IOs on the disk and Smart Scans hammering them too), your disk writes won’t always be acknowledged by the controller RAM cache. Comparing *busy* flash disks to *busy* spinning disks will give you different results from comparing *idle* flash disks to *idle* spinning disks (with non-dirty write cache).
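To make the cache effect concrete, here is a minimal Python sketch that treats average acknowledged-write latency as a weighted mix of controller-cache hits and real disk writes. The 0.2 ms cache figure is an assumption picked only for illustration, and the 8 ms disk figure is simply the middle of the 6-10 ms range this post quotes for a busy disk. The function name and the hit fractions are hypothetical, not measurements from any storage cell.

```python
def expected_write_latency_ms(cache_hit_fraction, cache_ms, disk_ms):
    """Average write latency when a fraction of writes is acknowledged from cache.

    A deliberately simple model: no queueing, and every write is either a
    cache acknowledgement (cache_ms) or a real disk write (disk_ms).
    """
    if not 0.0 <= cache_hit_fraction <= 1.0:
        raise ValueError("cache_hit_fraction must be between 0 and 1")
    return cache_hit_fraction * cache_ms + (1.0 - cache_hit_fraction) * disk_ms


# Illustrative numbers only (assumptions, not measurements).
CACHE_MS = 0.2  # assumed latency of a write acknowledged by controller RAM
DISK_MS = 8.0   # middle of the 6-10 ms range quoted for a busy disk

for hit in (1.0, 0.9, 0.5, 0.0):
    avg = expected_write_latency_ms(hit, CACHE_MS, DISK_MS)
    print(f"cache absorbs {hit:.0%} of writes: about {avg:.2f} ms per write")
```

On an idle machine the hit fraction is close to 1, so the "disk" looks sub-millisecond; as the cache fills under load the average drifts towards the raw disk figure.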

So, I’m not arguing here that flash is somehow faster for sequential writes than a bunch of disks when talking about throughput. But if you care about latency (of your commits) you need to be aware of everything else that will be going on on these disks (and account for this in your benchmarks).

Without queueing time included, a busy flash device will “seek” where needed and perform the write in under a millisecond, a busy disk device in 6-10 milliseconds. So your commits will end up having to wait for longer (yes, your throughput will be ok due to the LGWR writing multiple transactions’ redo out in a single write, but this doesn’t change the fact that individual commit latency suffers).
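The throughput-versus-latency point can be shown with a little arithmetic. The sketch below is a toy model, not a description of how LGWR is actually implemented: it assumes a single log writer with one write in flight at a time, a fixed service time per write, and no queueing. The batch sizes (10 and 80 commits per write) are hypothetical values chosen so that both devices reach the same throughput.

```python
def group_commit_model(write_ms, commits_per_write):
    """Return (commits_per_second, min_commit_latency_ms) for a serial log writer.

    Assumptions: one write in flight at a time, each write takes write_ms,
    and each write carries the redo of commits_per_write transactions.
    """
    writes_per_second = 1000.0 / write_ms
    commits_per_second = writes_per_second * commits_per_write
    # A commit cannot complete before the write carrying its redo completes.
    min_commit_latency_ms = write_ms
    return commits_per_second, min_commit_latency_ms


# 1 ms is the upper bound quoted for busy flash; 8 ms sits inside the
# 6-10 ms range quoted for a busy disk. Batch sizes are hypothetical.
for label, write_ms, batch in (("flash", 1.0, 10), ("disk", 8.0, 80)):
    throughput, latency = group_commit_model(write_ms, batch)
    print(f"{label}: {throughput:.0f} commits/s, each commit waits at least {latency:.0f} ms")
```

Under these assumptions both devices sustain the same number of commits per second, because the slower device simply batches more redo per write, yet every individual commit on the disk waits roughly eight times longer.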

This latency issue of course will be mitigated when you are using a decent storage array with enough (well-managed) write cache.

So I’d say there are the following things you can compare (and you need to be aware of which hardware you are really benchmarking):

1) Flash storage
2) Disk storage without (write) cache
3) Disk storage with crappy (write) cache
4) Disk storage with lots of well-managed & isolated (write) cache

And the second thing to be aware of:

1) Are you the single user on an idle storage array?
2) Are you just one of the many users in a heavily utilized (and randomly seeking) storage array?
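Taken together, the two lists describe a small test matrix: every storage configuration crossed with every load condition. A short Python sketch of that enumeration follows; the labels are paraphrased from the lists above, and the variable and function names are hypothetical.

```python
from itertools import product

# Storage configurations, paraphrased from the first list.
STORAGE = [
    "flash",
    "disk, no write cache",
    "disk, poor write cache",
    "disk, large well-managed isolated write cache",
]

# Load conditions, paraphrased from the second list.
LOAD = [
    "single user on an idle array",
    "one of many users on a busy, randomly seeking array",
]


def benchmark_matrix():
    """Return every (storage, load) combination worth naming in a test report."""
    return list(product(STORAGE, LOAD))


for storage, load in benchmark_matrix():
    print(f"{storage:<48} | {load}")
```

Writing the eight cells down before testing makes it harder to quote a number from one cell (say, idle disk with a clean cache) as if it described another.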

So, as usual, run a realistic workload and test it out yourself (if you have the hardware :)
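As a starting point for testing it yourself, here is a rough Python sketch that times small synchronous writes (write followed by `os.fsync`) on a scratch file. It is not a substitute for a realistic workload and it is not how Oracle writes redo: it goes through the filesystem, and the write count, block size and function names are all assumptions made for illustration. Its value is in running it twice, once on an idle system and once while something else is hammering the same storage, and comparing the tail of the distribution rather than the average.

```python
import os
import statistics
import tempfile
import time


def measure_sync_write_latency_ms(directory, writes=200, block_size=8192):
    """Time `writes` append-plus-fsync cycles on a scratch file in `directory`."""
    payload = b"\0" * block_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push the data to the storage device
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies):
    """Return median, approximate 99th percentile and maximum, in milliseconds."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[p99_index],
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # Point this at a directory on the storage you actually care about.
    print(summarize(measure_sync_write_latency_ms(tempfile.gettempdir())))
```

Whether an `fsync` is acknowledged from a controller cache or from the media itself depends on the hardware and its configuration, which is exactly the distinction the post asks you to keep in mind when reading the numbers.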

1 Comment »

  1. Nicely done. Two more bits: 0) Real persisted RAM storage in the 1-4 list, and under “And the second thing”: 3) Do you have a real load on the rest of the disk storage that gets “de-heated” (seeks interrupting a series of sequential read streams in particular) by moving the redo to something else? 4) Do you have archiving on? (read after write from segregated device versus more load on overall disk farm) [But that might have made the post too long.]

    Comment by rsiz — December 23, 2015 @ 5:23 pm GMT Dec 23,2015
