Oracle Scratchpad

August 3, 2017

Block Sizes

Filed under: Block Size, Infrastructure, Oracle — Jonathan Lewis @ 7:31 pm BST Aug 3, 2017

Here’s an old draft based on a very old tpc-c test that I’ve only just rediscovered. There are a couple of brief comments at the end that I’ve added just before publishing, otherwise this is just as I wrote it years ago.

Several years ago I copied some results from a couple of TPC-C reports of what were then, if I recall correctly, the top performance results for the current Oracle version. I made the copy in May 2010 but never got around to writing up a note commenting on them; the comment I have to make is pretty timeless, though, so here it is, only seven years late [update: now 10 years – presumably I wrote these notes in 2017].

One of the reports was for a Sun SPARC Enterprise T5440 Server Cluster running RAC on version 11.1.0.7 with 12 instances, the other for an HP Integrity Superdome-Itanium2/1.6GHz/24MB iL3 running a single instance of 10.2.0.3. The results aren’t intended as any sort of reflection on current Sun and HP equipment, or on current performance; I’ve published them together to highlight one simple idea. How do you answer someone who asks questions like: how big should my cache size be, should I use a keep or recycle cache, should I use a different block size for my indexes?

Take a look at these two sets of figures, from (we assume) the best experts on their particular platforms, as they try to address exactly the same performance problem.

                           Sun SPARC                  HP Superdome
 db_2k_cache_size          -                          60G
 db_[4k_]cache_size        2G                         1G      (default cache)
 db_[8k_]cache_size        65G    (default cache)     10G
 db_16k_cache_size         95G                        476G
 db_keep_cache_size        210G                       1220G
 db_recycle_cache_size     40G                        16G

(The brackets in db_[4k_]cache_size and db_[8k_]cache_size are a reminder that the cache for the default block size is actually set through the plain db_cache_size parameter – the default block size was 4KB on the HP and 8KB on the Sun.)

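For anyone who hasn’t set up multiple block sizes before, here’s a minimal sketch of the kind of declarations that sit behind parameters like those above – the sizes, file names and tablespace names are purely illustrative, not values taken from either report. The key mechanical points are that a non-default block size needs its own buffer cache before you can create a tablespace of that size, and that the keep and recycle pools belong to the default block size only.

rem  Illustrative sketch only - sizes and names are invented.
rem  Assumes the instance default block size (db_block_size) is 8KB.

alter system set db_cache_size         = 10g scope=both;  -- cache for the default (8KB) block size
alter system set db_keep_cache_size    = 20g scope=both;  -- keep pool (default block size only)
alter system set db_recycle_cache_size =  4g scope=both;  -- recycle pool (default block size only)
alter system set db_2k_cache_size      =  1g scope=both;  -- needed before any 2KB tablespace
alter system set db_16k_cache_size     =  8g scope=both;  -- needed before any 16KB tablespace

create tablespace ts_16k
        datafile 'ts_16k_01.dbf' size 10g
        blocksize 16k
;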
Clearly the HP has about 1783G allocated to its buffer caches in total, compared to “only” 415G on the Sun – a factor of just over 4 – so it’s not surprising that the sizes of the individual caches are rather different. What is significant, though, is that the differences are not consistent from one cache to the next.

Why isn’t the Sun using the 2K cache at all? Surely if it’s a good idea to use very small blocks for something on one platform it’s a good idea on both. In fact it was a common claim in some quarters that small blocks were a really good idea for RAC – and it was the Sun that was running RAC. (Of course that doesn’t really matter if you can rig your system to avoid cross-instance traffic.)

Why is the default cache on the Sun about 6 times the size of the 8K cache on the HP when, scaling by the totals, you might expect it to be roughly one quarter of the size? Alternatively, why is the default cache on the Sun 65 times the size of the default cache on the HP (remember that some tablespaces have to match the default block size)?

Similarly, the recycle caches show the same reversal in their relative sizes (40G on the Sun against 16G on the HP). Why?

Why, since the 16K block size has been seen as “quite nice”, has neither platform gone to the even nicer 32K? One of the loudly touted myths of the time was that bigger blocks were good for indexes – yet neither test uses the largest possible block size. (Actually, I have to note that I think the HP implementation didn’t allow for 32KB Oracle blocks, but I may be wrong about that.)

The purpose of this post isn’t to question exactly why the two configurations are so different for the same task – it’s to point out that picking a block size, or a set of block sizes, or deciding how to use the keep and recycle caches, isn’t a trivial task with a knee-jerk right answer that you can get by posting a question on OTN or reading a couple of brief articles on a website.

tl;dr

If you have to ask for help choosing the best block size, use the default for your platform; and don’t mess with the keep and recycle caches unless you’re prepared to pay an expert to spend a little time helping you, and then to spend quite a lot of your own time doing the tests that they should have suggested.
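If you do want to see what your platform is currently using before deciding to leave well alone, a quick query against v$parameter is enough. The following is just a sketch (run it from a suitably privileged account):

rem  Sketch: report the default block size and the current buffer cache settings.

select  name, display_value
from    v$parameter
where   name = 'db_block_size'
or      name like 'db%cache_size'
order by
        name
;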


Addendum July 2020

It’s a very long time since I read any of the tpc-c reports – they tend to run to dozens of pages describing exactly what hardware and software was used, what scripts were used to populate the tables, what parameters were set, and so on. One thing I do remember from reading a few of them is that they weren’t realistic attempts to implement a working system – they were attempts to take a “realistic” data set and get it into the database in a way that would allow the tpc-c test to run as quickly as possible.

For example –

- if you use a 1TB disc to store 100GB of data you can position the data on the fastest part of the disc

- if you create a critical table as a single table hash cluster you can access all the rows for a single key value very quickly, provided you pick a block size that will hold exactly all the rows for one key. And when you notice that the table can hold all the data for a key in a 2KB block you minimise the pressure on the buffer cache by putting that table into its own 2KB buffer cache (see the sketch after this list). The fact that you have to predefine the size of the table/cluster doesn’t matter if the tpc-c test doesn’t worry about what happens “tomorrow” as more data arrives. (Note: I recall that in one 10g or 11g tpc-c test Oracle used a “future” feature of range partitioned hash clusters – which finally appeared in 12c.)
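As a sketch of that kind of construct – every name, column list and size below is invented for illustration, nothing is copied from an actual tpc-c submission – the pattern is roughly: create a 2KB tablespace (which requires db_2k_cache_size to be set), then create a single table hash cluster in it, sized so that all the rows for one hash key are expected to fit in a single 2KB block.

rem  Illustrative sketch only - invented names, columns and sizes.
rem  Assumes db_2k_cache_size has already been set (see earlier sketch).

create tablespace ts_2k
        datafile 'ts_2k_01.dbf' size 4g
        blocksize 2k
;

create cluster order_lines_cl (
        ol_w_id         number,
        ol_d_id         number,
        ol_o_id         number
)
        size 2048                 -- space per hash key: one key's rows expected to fit one 2KB block
        single table
        hashkeys 1000000          -- expected number of distinct key values
        tablespace ts_2k
;

create table order_lines (
        ol_w_id         number,
        ol_d_id         number,
        ol_o_id         number,
        ol_number       number,
        ol_i_id         number,
        ol_quantity     number,
        ol_amount       number(10,2)
)
        cluster order_lines_cl (ol_w_id, ol_d_id, ol_o_id)
;

The cluster allocates space for hashkeys keys of the declared size up front, which is exactly the “predefine the size” constraint mentioned above – fine for a fixed benchmark data set, much less attractive for a system that has to keep growing.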


3 Comments

  1. Lies, damned lies, and benchmarks :-)

    Comment by mwidlake — July 6, 2020 @ 11:48 am BST

  2. Thanks for the article! It reminded me that some time ago I developed a statistical reporting system. The database was quite large, so I spent a lot of time carefully designing a set of materialized views that, together with appropriate logs and partitioning schemes, allowed fast refresh of the whole thing.
    Now that the customer has upgraded the system to RAC with 128 cores per node, fast SSDs and tons of RAM, all those fancy-schmancy pirouettes have become useless, since a full refresh with PARALLEL(128) finishes in under 10 minutes :)
    My point is, I suppose, that for better or worse there will be less and less interest in careful tuning of every core parameter, since hardware gets cheaper and more available, business is more focused on new development and often willing to migrate to the cloud, and people with sufficient expertise remain very expensive.

    Comment by Viacheslav Andzhich — July 6, 2020 @ 12:03 pm BST

    • Viacheslav,

      I think there’s a lot that can be said to support the claim that DSS / DW systems are now at a stage where it’s almost possible to throw hardware at any “vaguely reasonable” design and get an adequate response time, and there’s a lot less need to worry about tweaking every feature to the limit. Of course there are still tools which try to operate in a totally generic – hence hugely inefficient – way that can use the available resources very badly and introduce completely unnecessary overheads, but even their shortcomings can often be worked around fairly easily.

      There are still features of large-scale (many users, high rate of change) OLTP systems, though, where extreme care has to be taken – more to avoid issues of concurrency and contention than to handle pure performance issues in most cases; and even in mixed systems, where high-volume reports run against rapidly changing data, a simple brute-force approach can still fail because of read-consistency issues that have been overlooked.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — July 6, 2020 @ 5:16 pm BST

