There is a recent thread [ed Jan 2009: the following link is temporarily broken] on the OTN forum about block sizing and the impact it may have on performance. The thread contains some interesting insights into the problems of testing and trouble-shooting. Unfortunately the thread has accumulated so much extraneous material that it has become hard to separate the wheat from the chaff.
- [Ed Jan 2009]
According to this comment from Justin Kestelyn, extremely long messages were causing performance problems in the current version of the forum software and have been temporarily archived to a separate location – and this thread contained some very long messages. Since it may be some time before it reappears on the forum I’ve posted a copy made just before it disappeared so that (until the original reappears) you can read the interesting bits here. Warning: the file is a PDF of about 3MB – it will open in a separate window so that you can read the file and my comments about it at the same time. When I view the file it runs to 233 pages, so I’ve added page references after the links in the blog (the links, of course, do not currently work).
So I thought I’d make it easy to follow a couple of useful discussions by supplying a sequence of URLs that you can visit in turn.
Item 1: It’s hard to build a good test case.
This item starts with user sp009 being the first to construct and supply a demonstration (page 7: sp009) to support (or refute) some of the comments that had been flying around.
Unfortunately there was an important flaw (page 9: Lewis) in his example – he wasn’t running his queries, he was simply looking at execution plans, and there was a configuration error that would have spoiled the tests anyway.
So sp009 created the test again giving us two postings – one to report some details about configuration, data sizes and execution plans (page 13: sp009), and the other containing the tkprof summaries (page 17: sp009) for the test queries.
Unfortunately the tests still had a configuration mismatch (page 21: Lewis) which obscured the source of the variation in timing between the tests, and the “wait” information had not been captured.
Following through sp009 recreated the tests (page 34: sp009) and, since he was using 10g, allowed Oracle to pick a multiblock read count rather than specifying one.
Many of the figures from this test showed that the “infrastructure” of the test was now appropriate; but a number of anomalies (page 40: Lewis) showed up. For example, the tests of different block sizes showed a time difference of 3.22 seconds over 24.39 seconds, but the unaccounted time in the tests was three or four times the size of this difference.
At this point sp009 dropped out of the test cycle but Greg Rahn had also been supplying us with some test results and produced a test case (page 58: Rahn) modelled on the example created by sp009 but using a more realistic data set. These results showed much better internal consistency than the final set from sp009. Even so, the time difference between Greg’s results (0.06 seconds over 24 seconds) was less than the accounting error (0.1 seconds) for a given block size.
Almost inevitably there was room to argue that Greg Rahn’s test was not appropriate for the general case (page 59: Lewis), largely because he was running 11g and Oracle chose to use the new ‘serial direct reads’ feature, which means it bypassed the (cache buffers chains) latch activity that can be a significant fraction of the CPU expense of logical I/O.
Building good test cases is very difficult. You constantly have to ask yourself if you’re constructing a valid model of the problem you are investigating; you always have to think about what other reasons there might be for the results you are seeing; and you have to make sure you can explain any transient anomalies in the results. Peer review helps enormously – the omissions, or assumptions, that you’ve made may stand out clearly to someone who doesn’t start with your preconceptions when they examine your test case.
Item 2: Tests highlighting bugs (no links)
It has been pointed out a few times in the Oracle Forum and elsewhere on the Internet that if you set the db_file_multiblock_read_count to zero in 10g, Oracle will actually use the value one – which is not usually a good value for “multi-block” reads.
BUT – as a consequence of various tests being run and the results being published on the thread by people using different versions of 10gR2, it transpired that this behaviour was evidence of a bug (number 5768025) fixed in 10.2.0.4.
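The behaviour is easy to check for yourself. The following is just a sketch, assuming a 10gR2 sandbox you can experiment in – the parameter and view names are the standard Oracle ones:

```sql
-- Sketch (assumes a disposable 10gR2 session): see what Oracle
-- actually records after an explicit zero setting.
alter session set db_file_multiblock_read_count = 0;

-- Before the fix for bug 5768025 (in 10.2.0.4) the session value
-- silently became 1 rather than a system-derived default:
select value
from   v$parameter
where  name = 'db_file_multiblock_read_count';

-- In 10gR2 the generally better option is to leave the parameter
-- unset and let Oracle derive a suitable value automatically.
```

Which version you run, and which patch level, can therefore change the outcome of an apparently simple test like this.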
Item 3: When should testing stop?
It can be difficult deciding when to implement the best fix that you can come up with, when it’s possible that just “a little more time” will allow you to identify the real problem.
Steve Karam introduced a case (page 157: Karam) through a link to his blog where moving a table from a tablespace using a 16KB block size to a tablespace with a 4KB block size made an enormous difference to performance. (Starting with just a simple export/import to the new block size that made the batch run drop from 45 minutes to 2 minutes 20 seconds.)
Steve’s motive for trying this strategy was that there was another system already running with a 4KB block size that used the same code and completed quickly. It’s a good idea (especially when pushed for immediate results) to start by adopting the strategy of: “make this one look like the one that works and see what happens”.
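For anyone who hasn’t tried this type of thing, the mechanics are straightforward. This is only a sketch – the object names and sizes are hypothetical, not taken from Steve’s system:

```sql
-- Sketch only: moving one table into a tablespace with a
-- non-default (here 4KB) block size. A buffer cache for that
-- block size must exist before the tablespace can be created.
alter system set db_4k_cache_size = 64M;

create tablespace ts_4k
    datafile 'ts_4k_01.dbf' size 512M
    blocksize 4k;

alter table batch_table move tablespace ts_4k;

-- The move leaves any indexes on the table unusable,
-- so they have to be rebuilt afterwards:
alter index batch_table_pk rebuild;
```

The low cost of a test like this is part of its attraction – you can try it on one table without committing the whole database to a new block size.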
Steve’s report, unsurprisingly, met with some disbelief – along the lines of “if it looks too good to be true, it is too good to be true” – and a couple of suggestions about why the performance might have changed.
But Steve had already covered the options suggested in some of the other tests he had run. Then he published some extracts of trace files taken from two runs of the job (page 161: Karam), one on a database using 4KB blocks, and one on a database using 16KB blocks.
The results suggested that the problem (page 162: Lewis) was not directly associated with the block size, although it might have been triggered by an unexpected side effect (for which read bug) of using 16KB blocks.
But at this point you may think that you’ve already addressed the root cause; and even if you don’t think this, you might look at what you’ve achieved so far and take the view: “I’ve made the problem go away; I don’t really know why it went away; but how much more time can I afford to spend understanding what’s really happening?” The other time-dependent question you should ask is: “If this really is the solution, how long is it going to take to implement in production?”
In this case the 4KB block size made the problem go away – if you were the production DBA would you move the whole database to a 4KB block size, or run with one table in a special tablespace and move other tables as other batch jobs showed the same symptoms? What if the problem came back – in the 4KB block size – perhaps in several months’ time when the data sizes are larger? It’s not an easy call to make.
Being a little curious, I tried to emulate Steve’s result by creating a test based on the brief description he had supplied. All I had to do was create a two-column table, then update every row … nothing dramatic happened. Then a couple of days later I had a flash of insight, and decided to modify the code so that my update changed a column from null to a non-null value (page 186: Lewis).
The problem seems to be a bug that you’ll only notice if you combine ASSM with larger block sizes and an unsuitable setting of PCTFREE in a table. And just to finish it off, Mark Williams tracked down (page 191: Williams) a possible reference to the bug on Metalink.
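For the record, a minimal sketch of the type of test described above looks like this. The table and column names are hypothetical, and – as the thread eventually showed – the anomaly needed the combination of ASSM, a large block size and an unsuitable pctfree to appear:

```sql
-- Sketch: a two-column table where an update fills in a column
-- that starts out null. Assumes a tablespace (hypothetical name
-- ts_16k_assm) using 16KB blocks and automatic segment space
-- management.
create table t1 (
    id   number,
    val  varchar2(100)
)
pctfree 0                       -- leaves no room for rows to grow
tablespace ts_16k_assm;

insert into t1
select rownum, null             -- val starts out null
from   all_objects
where  rownum <= 10000;

commit;

-- Growing every row in place is where the trouble starts: with
-- pctfree too low the updated rows no longer fit their blocks.
update t1 set val = rpad('x',100);
commit;
```

A plain “update every row” test (which is what I tried first) shows nothing unusual; it’s the null to non-null change, which makes every row grow, that matters.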
One of the difficulties with solving problems with Oracle is that sometimes you don’t know whether you’ve really solved a problem or merely postponed it for a while. It helps if you know Oracle so well that you can say (a) “that behaviour is not reasonable” and (b) “that approach should not have been a solution”.
It helps if you can give yourself a reasonable explanation of why a solution should work. If you can’t do that then maybe your solution isn’t really addressing the root cause. Even then, given the time restrictions imposed on most production DBAs, a temporary workaround may be better than nothing.
If you have to ask the question “What block size should I use?” then the answer is probably 8KB.
There are cases where you may get a small benefit from picking the “perfect” block size for your system – but for most of the systems I’ve seen, there’s usually been a lot more benefit to be gained from addressing the big performance problems before trying to play games with changing block sizes.
If you want to see a more thorough investigation of how things can change (and the scale of that change) as you use different block sizes, Charles Hooper has a long series of demonstrations running through the entire thread.