I got a call to a look at a performance problem involving LOBs a little while ago. The problem was with an overnight batch that had about 40 sessions inserting small LOBs (12KB to 22KB) concurrently, for a total of anything between 100,000 and 1,000,000 LOBs per night. You can appreciate that this would eventually become a very large LOB segment – so before the batch started all LOBs older than one month were deleted.
The LOB column had the following (camouflaged) declaration:
LOB (little_lob) STORE AS BASICFILE ( TABLESPACE lob_ts ENABLE STORAGE IN ROW RETENTION NOCACHE LOGGING )
The database was 11gR2, the tablespace was defined with ASSM with uniform 1MB extents and a blocksize of 8KB (so the LOBs were all 2 or 3 chunks) and the undo retention time was 900 seconds. The effect of the “enable storage in row” is that the LOBINDEX didn’t have to hold any details of current LOB chunks (for in-row, the first 12 chunks are listed in the LOB Locator in the base table).
So, examining an AWR report covering the critical interval, reviewing the captured ASH data, and checking the database, a few questions came to mind:
- With 200 GB of current LOB data in the segment, why was the segment roughly 800GB ?
- With no need for current LOBs to be indexed, how had the LOB Index reached 500,000 blocks in size ?
- There had been 500,000 inserts that night – so why had Oracle done 6 Billion (cached) buffer gets on the (nocache) LOB segment ?
- Given that the LOB Segment had not changed size during the night, why had there been millions of HW enqueue wait on the inserts ?
Knowing the stuff that I did know about basicfile LOBs it seemed likely that the most significant problem was that the segment hadn’t been created with multiple freepools which, according to the very sparse (and not entirely self-consistent) documentation, exist to allow improved concurrency. So I thought I’d search the Internet for any useful information about freepools, how they worked, what impact they might have on this problem, why their absence might produce the symptoms I’d seen, and what the best course of action would be to address the problem.
Of course the “correct” solution according to MoS would be to convert from basicfile to securefile – with a strange insistence on using online redefinition, but no explanation of why a simple CTAS or alter table move is undesirable or dangerous. Unfortunately there are a couple of notes on MoS describing performance issues with “high” levels of concurrent inserts that need to be addressed by setting hidden parameters so I’m not (yet) keen on rebuilding 700GB of a production system to produce a change that might still not work quickly enough; especially since I couldn’t find anything on MoS that could quantify the time needed to do the conversion.
To my surprise I couldn’t find a single useful piece of information about the problem. The only articles I could find seemed to be little bits of cut-n-paste from the Oracle manual pages about using multiple freepools, and the best of those actually demonstrated rebuilding or changing the freepools settings on a LOB of a few megabytes. The most significant MoS note did say that the process “could be slow” and would lock the table. But surely someone, somewhere, must have tried it on a big system and had some idea of “how slow”.
In the end I had to start building some simple models and doing a few experiments to find out what happens and where the time goes and what causes the strange results and – most importantly – how freepools might help. Fortunately, following a call to the Oak Table for any ideas or links to useful documents, I got a pointer to the original Oracle patents which were enormously helpful in showing why freepools could help and why, in the wrong circumstances, you could still end up with a (slightly smaller) disaster on your hands.
To be continued …
If you’re interested, the patent numbers are: 5,999,943 and 6,061,678. Now I just need someone to tell me the numbers for the securefile LOBs patents.