Oracle Scratchpad

May 11, 2007

LOB sizing

Filed under: Infrastructure, LOBs, Oracle, Performance, Tuning — Jonathan Lewis @ 7:29 pm BST May 11, 2007

Some time ago, I was asked to take a quick look at an application that had to handle a lot of LOBs. The LOB-specific part of the application was actually quite simple – contracts were stored as LOBs – but only for occasional visual reference; all the “structured” information from the contract was extracted and stored in relational tables. Some time after a contract had expired, the LOB could be deleted to reclaim space (in theory).  Historically, the client had purged a load of LOBs from time to time, but didn’t have a deliberate house-keeping task to do the job on a regular basis.

My task was to check how much benefit they could get by improving their house-keeping routines, and to suggest any other efficiency enhancements that I could spot.

As a starting point I whipped together the following piece of SQL, just to see what was in the critical table. The purpose of the SQL was to show:

  • How many LOBs were currently stored in the main table
  • How many LOBs would be stored if we purged 14 days after expiry.
  • How many LOBs would be stored if we purged 7 days after expiry.
  • The approximate space requirements (in 8KB blocks) in all three cases.

The code is followed by the first few lines of output.

column c_size format      9,999	heading "Blocks"
column c_ct format   99,999,999
column c_vol format 999,999,999          

column f_ct format   99,999,999
column f_vol format 999,999,999          

column w_ct format   99,999,999
column w_vol format 999,999,999          

select
	c_size, count(*) c_ct, c_size * count(*) c_vol,
	f_size, f_size * count(*) f_vol,
	w_size, w_size * count(*) w_vol
from	(
	select
		c_size,
		case when expiry_date > trunc(sysdate - 14)
			then c_size
			else null
		end							f_size,
		case when expiry_date > trunc(sysdate - 7)
			then c_size
			else null
		end							w_size
	from	(
		select
			expiry_date,
			ceil(dbms_lob.getlength(contract_text)/8000)	c_size
		from	contracts
		)
	)
group by
	c_size, f_size, w_size
order by
	c_size, f_size, w_size
;          

Blocks        C_CT        C_VOL F_SIZE        F_VOL W_SIZE        W_VOL
------ ----------- ------------ ------ ------------ ------ ------------
     1   5,123,187    5,123,187      1    5,123,187      1    5,123,187
     1      91,600       91,600      1       91,600
     1   4,979,383    4,979,383
     2      37,414       74,828      2       74,828      2       74,828
     2      12,437       24,874      2       24,874
     2     643,971    1,287,942
     3      15,652       46,956      3       46,956      3       46,956
     3       4,011       12,033      3       12,033
     3     228,683      686,049
     4       5,333       21,332      4       21,332      4       21,332
     4          96          384      4          384
     4       8,119       32,476       

The output from the query is neither elegant nor entirely self-explanatory, but one of the problems of checking production systems is that you don’t want to hit them too hard. In this case the main table was over 10GB (the LOB segment was about 0.25 TB) so I didn’t want to query it very often – especially since the query had to be a tablescan – so I was aiming for efficiency rather than elegance when I created the SQL.

I’ve used a simple scan of the table to extract the size (in units of roughly 8KB) and expiry date of each LOB. I’ve then in-lined this and wrapped it with a query that uses the current date to decide whether the LOB would still be kept under a 14-day purge rule (f_size) or a 7-day purge rule (w_size).

Once I’ve got the split, I just sum the LOB space.

Because of the way I’ve defined f_size and w_size, when I group by c_size, f_size, and w_size I end up with (up to) three rows for each possible LOB size – and the order by clause then ensures that the “fortnight” columns are blank on the third row of each group, and the “week” columns are blank on both the second and third rows.

The upshot of this report is that you can see two key features:

  • A very large fraction of the LOBs use less than 8KB each
  • If you only allow contracts to exist for 2 weeks after expiry, the data size drops by about 50%.

It just happened that the client had built his LOBs in a tablespace with a 16KB block size. Moving the LOB segment to a tablespace with an 8KB block size would automatically save 128GB.
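
Just for illustration, the relocation itself is a single statement – assuming a suitably sized tablespace has already been created with an 8KB block size (I’ve invented the name contract_lobs_8k here). Note that “move” rebuilds the table segment as well as the LOB segment, so any indexes on the table would need to be rebuilt afterwards:

alter table contracts
	move
	lob (contract_text) store as (
		tablespace contract_lobs_8k
	)
;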

A routine that ran every night to delete all LOBs that had expired more than 13 days earlier would save another 64GB. (Alternatively, running a routine every Saturday night to delete LOBs that had expired more than 7 days earlier would have a similar effect.)
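
To be concrete, here’s a minimal sketch of the sort of nightly purge I had in mind – assuming the contract row itself has to survive and it’s only the LOB content that gets thrown away (table and column names as in the sizing query above):

update	contracts
set	contract_text = null			-- discard the LOB content only
where	expiry_date < trunc(sysdate) - 13	-- expired more than 13 days ago
and	contract_text is not null
;

commit;

Bear in mind that space freed like this goes back to the LOB segment for re-use rather than back to the tablespace – which is one more reason for getting the block (and chunk) size right in the first place.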

Other options to consider:

Would switching to a tablespace with a 4KB block size reduce the storage further? You’d have to run the query again using 4,000 instead of 8,000 to get a clue about that (I did, and the answer was no).

Should the LOB be defined for ‘out of line’ storage only? That depends on what you do with the rest of the data in the table, but in this case the answer was yes.

Should the LOB be declared CACHE, CACHE READS, or NOCACHE; should it be declared LOGGING or NOLOGGING? Again, it depends how you use the data. In this client’s case LOGGING was an obvious requirement; the caching decision was arguable.

Given that we were about to move the LOB back to a standard block size, caching the LOB could have a negative impact on the rest of the caching (even, for fairly subtle reasons, if you cache it in the recycle pool); on the other hand, setting it NOCACHE means all reads and writes to the LOB are “direct path” which could have an intermittent, and random, impact on the performance of other processes.

In the end we stuck with an 8KB block size, caching in the recycle pool, but it did occur to me that an alternative strategy would have been to declare the LOB in a tablespace with a 4KB block size (to give it a totally isolated cache) then set the LOB chunk size to 8KB so that Oracle could treat the LOB almost as if it were stored in a tablespace with an 8KB block size.
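
To put that in concrete terms, here’s a sketch of the sort of LOB definition we ended up with – the tablespace name is invented, the column list is only indicative, and the right choices depend entirely on your own usage patterns:

create table contracts (
	contract_id	number	not null,
	expiry_date	date,
	contract_text	clob
)
lob (contract_text) store as (
	tablespace contract_lobs_8k		-- tablespace with an 8KB block size
	disable storage in row			-- always store the LOB out of line
	chunk 8192				-- one block per chunk
	cache					-- cache implies logging
	storage (buffer_pool recycle)		-- but only through the recycle pool
)
;

The alternative strategy would simply swap in a tablespace with a 4KB block size on the tablespace line while leaving the chunk at 8192.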

Update Oct 2009:

I was on a customer site a few weeks ago working on a couple of problems that they had asked me to investigate – including the apparently huge waste of space in their LOBs. As a quick check I wrote a little script (a variant of the one above) to calculate the sizes of the LOBs in a sample from each of the critical tables. The results were staggering – apparently 90% of their LOB allocation was empty space – and then I realised two things: the getlength() function counts the number of characters in a CLOB, and they were using a multibyte character set (and when the database character set is a multibyte character set, a CLOB is stored in the database in a fixed-width two-byte character set). So I had to modify my script to change the calls to getlength() to “2 * getlength()”. (They were still wasting a few hundred gigabytes of LOB space – but not as much as the initial run suggested.)
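
In other words, the inner part of the sizing query had to become something like this (still assuming the 8KB target, and a fixed two bytes per character):

select
	expiry_date,
	ceil(2 * dbms_lob.getlength(contract_text)/8000)	c_size
from	contracts
;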

10 Comments

  1. Jonathan,

    How big was the ‘recycle pool’ after you enabled caching? I wonder how much buffer cache is required to CACHE 0.25 TB of LOB.

    Thanks

    Comment by sid — May 14, 2007 @ 2:42 pm BST

  2. “Moving the LOB segment to a tablespace with an 8KB block size would automatically save 128GB” – I did not understand this; what point am I missing?

    Comment by Sathish — May 14, 2007 @ 3:09 pm BST

  3. Sid, we weren’t trying to cache the entire LOB segment (which was down to about 64GB) in the recycle pool. We were using the recycle pool so that (a) we could cycle the blocks through a caching mechanism to avoid direct path reads and writes, but (b) avoid thrashing the default pool to death because of the huge volume of LOB data that would go through the cache over time. For the activity going on, sizes around 128MB were adequate.
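
    (For anyone wondering where the recycle pool gets its space: it’s just the db_recycle_cache_size parameter – something along the lines of the statement below, with the figure adjusted to suit.)

    alter system set db_recycle_cache_size = 128m scope=both;	-- assuming an spfile is in use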

    Sathish, the minimum size for an ‘out of line’ LOB is one “chunk”, and the minimum size of a chunk is one block. Most of our LOBs were smaller than 8KB, but the tablespace holding the LOB segment had a 16KB block size – so most of our LOBs were using less than half a block. By halving the block size we saved half the space – the half that we were wasting in most blocks.

    Comment by Jonathan Lewis — May 14, 2007 @ 8:27 pm BST

  4. Hi Jonathan,
    There were a lot of bugs related to LOBs in combination with ASSM in version 9.2 (the last one we saw seems to be fixed only in the latest patchset, 10.2.0.3). In the tests we have done in this context (inserts and deletes in batches), it looks as if, even in 10.2, a BLOB in MSSM is not only more stable but also performs better than a BLOB in ASSM. (RETENTION also showed better results from a performance perspective in all versions, but the difference was much more significant in 9.2.)
    What is your experience/comment?

    How did you decide between RETENTION and PCTVERSION?

    Thank you and regards,
    Igor

    Comment by Igor — May 15, 2007 @ 11:51 am BST

  5. Dear Jonathan,

    If you can, please highlight some options, or point out some way of deciding which LOB storage options (CACHE, CACHE READS, or NOCACHE; LOGGING or NOLOGGING) should be used.

    In our system we have tables with CLOB columns; the system is third-party, and it is mostly the deletes on the tables with CLOB columns that are taking a lot of time. I was wondering if changing some storage settings would help us; at present we have ENABLE STORAGE IN ROW CHUNK 8192, NOCACHE LOGGING.

    Regards,
    Vijay.

    Comment by Vijay — October 2, 2009 @ 2:03 pm BST

    • Vijay,

      Sorry about the delay in replying – sometimes I just lose track of comments because of all the travel I do. I’ll be writing up a short note in the next few days to answer this question.

      Comment by Jonathan Lewis — October 23, 2009 @ 4:00 pm BST

  6. Hi Jonathan

    What are the implications of specifying the Storage preference for the $I, $N, etc tables?

    v_Attributes := ‘TABLESPACE storage(initial 1K next 1K pctincrease 0);
    CTX_DDL.Set_Attribute(,I_TABLE_CLAUSE,);

    Or is it better to specify just the tablespace and take the defaults?

    Regards

    Comment by Amin Adatia — December 20, 2011 @ 6:40 pm BST

    • Amin,

      Unless you’re on a very old version of Oracle you shouldn’t be thinking about things like initial, next and pctincrease anyway.

      As a general guideline for tablespaces (and block sizes) – if you can’t think of a good reason for changing the defaults then use the defaults.

      In the case of context it’s probably more important to worry about setting the memory allocation attribute for creating the context index as this dictates the volume of data (e.g. number of files) that the code will read as a unit when creating the index – and the size of the LOB content of the index tables will be dictated by the volume of data read as a unit.
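
      For example – with an invented index name, and 500M purely as a placeholder (the upper limit is governed by the max_index_memory setting):

      create index contracts_text_idx on contracts(contract_text)
      	indextype is ctxsys.context
      	parameters ('memory 500M')
      ;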

      Comment by Jonathan Lewis — December 23, 2011 @ 10:06 am BST

      • Jonathan

        Thanks. I have already set the memory to the maximum allowed (2GB) and the default to 1GB. The tablespaces were created with “defaults”, which happen to have pctincrease 50%; I am moving to auto-allocate as I get the chance to do so. We also have partitioned tables (mainly to help with the deletes) and we use tablespaces in round-robin fashion (essentially a tablespace for the day) in case a datafile gets corrupted and the whole tablespace becomes unavailable until the data files are recovered.

        Anyway, how do I find out whether the memory allocations/parameters are working optimally?

        Comment by Amin Adatia — December 23, 2011 @ 1:33 pm BST Dec 23,2011 | Reply

