If you have to handle LOBs, it’s worth checking for “unusual” activity. Here’s an example of unexpected behaviour that I came across a couple of years ago.
The client had a table with data that had to be written to a flat file so that a number of other databases could import it using SQL*Loader. The table definition and the query to dump the data are shown below – note, particularly, the CLOB sitting in the middle of the table:
Name                    Null?    Type
----------------------- -------- ----------------
ID                               NUMBER(10)
VIEW_NAME                        VARCHAR2(32)
USER_ID                          VARCHAR2(16)
XML_FILE                         CLOB
FILE_PATH                        VARCHAR2(127)
ACCOUNT_ID                       VARCHAR2(16)
AREA_ID                          VARCHAR2(16)
LABEL_ID                         VARCHAR2(16)
COMPLIANCY_FLAG                  VARCHAR2(1)

select
        id              || chr(27) ||
        view_name       || chr(27) ||
        user_id         || chr(27) ||
        xml_file        || chr(27) ||
        file_path       || chr(27) ||
        account_id      || chr(27) ||
        area_id         || chr(27) ||
        label_id        || chr(27) ||
        compliancy_flag
from
        t1
;
The dump code is fairly simple – it just concatenates the columns in order, inserting a separator between them using a value that’s not allowed to appear in the data. (The client’s code was a little more subtle, but essentially the same).
I’ve inserted 10,000 rows into this table, with 500 bytes of data for each LOB, and when I execute a simple “select dbms_lob.getlength(xml_file) from t1;” it takes 20,653 logical I/Os – 653 for the tablescan and 2 to read each LOB value (since I’ve disabled storage in row).
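For reference, the test setup can be reconstructed along the following lines. This is my reconstruction, not the original script – the column definitions come from the table description above, the 500-byte rpad() value from the discussion below, and the only storage detail the text confirms is “disable storage in row”:

```sql
create table t1 (
        id              number(10),
        view_name       varchar2(32),
        user_id         varchar2(16),
        xml_file        clob,
        file_path       varchar2(127),
        account_id      varchar2(16),
        area_id         varchar2(16),
        label_id        varchar2(16),
        compliancy_flag varchar2(1)
)
lob (xml_file) store as (disable storage in row)
;

insert into t1
select
        rownum, 'VIEW_' || rownum, 'USER_1',
        rpad('x',500),          -- 500 bytes per CLOB
        '/a/path', 'ACC_1', 'AREA_1', 'LAB_1', 'Y'
from
        all_objects
where
        rownum <= 10000
;
commit;
```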
So why, when I run the dump code, do I see the following statistics? The number of buffer visits is obviously a little surprising, but notice, in particular, the number of “lob writes” and the number of “db block changes”:
session logical reads         1,840,063
db block gets                 1,540,034
consistent gets                 300,029
db block changes                550,037
consistent changes              550,015
free buffer requested           119,891
lob reads                        10,000
lob writes                      220,000
Despite all the block changes, by the way, the session generated less than 2KB of redo log – and that’s a bit of a clue to what’s going on.
Here are the results of the same job after I’ve optimised the code in two different ways:
First optimised version (the LOB moved to the end of the select list):

session logical reads           240,006
db block gets                   140,000
consistent gets                 100,006
db block changes                 50,000
consistent changes               50,000
free buffer requested            19,910
lob reads                        10,000
lob writes                       20,000

Second optimised version (start and end pieces built in a subquery):

session logical reads           400,006
db block gets                   280,000
consistent gets                 120,006
db block changes                100,000
consistent changes              100,000
free buffer requested            29,907
lob reads                        10,000
lob writes                       40,000
Of course, you might be more interested in the timing rather than the simple counts – the original code used 13.95 CPU seconds, the best version 3.49 CPU seconds, and the third version 4.73 CPU seconds – and it was the third option that we had to implement.
So where does the extra work come from? The answer is in the different versions of the code … the point at which you introduce the LOB into the list of concatenated columns makes a big difference:
rem
rem     Script:         lob_concat.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2008
rem

set long 320000
set longchunksize 32000

select
        id              || chr(27) ||
        view_name       || chr(27) ||
        user_id         || chr(27) ||
        file_path       || chr(27) ||
        account_id      || chr(27) ||
        area_id         || chr(27) ||
        label_id        || chr(27) ||
        compliancy_flag || chr(27) ||
        xml_file
from
        t1
;

select
        start_piece || xml_file || end_piece
from
        (
        select
                id              || chr(27) ||
                view_name       || chr(27) ||
                user_id         || chr(27)      start_piece,
                chr(27)         ||
                file_path       || chr(27) ||
                account_id      || chr(27) ||
                area_id         || chr(27) ||
                label_id        || chr(27) ||
                compliancy_flag                 end_piece,
                xml_file
        from
                t1
        )
;
The minimum work appeared when the LOB was the last thing added to the concatenation; if the LOB wasn’t at the end of the list then every column (including all those chr(27)’s) resulted in the following extra work for every row selected:
session logical reads           16
db block gets                   14
consistent gets                  2
db block changes                 5
consistent changes               5
free buffer requested            1
lob writes                       2
Essentially, every time you add a piece on the end of the LOB, Oracle creates and updates a new temporary LOB (hence two lob writes). Since they are temporary LOBs – associated therefore with the temporary tablespace – all those “db block changes” don’t cause any redo to be generated, but the overhead is significant.
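The mechanism can be mimicked with explicit temporary LOB calls. The following is a purely illustrative sketch (my own code, not part of the original test) of roughly what each trailing concatenation step does behind the scenes:

```sql
declare
        l_tmp   clob;
begin
        -- each "|| piece" applied after the LOB behaves roughly like
        -- creating a fresh temporary LOB and then writing into it -
        -- which is why each step shows up as two "lob writes"
        dbms_lob.createtemporary(l_tmp, cache => true, dur => dbms_lob.call);
        dbms_lob.writeappend(l_tmp, 5, chr(27) || 'DATA');
        dbms_lob.freetemporary(l_tmp);
end;
/
```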
In fact it’s quite easy to miss what may be the most significant figure – the “free buffer requested”. If you use code like this to dump a large amount of data you may find that everything else in the system starts doing more reads and writes because your session keeps demanding free buffers.
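If you want to check your own sessions for this pattern, a before/after snapshot of the relevant statistics is enough – something along these lines (statistic names can vary slightly between versions):

```sql
select
        sn.name, ms.value
from
        v$mystat ms, v$statname sn
where
        ms.statistic# = sn.statistic#
and     sn.name in (
                'session logical reads', 'db block gets',
                'db block changes', 'free buffer requested',
                'lob reads', 'lob writes'
        )
order by
        sn.name
;
```

Run the query before and after the dump and compare the two sets of values.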
In the short term, we adopted the second, slightly more expensive, option because the process was delivering data to lots of other systems that were going to parse it using SQL*Loader, and the optimum code would have forced us to supply a code release to all those remote systems to match the rearranged column order. The slightly more expensive code kept the result looking the same while performing only one concatenation after the LOB had been introduced.
Conclusion:
Test carefully when you start using code that mixes concatenation and LOBs, and check for any unexpected increases in buffer activity.
Update (Aug 2018)
I’ve rerun the test on 11.2.0.4 and the behaviour relating to block accesses is basically the same, though the difference in timing has changed slightly. Moving on to 12.1.0.2, though, there seems to be no difference in the statistics whichever form of the SQL I run: something completely different is happening (and possibly something is happening that does not appear in the statistics). Among other changes, the “lob reads” and “lob writes” statistics disappear, and the number of “SQL*Net roundtrips to/from client” drops from 30,000 in 11g to 20,000 in 12c. I have some notes that the disappearing lob reads and writes may be due to the appearance of a special 256KB buffer in the PGA that is used for LOB manipulation – so these results may only be valid for “small” LOBs; if your LOB column is larger than about 128K the lob reads and writes may return.
The 12c tests run noticeably faster than the 11g tests – down to a fairly flat 5 seconds, compared to a time that varies from about 7 seconds (best) to 16 seconds (worst). In all the tests the largest fraction of the time was SQL*Net transfer time, but the worst 11g test showed much more CPU utilisation than the 12c tests and the better 11g tests.
There is one interesting anomaly in 12.1.0.2 – the statistic “table scan rows gotten” jumps from the 11g value of 10,000 to a staggering 349,750 in all three tests, with no clues that I could find for why that happened. (The same was true for 12.2.0.1).
Excellent post.
Comment by Asif Momen — July 10, 2009 @ 10:04 am BST Jul 10,2009 |
Jonathan,
By setting up a test case with 10000 rows and CLOBs of 500, by simply replacing
xml_file || chr(27) ||
with
dbms_lob.substr(XML_FILE, dbms_lob.getlength(XML_FILE)) || chr(27) ||
I get (CPU time converted to seconds for convenience and clarity):
NAME DIFF
------------------------------------------ ----------
session logical reads 1783
db block gets 0
consistent gets 1783
db block changes 0
consistent changes 0
free buffer requested 0
lob reads 0
lob writes 0
CPU used by this session .19
9 rows selected.
Compare with what I get for the original baseline of the simple dump (comparable with your own results):
NAME DIFF
------------------------------------------ ----------
session logical reads 1770019
db block gets 1540000
consistent gets 230019
db block changes 550000
consistent changes 550000
free buffer requested 110000
lob reads 10000
lob writes 220000
CPU used by this session 5.57
9 rows selected.
Would this have been viable in your case?
Feel free to edit if this comes out badly – it’s my first attempt to post anything substantial :)
Chris
Comment by Chris Poole — July 10, 2009 @ 10:29 am BST Jul 10,2009 |
Chris,
Nice idea – but dbms_lob.substr() returns a varchar2, which means the entire concatenated select list is limited to 4,000 bytes.
If you change the rpad(‘x’,500) to rpad(‘x’,4000), you’ll find that your modified code fails with error:
ORA-01489: result of string concatenation is too long
Comment by Jonathan Lewis — July 10, 2009 @ 11:13 am BST Jul 10,2009 |
Absolutely. This is what I was driving at with ‘Would this have been viable?’. I wasn’t sure if your test case was limited to 500-byte CLOBs for convenience or was genuinely representative of what they had.
I knew the problem was caused by the underlying CLOB conversions going on, which could be avoided by using DBMS_LOB. So my first thought was to write a PL/SQL function to do the work; during the writing of said function I wondered if you had, as had I, merely overlooked something simple.
Did you therefore discount a tiny PL/SQL function returning a CLOB?
Using a function that returns a CLOB and assuming the CLOBs are less than 32kb (is this fair?) I can get results that compare favourably with your best case. Of course if you had restrictions on using PL/SQL functions which you haven’t told us or the CLOBs are in fact over 32kb then this is not a fair comparison.
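For readers following the discussion, a function of the kind Chris describes might look something like this – a hypothetical sketch (the function name and parameters are mine, not his actual code):

```sql
create or replace function dump_line(
        p_prefix in varchar2,
        p_lob    in clob,
        p_suffix in varchar2
) return clob
is
        l_result clob;
begin
        -- build the whole output line with dbms_lob calls, so no
        -- SQL-level concatenation of the CLOB takes place
        dbms_lob.createtemporary(l_result, cache => true, dur => dbms_lob.call);
        dbms_lob.writeappend(l_result, length(p_prefix), p_prefix);
        dbms_lob.append(l_result, p_lob);
        dbms_lob.writeappend(l_result, length(p_suffix), p_suffix);
        return l_result;
end;
/

rem     hypothetical usage:
rem
rem     select dump_line(id || chr(27) || view_name || chr(27),
rem                      xml_file,
rem                      chr(27) || file_path || chr(27) || compliancy_flag)
rem     from   t1;
```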
Comment by Chris Poole — July 10, 2009 @ 1:15 pm BST Jul 10,2009 |
Chris,
Sorry about the misunderstanding – but at least the dialogue will have been of benefit to some of the other readers.
I can’t remember all the details now, but I think the mechanism for dumping the data had to be a SQL statement – something to do with a dynamic generated filtering clause, possibly.
The CLOBs were also pretty open-ended in size – one of the other little jobs I did was work out whether or not to move the LOB segment into a tablespace with a different blocksize.
Comment by Jonathan Lewis — July 10, 2009 @ 2:19 pm BST Jul 10,2009 |
Hi Jonathan, here is another funny thing – “invisible brackets”…
Comment by Valentin Nikotin — March 27, 2013 @ 8:57 pm GMT Mar 27,2013 |
Jonathan,
In the case where we are using LOBs but without concatenation, can we jump to the conclusion that we had better put a CLOB at the end of our tables (for insert/update), or is it, as always in Oracle, “It depends”?
Thanks
Comment by Mohamed — July 10, 2009 @ 1:21 pm BST Jul 10,2009 |
Mohamed,
I don’t think there is any general conclusion you can make about position in the table based on this note.
Depending on the size of your LOBs, and whether you enable storage in rows, you might decide that some LOBs ought to be at the end of the row – but there are several arguments you could make for or against.
Comment by Jonathan Lewis — July 10, 2009 @ 2:12 pm BST Jul 10,2009 |
[…] Jonathan Lewis – Concatenating LOBs […]
Pingback by Blogroll Report 03/07/2009 – 10/07/2006 « Coskan’s Approach to Oracle — July 10, 2009 @ 6:42 pm BST Jul 10,2009 |
Hmm. Forcing a “bushy” expression eval when Oracle wants a “left deep” expression eval. I like it. :)
Comment by Jason Bucata — July 16, 2009 @ 6:59 pm BST Jul 16,2009 |
[…] Jonathan Lewis has already given a good explanation here. https://jonathanlewis.wordpress.com/2009/07/09/concatenating-lobs/ […]
Pingback by Reducing CLOB concatenation « Dion Cho – Oracle Performance Storyteller — August 22, 2009 @ 12:25 am BST Aug 22,2009 |
Hi, Jonathan
Great post!
I have a question for you:
what do you mean by “add a piece on the end of the LOB”?
Do you mean inserting a new data into the lob column(s)? or updating the existing data in the lob column(s)?
Comment by Gavin Liao — November 30, 2009 @ 8:27 am GMT Nov 30,2009 |
Gavin,
Updating the existing data in the LOB column – although, since we are talking about a temporary LOB at this point, it’s not a “column” as such.
Comment by Jonathan Lewis — November 30, 2009 @ 9:52 am GMT Nov 30,2009 |
thanks for your clarification.
Comment by Gavin Liao — December 1, 2009 @ 7:00 am GMT Dec 1,2009 |
Hi Jonathan,
I have a question about inserting a CLOB (sorry, maybe not related to this thread exactly).
I’m wondering why the redo size is more than twice the size of the inserted data:
Please take a look at this testing:
Could you please give me any clue?
Thanks,
John
Comment by Lizhuohua(John) — January 28, 2013 @ 1:33 am GMT Jan 28,2013 |
My test env is 10GR2.
I tested it on 11g; the redo size is much smaller – most likely it is not logged, even though I did set it to logging.
Comment by Lizhuohua(John) — January 28, 2013 @ 5:09 am GMT Jan 28,2013 |
John,
Again you haven’t shown how you defined the CLOB. The obvious anomaly here is that the volume of redo (83KB) is less than the total volume of data you’ve inserted (1MB). Since your CLOB is set to logging I’d have to guess that you’ve enabled some securefiles deduplication or compression feature.
Comment by Jonathan Lewis — January 28, 2013 @ 5:55 pm GMT Jan 28,2013 |
Hi Jonathan,
I created the CLOB by default. Even when I set the database to force logging, I get the same output:
Actually, I’m evaluating CLOB and LONG, and find that LONG is faster than CLOB; I’m not sure if there is any way to improve the performance of CLOB.
Thanks,
John
Comment by Lizhuohua(John) — January 29, 2013 @ 12:37 am GMT Jan 29,2013
After enabling archivelog, the redo size is 1.7M for the same testing. (data size : 10000, loop count : 100)
Comment by Lizhuohua(John) — January 29, 2013 @ 5:31 am GMT Jan 29,2013
John,
You haven’t told us how you’ve defined the LOB, but I’d guess that it’s defined as “disable storage in row”, cached, and logged.
The LOB data would be 1,000 bytes plus a little overhead. Then you have the LOB index entry, and the table entry. The LOB index and table entries would also generate some undo. As a starting point I’d expect about 250 – 300 bytes of redo for each of the three entries over and above their raw data – so 1,750 to 1,900 bytes. That’s still not enough to cover your 2,452 – but possibly this is the first insertion into the table, so there’s some space management stuff going on as well.
Comment by Jonathan Lewis — January 28, 2013 @ 5:50 pm GMT Jan 28,2013 |
Hi Jonathan,
Thanks for your excellent explanation, I didn’t know there is an index for lob.
For 10g , I created the table by default like this :
Comment by Lizhuohua(John) — January 29, 2013 @ 12:34 am GMT Jan 29,2013 |
[…] issue from my own coding experience, but also from threads on Oracle Technical Network and a blog post by Jonathan […]
Pingback by CLOB Concatenation | Paulzip's Oracle Blog — August 6, 2017 @ 3:49 pm BST Aug 6,2017 |