Oracle Scratchpad

November 30, 2020

In-row LOBs

Filed under: Infrastructure,LOBs,Oracle — Jonathan Lewis @ 11:27 am GMT Nov 30,2020

If you’re like me there are probably lots of little things that you know but find terribly easy to forget. Here’s one of my bêtes noires – starting with a little quiz:

rem
rem     Script:         lob_in_row.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem     Purpose:
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

create table t1 (
        v1      varchar2(20),
        b1      clob,
        s1      clob
)
lob(b1) store as basicfile  b_lob,
lob(s1) store as securefile s_lob
;

insert into t1 values(
        rpad('x',20,'x'),
        rpad('x',20,'x'),
        rpad('x',20,'x')
)
/

commit;

execute dbms_stats.gather_table_stats(user,'t1');

select  column_name, avg_col_len
from    user_tab_columns
where   table_name = 'T1'
/

select  avg_row_len
from    user_tables
where   table_name = 'T1'
/

column rel_file_no new_value m_file_no
column block_no    new_value m_block_no

select
        dbms_rowid.rowid_relative_fno(rowid)    rel_file_no,
        dbms_rowid.rowid_block_number(rowid)    block_no
from
        t1
;

alter system flush buffer_cache;
alter system dump datafile &m_file_no block &m_block_no;

I’ve created a table with a varchar2() column, a basicfile CLOB, and a securefile CLOB, and inserted the same value into all three. By default the CLOB columns will be defined as ‘enable storage in row’ and since the inserted value is very short it will be stored “in row” for both these CLOBs. The question is:

Which column takes the most space in the data block?

To answer this question we need only gather stats and query user_tab_columns and/or user_tables – except the results may be a little surprising. That’s why my script also uses the dbms_rowid package to find the file and block number where the row has been inserted, flushes the buffer cache to make sure that we don’t get confused by older versions of the block (you probably shouldn’t run this script on a big, busy system), and then dumps the block into the session trace file.

You’ll see why it’s necessary to look at the block dumps when I show you the results from gathering stats. The three sets of results below come from 11.2.0.4, 12.2.0.1, and 19.3.0.0 in that order. A key feature the databases have in common is that they were all created with a multi-byte character set:


11.2.0.4
========
COLUMN_NAME          AVG_COL_LEN
-------------------- -----------
V1                            21
B1                           123
S1                           117 

AVG_ROW_LEN
-----------
        261

12.2.0.1
========
COLUMN_NAME          AVG_COL_LEN
-------------------- -----------
V1                            21
B1                           171
S1                           165

AVG_ROW_LEN
-----------
        357

19.3.0.0
========
COLUMN_NAME          AVG_COL_LEN
-------------------- -----------
V1                            21
B1                            77
S1                            71

AVG_ROW_LEN
-----------
        169

There are three things that the three versions agree on.

  • First, the row length is the sum of the column lengths.
  • Secondly, the average length of the varchar2() column is 21 – remember that the avg_col_len includes the length byte(s) when we collect stats with the dbms_stats package.
  • Finally, the difference between the basicfile and securefile LOB lengths is 6 bytes.
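A quick cross-check of the first point – in each version the avg_row_len figure really is the sum of the three avg_col_len figures reported above:

```python
# avg_col_len (v1, b1, s1) and avg_row_len as reported by each version above
stats = {
    "11.2.0.4": ((21, 123, 117), 261),
    "12.2.0.1": ((21, 171, 165), 357),
    "19.3.0.0": ((21,  77,  71), 169),
}
for version, (col_lens, row_len) in stats.items():
    assert sum(col_lens) == row_len
    print(version, sum(col_lens))   # 261, 357, 169 respectively
```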

But are the actual values reported for the LOB lengths in any way truthful, or are we seeing reporting errors (that vary with version)?

That’s why we need to look at the block dump, and this is what we find as the interesting bit of the dump – which is basically the same in all three cases:


block_row_dump:
tab 0, row 0, @0x1eec
tl: 172 fb: --H-FL-- lb: 0x1  cc: 3

col  0: [20]  78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78

col  1: [76]
 00 54 00 01 02 0c 80 00 00 02 00 00 00 01 00 02 75 f6 ff fb 00 38 09 00 00
 00 00 00 00 28 00 00 00 00 00 01 00 78 00 78 00 78 00 78 00 78 00 78 00 78
 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00
 78

col  2: [70]
 00 54 00 01 02 0c 80 80 00 02 00 00 00 01 00 02 75 f6 ff fc 00 32 48 90 00
 2c 00 00 28 01 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78
 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78

end_of_block_dump

When Oracle 19c reported the avg_col_len (and maybe the same would be true of 18c) it was correctly reporting the space taken up in the row by the entries for the two LOBs (and their overheads) that I had inserted. The earlier versions of Oracle are doing something a little odd.

A key thing you’ll notice, of course, is that the varchar2() column stores my input string as one byte per character. But because I used a multibyte characterset for my database characterset in all three cases, Oracle has stored the LOB equivalents of the string using a two-byte fixed-width characterset that effectively doubles the storage – and then it has to include the “LOB locator” overhead on top, hence the 20 character string turning into 70+ bytes.
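You can see the doubling without going near the database – a quick sketch, assuming the in-row CLOB data is held in a two-byte fixed-width encoding that is equivalent to UTF-16BE for these characters (which matches the “Bytes: 40” and the repeating “00 78” pattern in the block dump):

```python
# 'x' * 20 stored two ways: one byte per character (as in the varchar2()
# column) versus a two-byte fixed-width encoding (as in the in-row CLOB).
s = "x" * 20
single_byte = s.encode("ascii")      # what the varchar2() column holds
two_byte    = s.encode("utf-16-be")  # what the in-row CLOB data looks like
print(len(single_byte))    # 20
print(len(two_byte))       # 40 -- matches "Bytes: 40" in the 11g dump
print(two_byte[:4].hex())  # '00780078' -- the repeating "00 78" pairs
```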

This highlights two important issues. First, trying to work out how to “tune” your table storage based on avg_col_len / avg_row_len is a lot harder to do if you’ve got LOB columns in the table, and your method of estimating storage will have to change to suit the version of Oracle. Secondly, and much more importantly, if you’re thinking of changing a varchar2() column (such as “description” or “comments”) to a CLOB to allow for a small number of cases where people want to supply longer pieces of text, then if you’re using a multibyte characterset there are two awful consequences:

  • the storage requirement for the column will more than double
  • the LOB will go out of line when the in-row storage requirement exceeds 4,000 bytes – which means when the original string content exceeds something between 1,938 and 1,985 characters, depending on your version of Oracle and whether you are using basicfile or securefile LOBs.  (And a “long” input string will exceed a single block in the LOB segment when it goes over roughly 4,000 characters.)
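The arithmetic behind that character range can be sketched as a back-of-envelope calculation, assuming a 4,000 byte in-row limit and two bytes per character; the overhead figures below are illustrative guesses, not version-specific facts – the real locator/inode overhead varies with version and with basicfile vs. securefile storage:

```python
# When does an in-row CLOB go out of line? Assume a 4,000 byte in-row
# limit, 2 bytes per character, and a (hypothetical) range of locator
# overheads to see how the character capacity moves.
IN_ROW_LIMIT = 4000
for overhead in (30, 36, 124):          # illustrative overheads in bytes
    max_chars = (IN_ROW_LIMIT - overhead) // 2
    print(overhead, max_chars)          # 1985, 1982, 1938 respectively
```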

So if you want to handle “long comments” in multibyte charactersets you might want to consider making your code a little more complex, storing the data as a varchar2() for anything up to 4,000 bytes and using a CLOB column only when the data goes over that limit.
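The switching logic might look something like this sketch (in Python for brevity – a real implementation would live in your application or PL/SQL layer; the column names, the character set, and the 4,000 byte limit are assumptions for illustration):

```python
# Sketch: route short text to a varchar2() column and use the CLOB
# column only when the data exceeds the varchar2() byte limit.
# Column names and limit are hypothetical.
VARCHAR2_LIMIT_BYTES = 4000

def route_comment(text: str, db_charset: str = "utf-8") -> dict:
    """Return a column -> value mapping for an insert statement."""
    if len(text.encode(db_charset)) <= VARCHAR2_LIMIT_BYTES:
        return {"short_comment": text, "long_comment": None}
    return {"short_comment": None, "long_comment": text}

print(route_comment("x" * 20))    # goes to the varchar2() column
print(route_comment("x" * 5000))  # goes to the CLOB column
```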

Footnote

If you’re wondering what all the overheads are for the in-row LOB you’ll find that the 11g block dump gives you some extra details (though the extra details disappeared from my 12c and 19c block dumps). Here’s the content that appeared immediately after the col1 and col2 data respectively in the 11g block dump.

LOB
Locator:
  Length:        84(76)
  Version:        1
  Byte Length:    2
  LobID: 00.00.00.01.00.02.75.f6.ff.fb
  Flags[ 0x02 0x0c 0x80 0x00 ]:
    Type: CLOB
    Storage: BasicFile
    Enable Storage in Row
    Characterset Format: IMPLICIT
    Partitioned Table: No
    Options: VaringWidthReadWrite
  Inode:
    Size:     56
    Flag:     0x09 [ Valid DataInRow ]
    Future:   0x00 (should be '0x00')
    Blocks:   0
    Bytes:    40
    Version:  00000.0000000001
    Inline data[40]
Dump of memory from 0x00007FB4244BED8D to 0x00007FB4244BEDB5
7FB4244BED80                            00780001              [..x.]
7FB4244BED90 00780078 00780078 00780078 00780078  [x.x.x.x.x.x.x.x.]
        Repeat 1 times
7FB4244BEDB0 00780078 54004678                    [x.x.xF.T]



LOB
Locator:
  Length:        84(70)
  Version:        1
  Byte Length:    2
  LobID: 00.00.00.01.00.02.75.f6.ff.fc
  Flags[ 0x02 0x0c 0x80 0x80 ]:
    Type: CLOB
    Storage: SecureFile
    Characterset Format: IMPLICIT
    Partitioned Table: No
    Options: VaringWidthReadWrite
  SecureFile Header:
    Length:   50
    Old Flag: 0x48 [ DataInRow SecureFile ]
    Flag 0:   0x90 [ INODE Valid ]
    Layers:
      Lengths Array: INODE:44
      INODE:
        00 00 28 01 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78
        00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78 00 78
        00 78 00 78

Footnote 2

Oracle allows you to use empty_clob() to populate a CLOB column with a zero-length CLOB, and an empty_clob() is not the same as a null.

Many years ago, when I wrote “Practical Oracle 8i” I pointed out that if you were using a LOB column it would be better to use empty_[bc]lob() to represent an absence of data to avoid the need for checking the column in two different ways for “no data”.

There are two details to remember about this, though. First, empty_clob() takes 36 bytes of storage for a basicfile and 30 bytes for a securefile (reported as 37/31 in 19c, 135/131 in 12cR2, and 103/97 in 11gR2). You may decide that’s an expensive way of saying “nothing to see here”.

Secondly, while PL/SQL will allow you to compare a PL/SQL clob variable with empty_clob() you cannot use a simple “column = empty_clob()” predicate in SQL and will need something like:

where length(s1) = 0
where dbms_lob.getlength(s1) = 0
where dbms_lob.compare(s1, empty_clob()) = 0

As a final closing comment about the oddity of avg_col_len and LOBs – my first test to check the lengths reported for an empty_clob() also inserted NULL into the v1 column. In this special case (all the columns in the row were empty_clob() or null) 11g reported the avg_col_len of s1 and b1 as zero!
