Oracle Scratchpad

May 2, 2019

LOB length

Filed under: Infrastructure,LOBs,Oracle,Upgrades — Jonathan Lewis @ 11:32 am BST May 2,2019

This note is a reminder combined with a warning about unexpected changes as you move from version to version. Since it involves LOBs (large objects) it may not be relevant for most people but since there’s a significant change in the default character set for the database as you move up to 18.3 (or maybe even as you move to 12.2) anyone using character LOBs may get a surprise.

Here’s a simple script that I’m first going to run on an instance of 11.2.0.4:

rem
rem     Script:         lob_length_4.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2019
rem

create table t1(
        n1      number,
        version varchar2(8),
        v1      varchar2(4000),
        c1      clob,
        nc1     nclob
)
lob (c1) store as securefile c1_lob(
        enable storage in row
        chunk 8K
        cache logging
)
lob (nc1) store as securefile nc1_lob(
        enable storage in row
        chunk 8K
        cache logging
)
;

insert into t1 values( 0,'General',  null,               empty_clob(),       empty_clob()      ) ;

insert into t1 values( 1,'11.2.0.4', rpad('x',1951,'x'), rpad('x',1951,'x'), rpad('x',1951,'x')) ;
insert into t1 values( 2,'11.2.0.4', rpad('x',1952,'x'), rpad('x',1952,'x'), rpad('x',1952,'x')) ;

insert into t1 values( 3,'12.1.0.2', rpad('x',1951,'x'), rpad('x',1951,'x'), rpad('x',1951,'x')) ;
insert into t1 values( 4,'12.1.0.2', rpad('x',1952,'x'), rpad('x',1952,'x'), rpad('x',1952,'x')) ;

insert into t1 values( 5,'12.2.0.1', rpad('x',1937,'x'), rpad('x',1937,'x'), rpad('x',1937,'x')) ;
insert into t1 values( 6,'12.2.0.1', rpad('x',1938,'x'), rpad('x',1938,'x'), rpad('x',1938,'x')) ;

insert into t1 values( 7,'18.3.0.0', rpad('x',1984,'x'), rpad('x',1984,'x'), rpad('x',1984,'x')) ;
insert into t1 values( 8,'18.3.0.0', rpad('x',1985,'x'), rpad('x',1985,'x'), rpad('x',1985,'x')) ;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

break on version skip 1 on report

compute avg of sys_op_opnsize(c1)  on report
compute avg of sys_op_opnsize(nc1) on report

select
        version, n1,
        length(v1), length(c1), length(nc1),
        sys_op_opnsize(v1), sys_op_opnsize(c1), sys_op_opnsize(nc1)
from
        t1
order by
        n1
;

select  avg_row_len
from    user_tables
where   table_name = 'T1'
;

select  column_name, avg_col_len
from    user_tab_cols
where   table_name = 'T1'
order by
        column_id
;

I’ve created a table holding a varchar2() a CLOB, and an NCLOB, then I’ve inserted some rows into that table, labelling the rows in pairs with what appear to be Oracle version numbers, with one extra row labelled “General” that holds the special “empty LOB value (note a NULL and an empty clob behave very differently). Then I’ve reported the lengths of the LOB columns in two different ways, once using the standard length() function (I could have used the dbms_lob.getlength() function) and once using the internal sys_op_opnsize() function that Oracle uses in its queries to gather table stats.

Here’s the output from the script. Since this first run is on 11.2.0.4 I want you to note, particularly, the numbers in the rows labelled 11.2.0.4

VERSION          N1 LENGTH(V1) LENGTH(C1) LENGTH(NC1) SYS_OP_OPNSIZE(V1) SYS_OP_OPNSIZE(C1) SYS_OP_OPNSIZE(NC1)
-------- ---------- ---------- ---------- ----------- ------------------ ------------------ -------------------
General           0                     0           0                                    96                  96

11.2.0.4          1       1951       1951        1951               1951               2048                3999
                  2       1952       1952        1952               1952               2049                  86

12.1.0.2          3       1951       1951        1951               1951               2048                3999
                  4       1952       1952        1952               1952               2049                  86

12.2.0.1          5       1937       1937        1937               1937               2034                3971
                  6       1938       1938        1938               1938               2035                3973

18.3.0.0          7       1984       1984        1984               1984               2081                  86
                  8       1985       1985        1985               1985               2082                  86

********                                                                 ------------------ -------------------
avg                                                                              1835.77778          1820.22222


AVG_ROW_LEN
-----------
       5410

COLUMN_NAME          AVG_COL_LEN
-------------------- -----------
N1                             3
VERSION                        9
V1                          1740
C1                          1837
NC1                         1822

The reminders:

  • A LOB that uses a multi-byte character set (and possibly, but I have not checked, an NCLOB that is using a single-byte character set) stores its content using a two-byte fixed width character set. So when I insert a character string of 1,951 (multibyte) characters the internal representation uses double the number of bytes as there are characters – so the data content will be 3,902 bytes.
  • The length() and dbms_lob.getlength() functions report the number of characters not the number of bytes. (There is a lengthb() function to report the number of bytes, but it doesn’t work with multibyte character set LOBs, raising error: “ORA-22998: CLOB or NCLOB in multibyte character set not supported”). Even though our NCLOB needs 3,902 bytes for its content it is reported – just like the CLOB – with a length of 1,951.
  • The sys_op_opnsize() internal function tells you about the number of bytes a column takes up in the row – though not always totally accurately, as we shall soon see – so it will tell you about the 3,902 bytes stored for the data plus the “structural” data that helps to describe the LOB. So for the row holding strings of length 1,951 we can see that the CLOB seems to take 1,951 + 97 = 2,048 bytes while the NCLOB seems to take 2 * 1951 + 97 = 3,999 bytes.
  • When the total storage for a LOB column exceeds 4,000 bytes the LOB is stored “out of line” even when it has been declared as mine have with “enable storage in row”. So when we insert strings of length 1,952 the CLOB column stays in line and reports a size of 1,952 + 97 = 2.049 bytes while the NCLOB exceeds the limit (2 + 1,952 + 97 = 4001) and goes out of row, reporting an “in-row” size of 86 bytes which is (we assume) the LOB metadata.

You might note, by the way, that the avg_col_len is the average (which I’ve reported for the CLOB and NCLOB columns) of the sys_op_opnsize() values rounded up plus 1 (for the “column count” byte); and – because the number of columns in the table is small – the avg_row_len is very similar to the sum of the avg_col_len.

You should also note that the storage size of an “empty” LOB is (or seems to be) 96 bytes in this version of Oracle. That’s quite a lot of space to say that there’s no data  – but using an empty_[n]clob() to represent “nothing” does mean that you don’t have to code for “is null or is empty” and it may help you to avoid incorrect results as a consequence. In fact a block dump shows that the actual space usage for the empty_[n]clob() is only 30 bytes – so it’s not quite as bad as it seems. The error is probably based around the older code using the maximum possible length of a lob locator as an assumption rather than checking the actual size in the table.

The warning

It’s a minor variation of the standard warning: “odd little things change when you upgrade and some of them may have a big side effect”. Here are the results from 18.3 (which, amongst other details, defaults to a multi-byte character set – which I’ll check before I show you the LOB results).

SQL> select * from v$nls_parameters where parameter like '%CHARACTERSET%';

PARAMETER                        VALUE
-------------------------------- --------------------
NLS_CHARACTERSET                 AL32UTF8
NLS_NCHAR_CHARACTERSET           AL16UTF16

This affects the storage of CLOBs if you’ve previously been using a default single-byte character set – you’re suddenly going to find your LOB segments are twice as big as they used to be – even though any report of “length()” will be unchanged. But there are other changes. Here’s the output from the same script running on 18.3.0.0:



VERSION          N1 LENGTH(V1) LENGTH(C1) LENGTH(NC1) SYS_OP_OPNSIZE(V1) SYS_OP_OPNSIZE(C1) SYS_OP_OPNSIZE(NC1)
-------- ---------- ---------- ---------- ----------- ------------------ ------------------ -------------------
General           0                     0           0                                    30                  30

11.2.0.4          1       1951       1951        1951               1951               3933                3933
                  2       1952       1952        1952               1952               3935                3935

12.1.0.2          3       1951       1951        1951               1951               3933                3933
                  4       1952       1952        1952               1952               3935                3935

12.2.0.1          5       1937       1937        1937               1937               3905                3905
                  6       1938       1938        1938               1938               3907                3907

18.3.0.0          7       1984       1984        1984               1984               3999                3999
                  8       1985       1985        1985               1985                132                 132
********                                                                 ------------------ -------------------
avg                                                                              3078.77778          3078.77778

AVG_ROW_LEN
-----------
       7912

COLUMN_NAME          AVG_COL_LEN
-------------------- -----------
N1                             3
VERSION                        9
V1                          1740
C1                          3080
NC1                         3080

Looking at the details for “General” and “18.3.0.0” what can we see that’s changed?

First, the length of an empty_[n]clob is now reported correctly at 30 bytes. This might (if the value is also used internally) explain why a LOB can now be 33 characters (66 bytes) longer before it’s moved out of line – it’s a small difference, but you might be at a boundary condition where it makes a big difference (for good, or for bad – who can tell) or your system.

Secondly, the LOB metadata for an out of line LOB seems to have jumped from 86 bytes to 132 bytes. Like the empty_[n]clob() issue, this is an error. The actual space usage in row was 38 bytes – consisting of the basic 30 bytes “empty” metadata, 4 extra bytes overhead, and a 4-byte block address linking to the stored LOB value. (For larger LOBs you will probably see that the “tail” of the in-row data grows to accomodate a short list of block addresses and chunk information).

Finally, you’re allowed to guess that the rows labelled 12.2.0.1 are there to give you a hint that in that version the NCLOB moves out of line at 1,938 characters – but you’ll probably want to run the test on 12.2 to confirm that claim. This does mean, though, that an upgrade to 12.2 from a lower version might be a bigger threat than any upgrade to 18.3. You’ll also find that in 12.2 the empty_[n]clob() is reported with a length of 124. (Again, this may explain the change in the break-point for going out of line: 124 – 96 = 28, and (1952 – 28/2) = 1,938. The arithmetic is consistent with the incorrect reporting of the size of the empty LOB metadata.

So the bottom line is this – if you do a lot of work with LOBs, do some careful checking of how much space you are using, how much space you will use on the next install, how changes in character sets can affect numbers, how the statistics might change – even when collected in exactly the same way – and what this might do to execution plans.

Footnote

This testing was all done using securefile LOBs – which has been the default for some time if you don’t specify the type of LOB you want to use. The numbers, particularly the break from in-row to out-of-row, are a little different if you switch to basicfile LOBs.

A quick test on LiveSQL (19.2) suggests that nothing has changed from 18.3 to 19.3

For further reading on quirky details of LOBs – here’s the article that has a footnote reporting another interesting change in LOB handling in 18.3; and a very old note about analysing the contents of a LOB segment with a view to assessing the impact of deleting old LOBs and potentially moving the LOB segment to a tablespace of a different size.

 

 

2 Comments »

  1. One area where I hit exactly this problem was setting up a rollback process for a database upgrade from 11.1. to 18.6. The process involved expdp the old 11.1 database, impdp into the new database. The rollback used expdp with the version=11.1 clause and impdp. One table had a couple of XMLTYPE fields and the IMPDP failed with these errors:

    KUP-11007: conversion error loading table "OWNER"."TABLE_NAME"
    ORA-26093: input data column size (2874) exceeds the maximum input size (2000)
    KUP-11009: data for row: XMLFIELD : 0X'003C003F0078006D006C002000760065007200730069006F00'
    

    This was down to the character set changing from the old database UTF8/AL16UTF16 and the new at AL32UTF8/AL16UTF16. This explanation matches my findings.

    Thanks Jonathan!

    Comment by John Chamberlain — May 14, 2019 @ 2:08 pm BST May 14,2019 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.