I was setting up a few tests on a copy of 12.1.0.2 recently when I made a mistake creating the table – I forgot to put in a couple of CAST() calls in the select list, so I just patched things up with a couple of “modify column” commands. Since I was planning to smash the table in all sorts of ways and it had taken me several minutes to create the data set (10 million rows) I decided to create a clean copy of the data so that I could just drop the original table and copy back the clean version – and after I’d done this I noticed something a little odd.
Here’s the code (cut down to just 10,000 rows), with a little output:
--
-- Script: 12c_vc32767_b.sql
--

create table t1(
        id              not null,
        date_open,
        date_closed,
        deal_type,
        client_ref,
        small_vc,
        padding
)
nologging
as
select
        rownum                                          id,
        trunc(
                add_months(sysdate, -120) +
                        (rownum) * 3652 / 1e7
        )                                               date_open,
        trunc(
                add_months(
                        add_months(sysdate, -120) + (rownum) * 3652 / 1e7,
                        12 * trunc(dbms_random.value(1,6))
                )
        )                                               date_closed,
        dbms_random.string('U',1)                       deal_type,
        dbms_random.string('U',4)                       client_ref,
        lpad(rownum,10)                                 vc_small,
        rpad('x',100,'x')                               vc_padding
from
        dual
connect by
        level <= 10000 -- > comment to avoid WordPress format issue
;

alter table t1 modify deal_type  varchar2(1);
alter table t1 modify client_ref varchar2(4);

create table t2 nologging as select * from t1;

begin
        dbms_stats.gather_table_stats(
                user, 't1', method_opt=>'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                user, 't2', method_opt=>'for all columns size 1'
        );
end;
/

select table_name, num_rows, blocks, avg_row_len from user_tables;

TABLE_NAME             NUM_ROWS     BLOCKS AVG_ROW_LEN
-------------------- ---------- ---------- -----------
T1                        10000        304         139
T2                        10000        218         139
There’s no hidden code – this is the code I ran, and both tables went into the same tablespace – yet one table has used about 30% more blocks than the other on a simple “create as select”, even though the row counts and average row lengths are the same!
When I first did this test it just so happened that the only place in the database where I could make the t2 copy was a tablespace using freelist management, while the original table was in a tablespace using ASSM – so I wasted a little time examining the ASSM space management (bitmap) blocks before I realised what had happened.
Suggestions about what has happened, and how to confirm your hypothesis, invited in the comments.
Update (following morning)
The explanation is in the comments: I had enabled extended (32,767 byte) varchar2(). My calls to dbms_random.string() had returned character values, and the return type defaults to the maximum character length, which meant they were declared as varchar2(32767) and those are implemented as (unhackable) CLOBs in 12.1.0.2.
When I modified the table to change the varchar2(32767) to shorter lengths Oracle let me get away with it because none of the actual stored values at that time was longer than my target declaration – but it didn’t physically change the stored column values to “short” varchar2(), it left them as CLOBs, and a “short” CLOB includes a LOB Locator which starts out at about 30 bytes.
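As a sketch of how you might watch the declared lengths change (this query is my illustration, not part of the original script), running the following against user_tab_columns before and after the two “alter table … modify” commands should show deal_type and client_ref starting life as varchar2(32767) and then acquiring the shorter declarations – though the dictionary gives no hint that the stored values are still CLOBs:

```sql
-- Sketch: check the declared column lengths before and after the
-- "alter table ... modify" commands. The declaration changes;
-- the previously stored values do not.
select
        column_name, data_type, data_length
from
        user_tab_columns
where
        table_name = 'T1'
and     column_name in ('DEAL_TYPE', 'CLIENT_REF')
order by
        column_name
;
```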
When I created t2 from t1 Oracle first copied the (short) column definitions, and then copied the data, reading and converting the CLOBs to normal varchar2() storage, and that explains why t2 needed fewer blocks. The avg_row_len for the two tables matched because the code to gather stats simply applies the sys_op_opnsize() function to the varchar2() defined in t1, and doesn’t give you any indication about the LOB locator.
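If you want to see the figure that dbms_stats is working from, you can (as a sketch – sys_op_opnsize() is the undocumented internal function that the stats-gathering SQL applies to each column) query the average internal value size directly; for t1 it reports only the size of the varchar2() values, with nothing for the locator overhead:

```sql
-- Sketch: sys_op_opnsize() is the internal function the dbms_stats
-- query applies to each column to derive avg_col_len / avg_row_len.
select
        avg(sys_op_opnsize(deal_type))  avg_deal_type,
        avg(sys_op_opnsize(client_ref)) avg_client_ref
from
        t1
;
```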
(If you use the extremely deprecated analyze command to gather stats on the two tables you’ll find that the avg_row_len of table t1 then allows for the LOB locator size.)
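As a sketch of that check (bearing in mind that analyze really is deprecated and shouldn’t be used for production statistics):

```sql
-- Sketch only: analyze is deprecated, but its avg_row_len arithmetic
-- differs from dbms_stats and allows for the LOB locator in t1.
analyze table t1 compute statistics;
analyze table t2 compute statistics;

select
        table_name, avg_row_len
from
        user_tables
where
        table_name in ('T1','T2')
;
```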
“Workaround:”
I should have used: cast(dbms_random.string('U',4) as varchar2(4)) and the column definition would have been a proper varchar2(4) from the start. (As noted in a comment from Ivica Arsov below, substr(…, 1, 4) would also have worked.) Of course I should have used cast() for the numerics as well so that I could declare them as things like number(8,0) rather than the slightly dangerous “number”.
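Putting that together, the relevant parts of the select list could have been written along these lines (a sketch of the “done right” version, cut down to the columns that matter – the cast() calls fix the declarations at creation time):

```sql
-- Sketch: cast() in the select list gives the CTAS columns proper
-- declarations, so no "alter table ... modify" is needed afterwards.
create table t1(
        id              not null,
        deal_type,
        client_ref
)
nologging
as
select
        cast(rownum as number(8,0))                     id,
        cast(dbms_random.string('U',1) as varchar2(1))  deal_type,
        cast(dbms_random.string('U',4) as varchar2(4))  client_ref
from
        dual
connect by
        level <= 10000
;
```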
Note: I’ve put “workaround” in quotes because it’s not really a workaround to a problem – it’s just an example of doing it right.
Footnote:
Here’s a symbolic dump of a single character (‘T’) being stored as a varchar2(32767):
LOB
Locator:
  Length:        84(31)
  Version:        1
  Byte Length:    1
  LobID: 00.00.00.01.00.00.01.6b.ba.d7
  Flags[ 0x01 0x0c 0x00 0x80 ]:
    Type: BLOB
    Storage: SecureFile
    Characterset Format: IMPLICIT
    Partitioned Table: No
    Options: ReadWrite
  SecureFile Header:
    Length:   11
    Old Flag: 0x48 [ DataInRow SecureFile ]
    Flag 0:   0x90 [ INODE Valid ]
    Layers:
      Lengths Array: INODE:5
      INODE:
        00 00 01 01 54
Note the 84(31) at line 3: this is a reminder of how big a LOB locator could get for an out-of-line LOB, compared with the current size of this LOB locator. Rows holding out-of-line LOBs can actually be much longer than avg_row_len tells you.
Addendum
Another side effect of enabling extended varchar2() types is the surprise you get when you use error logging in SQL (i.e. syntax like: “insert into t1 select … log errors”). When you create the “clone” table for the log, Oracle changes every column from the original into a varchar2(32767), e.g.:
desc target
 Name                                      Null?    Type
 ----------------------------------------- -------- --------------------
 ID                                                 NUMBER
 N1                                                 NUMBER
 N2                                                 NUMBER
 SMALL_VC                                           VARCHAR2(10)
 PADDING                                            VARCHAR2(100)

begin
        dbms_errlog.create_error_log('TARGET');
end;
/

desc err$_TARGET
 Name                                      Null?    Type
 ----------------------------------------- -------- --------------------
 ORA_ERR_NUMBER$                                    NUMBER
 ORA_ERR_MESG$                                      VARCHAR2(2000)
 ORA_ERR_ROWID$                                     ROWID
 ORA_ERR_OPTYP$                                     VARCHAR2(2)
 ORA_ERR_TAG$                                       VARCHAR2(2000)
 ID                                                 VARCHAR2(32767)
 N1                                                 VARCHAR2(32767)
 N2                                                 VARCHAR2(32767)
 SMALL_VC                                           VARCHAR2(32767)
 PADDING                                            VARCHAR2(32767)
You probably hope that you’re not going to see many rows inserted into the error log table, so the extra space taken up in the table by error rows probably won’t be a problem – but the fact that you get two segments (the LOB and the LOBINDEX) created for every column in the original table might cause a problem – even if they are completely empty.