For those who don’t read Oracle-L and haven’t found Nikolay Savvinov’s blog, here’s a little note pulling together a recent question on Oracle-L and a relevant (and probably unexpected) observation from that blog. The question (paraphrased) was:
The developers/data modelers are creating all the tables with varchar2(4000) as standard by default “Just in case we need it”. What do you think of this idea?
The general answer was that it’s a bad idea (and unnecessary anyway); one specific threat that got mentioned was the problem of creating indexes, with Oracle raising error ORA-01450 (“maximum key length exceeded”). Coincidentally Nikolay Savvinov had written about a performance-related “bug” in August this year, which turned out, only last week, to be expected behaviour. You can read his articles for the details, but since he used a 4KB block size to demonstrate it I thought I’d repeat the exercise using an 8KB block size.
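To put a shape on the ORA-01450 threat, here’s a tiny illustration of my own (not from the Oracle-L thread), assuming an 8KB block size and byte-length semantics. At that block size the maximum B-tree key length is 6,398 bytes, so an index covering just two “just in case” columns can’t even be created on an empty table:

create table t_wide (c1 varchar2(4000), c2 varchar2(4000));

create index t_wide_i1 on t_wide(c1, c2);

rem     => ORA-01450: maximum key length (6398) exceeded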
rem
rem     Script:         column_length_threat.sql
rem

drop table t1 purge;

rem     First run: columns declared at a realistic varchar2(40)

create table t1 (
        id      number(6),
        v1      varchar2(40),
        v2      varchar2(40),
        v3      varchar2(40)
);

create unique index t1_i1 on t1(id);

execute snap_redo.start_snap

insert into t1
select  object_id, object_name, object_name, object_name
from    all_objects
where   rownum <= 10000
;

execute snap_redo.end_snap

drop table t1 purge;

rem     Second run: same data, columns declared varchar2(4000) "just in case"

create table t1 (
        id      number(6),
        v1      varchar2(4000),
        v2      varchar2(4000),
        v3      varchar2(4000)
);

create unique index t1_i1 on t1(id);

execute snap_redo.start_snap

insert into t1
select  object_id, object_name, object_name, object_name
from    all_objects
where   rownum <= 10000
;

execute snap_redo.end_snap
I’ve dropped and created the same table twice, once with varchar2(40) columns and once with varchar2(4000) columns.
I’ve created an index on a (non-character) column. The specific results vary with whether the index is unique or non-unique, whether the index exists at all, whether the table already holds data, the effective clustering of the index columns, and so on – but the key difference between the two sets of results doesn’t go away.
I’ve inserted object_name values (maximum usage 32 bytes) into the varchar2() columns, inserting 10,000 rows.
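As a quick sanity check on that claim about the data (a little query of my own, not part of the original test):

select  max(lengthb(object_name)) max_bytes
from    all_objects
where   rownum <= 10000
;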
The snap_redo package is one of my simple pre/post packages that calculates the changes in values in some dynamic performance view – in this case v$sysstat (system statistics), picking out the statistics relating to redo generation – which means you need to run this test on an otherwise idle instance.
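If you don’t have a snapshot package to hand, here’s a minimal do-it-yourself sketch of the same idea (the table name stats_before is just an invention for this illustration): capture the v$sysstat values before the test, then report the deltas afterwards.

create global temporary table stats_before
on commit preserve rows
as
select name, value from v$sysstat where 1 = 0;

insert into stats_before select name, value from v$sysstat;

rem     ... run the test insert here ...

select
        s.name, s.value - b.value delta
from
        v$sysstat s, stats_before b
where
        s.name  = b.name
and     s.value - b.value != 0
order by
        s.name
;

Here are the two sets of results from my test instance: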
varchar2(40) definitions:

Name                                                Value
----                                                -----
messages sent                                          11
messages received                                      11
calls to kcmgcs                                       313
calls to kcmgas                                        37
calls to get snapshot scn: kcmgss                      74
redo entries                                          769
redo size                                       1,317,008
redo wastage                                        3,888
redo writes                                            11
redo blocks written                                 2,664
redo write time                                        10
redo blocks checksummed by FG (exclusive)           2,242
redo ordering marks                                     1
redo subscn max counts                                  1
redo synch time                                         7
redo synch time (usec)                             88,875
redo synch time overhead (usec)                     1,810
redo synch time overhead count (<2 msec)               11
redo synch writes                                      11
redo write info find                                   11
undo change vector size                           261,136
rollback changes - undo records applied                 2
IMU undo allocation size                           17,184

varchar2(4000) definitions:

Name                                                Value
----                                                -----
messages sent                                           8
messages received                                       8
calls to kcmgcs                                       222
calls to kcmgas                                        56
calls to get snapshot scn: kcmgss                      52
redo entries                                       20,409
redo size                                       5,606,872
redo buffer allocation retries                          1
redo wastage                                        1,248
redo writes                                             6
redo blocks written                                11,324
redo write time                                        26
redo blocks checksummed by FG (exclusive)             571
redo ordering marks                                    32
redo subscn max counts                                  1
redo synch time                                         6
redo synch time (usec)                             60,230
redo synch time overhead (usec)                       159
redo synch time overhead count (<2 msec)                1
redo synch writes                                       1
redo write info find                                    1
undo change vector size                         1,590,520
IMU undo allocation size                              144
Notice, particularly, the huge jump in the number of redo entries and the total redo size when the character columns are defined as varchar2(4000). Note, too, that the number of redo entries is roughly 2 * the number of rows inserted: for each row that’s one entry for the table row itself and one for the index entry. You can check the redo log content by dumping the log file, of course (and Nikolay did), or you can take my word for it that Oracle is doing the equivalent of single-row processing in the varchar2(4000) case and array processing in the varchar2(40) case.
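If you do want to dump a log file yourself, here’s a hedged sketch of one way to go about it (run as SYSDBA; the dump appears in your session’s trace file, and if a log switch happened during the test you’ll need to pick the log that was current at the time):

column member new_value m_logfile

select  member
from    v$logfile
where   group# = (select group# from v$log where status = 'CURRENT')
and     rownum = 1
;

alter system dump logfile '&m_logfile';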
When Oracle calculates that the declared row length (i.e. the sum of the column definitions, not the actual data) is larger than the block size it falls back to single-row processing. This can increase your redo generation significantly, and since the rate at which you can pump out redo is the ultimate limit on the rate at which you can load data, it could have a significant impact on your data loading times. Declaring character columns as varchar2(4000) “just in case” is a bad idea.
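If you want to check where the threshold lies on your own system, here’s a hedged variation on the test above. My assumption (from the argument about declared row length vs. block size) is that three varchar2(2000) columns keep the total definition inside an 8KB block, so the insert should revert to array processing and the redo entries should drop back towards the varchar2(40) figures:

drop table t1 purge;

rem     Total declared length is roughly 6KB plus overheads - inside
rem     an 8KB block - so the insert should use array processing.

create table t1 (
        id      number(6),
        v1      varchar2(2000),
        v2      varchar2(2000),
        v3      varchar2(2000)
);

create unique index t1_i1 on t1(id);

execute snap_redo.start_snap

insert into t1
select  object_id, object_name, object_name, object_name
from    all_objects
where   rownum <= 10000
;

execute snap_redo.end_snap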