Oracle Scratchpad

December 23, 2014

Just in case

Filed under: Infrastructure,Oracle,Performance,redo — Jonathan Lewis @ 10:37 am GMT Dec 23,2014

For those who don’t read Oracle-l and haven’t found Nikolay Savvinov’s blog, here’s a little note pulling together a recent question on Oracle-L and a relevant (and probably unexpected) observation from the blog. The question (paraphrased) was:

The developers/data modelers are creating all the tables with varchar2(4000) as standard by default “Just in case we need it”. What do you think of this idea?

The general answer was that it’s a bad idea (and unnecessary, anyway) and one specific threat that got mentioned was the problem of creating indexes and Oracle error ORA-01450; but coincidentally Nikolay Savvinov had written about a performance-related “bug” in August this year, which turned out, only last week, to be expected behaviour. You can read his articles for the details, but since he used a 4KB block size to demonstrate it I thought I’d repeat the exercise using an 8KB block size.
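As a quick illustration of the ORA-01450 threat (a hypothetical demonstration, not part of the original test; with an 8KB block size the maximum index key length is 6398 bytes):

```sql
rem     Hypothetical demonstration of ORA-01450.
rem     With an 8KB block size the maximum index key length is 6398 bytes,
rem     so a two-column index on varchar2(4000) columns cannot be created
rem     even when the table holds no data.

create table t_wide (c1 varchar2(4000), c2 varchar2(4000));
create index t_wide_i1 on t_wide(c1, c2);
-- ORA-01450: maximum key length (6398) exceeded
```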

rem     Script: column_length_threat.sql

drop table t1 purge;
create table t1 (id number(6), v1 varchar2(40), v2 varchar2(40), v3 varchar2(40));
create unique index t1_i1 on t1(id);

execute snap_redo.start_snap

insert into t1 
select	object_id, object_name, object_name, object_name 
from	all_objects
where	rownum <= 10000
;

execute snap_redo.end_snap


drop table t1 purge;
create table t1 (id number(6), v1 varchar2(4000), v2 varchar2(4000), v3 varchar2(4000));
create unique index t1_i1 on t1(id);

execute snap_redo.start_snap

insert into t1 
select	object_id, object_name, object_name, object_name 
from	all_objects
where	rownum <= 10000
;

execute snap_redo.end_snap

I’ve dropped and created the same table twice, once with varchar2(40) columns and once with varchar2(4000) columns.

I’ve created a unique index on a (non-character) column. The specific results vary depending on whether the index is unique or non-unique, whether the index exists at all, whether the table already holds data, the effective clustering of the index columns, and so on, but the key difference between the two sets of results doesn’t go away.

I’ve inserted object_name values (maximum usage 32 bytes) into the varchar2() columns, inserting 10,000 rows.

The snap_redo package is one of my simple pre/post packages that calculates changes in values in some dynamic performance view – in this case it’s looking at v$sysstat (system statistics) for statistics relating to redo generation, which means you need to run this test on an otherwise idle instance. Here are the two sets of results from an instance of 11.2.0.4:
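The snap_redo package itself isn’t listed here; as a rough stand-in (an assumption about its behaviour, not the original code), the same pre/post deltas can be captured with something like:

```sql
rem     Hypothetical stand-in for the snap_redo package: snapshot the
rem     redo-related statistics from v$sysstat, run the test, then
rem     report the deltas.

create table redo_snap as
select name, value from v$sysstat where name like 'redo%';

rem     ... run the insert here ...

select  s.name, s.value - r.value delta
from    v$sysstat s, redo_snap r
where   s.name = r.name
and     s.value != r.value;

drop table redo_snap purge;
```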


Name                                                                     Value
----                                                                     -----
messages sent                                                               11
messages received                                                           11
calls to kcmgcs                                                            313
calls to kcmgas                                                             37
calls to get snapshot scn: kcmgss                                           74
redo entries                                                               769
redo size                                                            1,317,008
redo wastage                                                             3,888
redo writes                                                                 11
redo blocks written                                                      2,664
redo write time                                                             10
redo blocks checksummed by FG (exclusive)                                2,242
redo ordering marks                                                          1
redo subscn max counts                                                       1
redo synch time                                                              7
redo synch time (usec)                                                  88,875
redo synch time overhead (usec)                                          1,810
redo synch time overhead count (<2 msec)                                    11
redo synch writes                                                           11
redo write info find                                                        11
undo change vector size                                                261,136
rollback changes - undo records applied                                      2
IMU undo allocation size                                                17,184


Name                                                                     Value
----                                                                     -----
messages sent                                                                8
messages received                                                            8
calls to kcmgcs                                                            222
calls to kcmgas                                                             56
calls to get snapshot scn: kcmgss                                           52
redo entries                                                            20,409
redo size                                                            5,606,872
redo buffer allocation retries                                               1
redo wastage                                                             1,248
redo writes                                                                  6
redo blocks written                                                     11,324
redo write time                                                             26
redo blocks checksummed by FG (exclusive)                                  571
redo ordering marks                                                         32
redo subscn max counts                                                       1
redo synch time                                                              6
redo synch time (usec)                                                  60,230
redo synch time overhead (usec)                                            159
redo synch time overhead count (<2 msec)                                     1
redo synch writes                                                            1
redo write info find                                                         1
undo change vector size                                              1,590,520
IMU undo allocation size                                                   144

Notice, particularly, the great change in the number of redo entries and the total redo size when the character columns are defined as varchar2(4000). Note, too, that the number of redo entries is roughly “2 * number of rows inserted”: for each row that’s one entry for the row itself and one for the index entry. You can check the redo log content by dumping the log file, of course (and Nikolay did), or you can take my word for it that Oracle is doing the equivalent of single row processing in the varchar2(4000) case and array processing in the varchar2(40) case.

When Oracle calculates that the defined row length (not the actual data length) is larger than the block size it falls back to single row processing; this can increase your redo generation significantly, and since the rate at which you can pump out redo is the ultimate limit on the rate at which you can load data, this could have a significant impact on your data loading times. Declaring character columns as varchar2(4000) “just in case” is a bad idea.
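If you want to check an existing schema for this threat, a rough query might look like the following (a sketch only — it ignores row overhead, the number of row pieces, and multi-byte charactersets, and the schema name is hypothetical):

```sql
rem     Sketch: find tables whose defined row length exceeds the block size.
rem     Compare the total against your db_block_size (8192 here).

select  owner, table_name, sum(data_length) defined_row_length
from    dba_tab_columns
where   owner = 'TEST_USER'
group by
        owner, table_name
having  sum(data_length) > 8192
order by
        defined_row_length desc;
```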


19 Comments »

  1. Hi Jonathan,

    thanks for bringing this to the community’s attention! I still remember my frustration when I spent a lot of time moving a significant amount of code from row-by-row to bulk processing only to see some ridiculously small performance gains. I’m sure there are many people out there in a similar position, and this information can help them save some time (and nerves).

    Best regards,
    Nikolay

    Comment by savvinov — December 23, 2014 @ 11:29 am GMT Dec 23,2014 | Reply

    • Nikolay,

      Thanks for pursuing the issue with Oracle and getting a resolution.

      It’s an interesting example of a feature that has probably been around for a very long time but never been properly noticed because the impact has never been sufficiently apparent to prompt someone to look into it. (A bit like the ASSM / pctfree / chained rows bug that Steve Karam came across a few years ago.)

      Although the single-row/array argument has never been made in the “bigger blocks are better” canon, perhaps it was actually an (undocumented) observation of this anomaly many years ago that first prompted someone to create the legend.

      Comment by Jonathan Lewis — December 23, 2014 @ 11:52 am GMT Dec 23,2014 | Reply

      • Interesting. So what’s your position on the optimal database block size — do you think that except in some rare cases the default 8k is adequate or do you have some arguments in favor or smaller/larger block sizes for some typical configurations?

        Comment by savvinov — December 23, 2014 @ 12:07 pm GMT Dec 23,2014 | Reply

        • I would pick the default block size for the operating system, which would probably be 8KB in all cases nowadays, as the database block size unless you could give me a fairly convincing demonstration that an alternative was better.

          On very rare occasions I would consider a different block size for specific options. I have had a couple of cases where it seemed to make sense to use 16KB for an IOT (for performance reasons), and a case where 4KB for a LOB segment (for space saving reasons) was a good idea.

          Comment by Jonathan Lewis — December 23, 2014 @ 12:28 pm GMT Dec 23,2014

  2. For two-column “why should I use the RDBMS features?” key/value-pair data models stuffed into a pair of varchar2(4000) columns, this is indeed a (corner) case for using a bigger block size. I haven’t tested whether 10K block sizes still work, but that was indeed a size I used circa 1992/3 for this very purpose of avoiding multi-block rows (though I hadn’t discovered the slow-by-slow effect, and it may not even have been there circa 7.1), handling an interesting case for a customer.

    E-biz was demonstrably faster too, as makes sense if you have tens of varchar2(500) “attribute_n” columns and most of them are born and remain forever NULL, so actual rows tend to fit in a single larger block. None of them would even be close to fitting under the “I’m going to go row-by-row if an individual row can exceed a block size” rule.

    Nikolay: Have you tested if this row-by-row processing is the effect with a single long or a single (possibly out of band) more modern very long type (like a LOB)? Do we know when (version wise) it came into play?

    JL: I particularly like your sly migration from “optimal” database block size to “default” database block size in answer to Nikolay’s question. Thus you underscore that the default should be the sweet spot for the natural group of the physical resources being used (following the OS, perhaps further down the stack), while only provable (and likely rare) cases should vary from the default. Since the advent of multiple database block sizes within one database, that is the real and far more easily testable question than varying the default database block size and comparing throughput in aggregate.

    Comment by rsiz — December 23, 2014 @ 1:25 pm GMT Dec 23,2014 | Reply

    • I haven’t had a chance to conduct an in-depth analysis for inserts (or other DML) involving LOB’s. It would be interesting to look at that, but a thorough study would require a lot of time because of multiple possible combinations of parameters (in-row storage on or off, chunk size etc.). I’m a little bit constrained in terms of time right now so I’m not sure if I’d be able to test this behavior with LOBs any time soon.

      I don’t know in which version this behavior first appeared. I vaguely recall that Jonathan mentioned something about the history of this behavior in the thread to my original post on multi-row DML and block size (http://savvinov.com/2014/08/19/block-size-and-multirow-dml-peformance/), so you may be able to find some relevant information in there.

      Comment by savvinov — December 23, 2014 @ 2:38 pm GMT Dec 23,2014 | Reply

      • Regarding the effect, I’ve just run the test on 8.1.7.4 and it shows the same behaviour.

        I have to agree about the availability of time – there are plenty of things that I could test, but I won’t test until I have a project for which the test is important. Knowing that something needs to be tested, though, is half the battle in doing the right thing.

        Comment by Jonathan Lewis — December 23, 2014 @ 3:43 pm GMT Dec 23,2014 | Reply

  3. This may not be an issue when using Direct Path Insert (APPEND or PARALLEL Insert). I have a large number of tables where the maximum row size is very much more than the block size and I use SQLLoader Direct Path and INSERT PARALLEL operations.

    A simple test case :

    drop table hkc_test_large_row_size purge;
    
    create table hkc_test_large_row_size (id_column number, data_col_1 varchar2(4000), data_col_2 varchar2(4000), data_col_3 varchar2(4000));
    create unique index hkc_tlrs_undx on hkc_test_large_row_size(id_column);
    
    alter session enable parallel dml;
    
    select distinct sid from v$mystat;
    
    pause Get the redo stats
    
    insert /*+ APPEND */ into hkc_test_large_row_size
    select rownum, to_char(rownum), to_char(rownum), to_char(rownum)
    from dual
    connect by level < 1001
    /
    
    pause Get the redo stats
    commit;
    
    alter table hkc_test_large_row_size parallel 4;
    
    insert /*+ PARALLEL (MANUAL) */ into hkc_test_large_row_size
    select rownum+2000, to_char(rownum), to_char(rownum), to_char(rownum)
    from dual
    connect by level < 1001
    /
    
    pause Get the redo stats
    
    commit;

    Comment by Hemant K Chitale — December 24, 2014 @ 1:52 am GMT Dec 24,2014 | Reply

    • Hemant,

      Thanks for the comment and sample code.

      I think this would be because Oracle logs blocks rather than rows when doing direct path.

      A follow-on question for this example would then be how well does Oracle fill the blocks. (I mention this only because there used to be an oddity with direct path where the number of rows per block didn’t match the number of rows you’d get from a standard insert – and I haven’t ever checked to see if it has been fixed.)
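      (For reference, one way to check the fill is to count rows per block via the rowid – a sketch only, reusing Hemant’s table name:)

```sql
rem     Sketch: count how many rows ended up in each block after the
rem     direct path insert.

select  rows_per_block, count(*) blocks
from    (
        select  dbms_rowid.rowid_block_number(rowid) block_no,
                count(*) rows_per_block
        from    hkc_test_large_row_size
        group by
                dbms_rowid.rowid_block_number(rowid)
        )
group by
        rows_per_block
order by
        rows_per_block;
```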

      Comment by Jonathan Lewis — December 24, 2014 @ 11:18 am GMT Dec 24,2014 | Reply

  4. Jonathan,
    one additional question: do you have a hypothesis why this change in redo generation mode takes place? Since your test on 8.1.7.4 shows the same behaviour this is apparently not a result of newer optimizations (private redo for example).

    Comment by Martin Preiss — December 24, 2014 @ 10:44 am GMT Dec 24,2014 | Reply

    • Martin,

      My first guess would be that Oracle is anticipating an endless stream of chained rows, which would require some special form of single row processing to handle the chained component.

      Anyone running a data warehouse that uses tables with more than 255 columns might want to check if there’s a similar switch from array processing to single row processing as the column count for a table definition crosses the boundary. (I may have done that check in the past, but if I did I don’t recall on which version or what result I saw.)
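      (A sketch of how such a wide table might be generated for the test – the table and column names are hypothetical:)

```sql
rem     Sketch: generate a 300-column table to test the > 255 column
rem     boundary, then repeat the redo snapshot test against it.

declare
        m_sql varchar2(32767) := 'create table t_300 (c001 number';
begin
        for i in 2..300 loop
                m_sql := m_sql || ', c' || to_char(i,'fm000') || ' number';
        end loop;
        m_sql := m_sql || ')';
        execute immediate m_sql;
end;
/
```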

      Comment by Jonathan Lewis — December 24, 2014 @ 11:22 am GMT Dec 24,2014 | Reply

      • Jonathan,
        just checked your assumption about tables with multiple row pieces, and it seems you are right: for a table with 200 columns I got redo log entries with multi-row entries (slot[n]) and fewer than 1000 redo entries (in v$sesstat) for 1000 inserted rows. For a table with 300 columns (and two row pieces) the redo log entries are created per row, and there are more than 2000 redo entries (in v$sesstat) for the 1000 rows. The behaviour was unchanged when I inserted values into only the first and the last column (which are located in different row pieces for the table with 300 columns) – and even an insert that targeted only one row piece (the first column) resulted in more than 1000 redo entries.

        Comment by Martin Preiss — December 25, 2014 @ 8:47 pm GMT Dec 25,2014 | Reply

      • Jonathan,
        Re your remark “Anyone running a data warehouse that uses tables with more than 255 columns might want to check if there’s a similar switch from array processing to single row processing” …

        See MOS documents 9373758.8 and 1382613.1. I haven’t got around to testing this.

        Comment by Hemant K Chitale — December 26, 2014 @ 6:03 am GMT Dec 26,2014 | Reply

        • Hemant,

          Thanks for the reference – another variation to watch out for.
          The original bug report (9373758) is a nice warning of the scale of the chaining threat (on Exadata in particular).

          Comment by Jonathan Lewis — January 3, 2015 @ 1:05 pm GMT Jan 3,2015

  5. Oracle Data Miner has a number of built-in data types with “just in case” VARCHAR2(4000) columns, like DM_NESTED_NUMERICAL, DM_NESTED_CATEGORICAL. I just did several tests with nested tables based on 1) these built-in data types and 2) “custom” data types that use varchar2(40) rather than ODM’s default varchar2(4000). No difference in redo generation for nested tables at least – most likely because nested tables can’t use direct path anyway. Thanks for bringing this to attention.

    Comment by dautkhanov — December 26, 2014 @ 10:09 pm GMT Dec 26,2014 | Reply

    • dautkhanov,

      Thanks for the comment.

      If you do write your results up in a blog please come back and add a link.

      It would be interesting to see if there are any variations in generating nested table data that do optimize – I think I’ve seen a couple of hints used internally for nested tables on export and import that might be there to fiddle the system.

      Comment by Jonathan Lewis — January 3, 2015 @ 1:08 pm GMT Jan 3,2015 | Reply

  6. To put some context around this, since we are the developers in question: in short, this is not “just in case”.

    We are a Data Warehouse application that accepts data from multiple sources, but our main source (90%) is a completely unconstrained, multi-value, PICK database. The source application can, and very frequently does, misplace columns so that completely unexpected data can arrive in the wrong column or even the wrong file. It can be “manhandled” so that very large text values can appear in a single character field. These are tip-of-the-iceberg examples.

    We have no influence on how the source system operates and there is no appetite to make significant changes to the source application that attempts to constrain the data. This is the way it is.

    We looked at a number of ways to alleviate the problems in an attempt to allow the data in: pre-processing the data to capture the bad data early; capturing faulty data at the point of entry using an exceptions clause; letting it fail and managing the dependencies. It simply wasn’t simple, and the DWH had to be operational in a very short time frame. A team of 4 people developed an Enterprise DWH from a complete standing start in 9 months. This is a large global company with a tiny IT budget.

    The data was of such poor quality that sidelining faulty rows caused a significant skew in the results. Typically we observed that the bad data was confined to non-numeric attributes – hence if we could get the data in we were likely to get value from it, and that has proved to be the case. Data Quality is now a valuable dimension of our data, and has enabled us to quantify and isolate many problems in the source systems whilst maintaining DWH up time.

    Almost all of our operations are NOLOGGING parallel inserts or merges. Our Indexed Data is constrained as correctly as we can as it is rarely, if ever, valuable to allow faulty primary key data to persist and the PICK application does a good job of maintaining Primary Key integrity.

    Our tables are extremely lightly indexed in our relational Staging Schemas, we always want to force set based processing on our large datasets. It seems that we are not witnessing any “row by row” processing and I am not aware of any redo issues, certainly in the 3 years since we have been operational this has never been mentioned.

    In our Star Schemas for data publishing we are heavily indexed using Bitmap and Bitmap Join indexes on surrogate keys with Dimensions and Hierarchies defined, a comprehensive hierarchical Materialized View strategy with trusted rewrite forced. There are no null values in indexed columns, rather a surrogate key pointing to a “No Data” dimension. Hence we take advantage of all the Star related performance features and the ability to skip unrequired tables that may have been included in a Business Objects report.

    The Client reporting application, Business Objects, does not suffer from any issues due to the Oracle meta data being defined as 4k.

    I have written all of this here because, although the original question was raised on Oracle-l, many of the replies were simply confrontational (terms like “Duh veloper” and “Grow Up” don’t really move the conversation forwards). Nobody wondered why we had done it or tried to find out. Had they understood the reality of the data, business pressures, SLA’s etc. then a more rounded solution could have been explored.

    I understand and genuinely appreciate the descriptions and examples of the potential technical issues. We don’t appear to be suffering enough (or at all) to justify a change in strategy at the moment. Everything works. The business is happy.

    The original questioner suggested that we double the size of fields to allow for potential growth. As described here, “growth” isn’t the issue – faulty data is. I just don’t understand the point of trying to describe data via metadata in a data warehouse. An OLTP data entry system, yes. An OLAP reporting system, no.

    Comment by Dave Noble — January 5, 2015 @ 3:02 pm GMT Jan 5,2015 | Reply

  7. […] text). Today I just read a tweet from Jonathan Lewis, directing me to one of his older blog posts – “Just in case”. I try to read every blog post from Jonathan, but sometimes it’s hard to keep up. This one I […]

    Pingback by jcon.no: Oracle blog - One of the things you just need to know … — March 17, 2016 @ 2:22 pm GMT Mar 17,2016 | Reply

