Oracle Scratchpad

December 23, 2014

Just in case

Filed under: Infrastructure,Oracle,Performance,redo — Jonathan Lewis @ 10:37 am GMT Dec 23,2014

For those who don’t read Oracle-l and haven’t found Nikolay Savvinov’s blog, here’s a little note pulling together a recent question on Oracle-L and a relevant (and probably unexpected) observation from the blog. The question (paraphrased) was:

The developers/data modelers are creating all the tables with varchar2(4000) as standard by default “Just in case we need it”. What do you think of this idea?

The general answer was that it’s a bad idea (and unnecessary, anyway) and one specific threat that got mentioned was the problem of creating indexes and Oracle error ORA-01450; but coincidentally Nikolay Savvinov had written about a performance-related “bug” in August this year, which turned out, only last week, to be expected behaviour. You can read his articles for the details, but since he used a 4KB block size to demonstrate it I thought I’d repeat the exercise using an 8KB block size.
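As a quick illustration of the ORA-01450 threat (a hypothetical demonstration, not part of the original test; with an 8KB block size the maximum index key length is 6398 bytes):

```sql
rem     Hypothetical demonstration of ORA-01450.
rem     With an 8KB block size the maximum index key length is 6398 bytes,
rem     so a two-column index on varchar2(4000) columns cannot be created
rem     even when the table holds no data.

create table t_wide (c1 varchar2(4000), c2 varchar2(4000));
create index t_wide_i1 on t_wide(c1, c2);
-- ORA-01450: maximum key length (6398) exceeded
```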

rem     Script: column_length_threat.sql

drop table t1 purge;
create table t1 (id number(6), v1 varchar2(40), v2 varchar2(40), v3 varchar2(40));
create unique index t1_i1 on t1(id);

execute snap_redo.start_snap

insert into t1 
select	object_id, object_name, object_name, object_name 
from	all_objects
where	rownum <= 10000
;

execute snap_redo.end_snap


drop table t1 purge;
create table t1 (id number(6), v1 varchar2(4000), v2 varchar2(4000), v3 varchar2(4000));
create unique index t1_i1 on t1(id);

execute snap_redo.start_snap

insert into t1 
select	object_id, object_name, object_name, object_name 
from	all_objects
where	rownum <= 10000
;

execute snap_redo.end_snap

I’ve dropped and created the same table twice, once with varchar2(40) columns and once with varchar2(4000) columns.

I’ve created a unique index on a (non-character) column. The specific results vary depending on whether the index is unique or non-unique, whether the index exists at all, whether the table already holds data, the effective clustering of the index columns, and so on, but the key difference between the two sets of results doesn’t go away.

I’ve inserted object_name values (maximum usage 32 bytes) into the varchar2() columns, inserting 10,000 rows.

The snap_redo package is one of my simple pre/post packages that calculates changes in values in some dynamic performance view – in this case it’s looking at v$sysstat (system statistics) for statistics relating to redo generation, which means you need to run this test on an otherwise idle instance. Here are the two sets of results from an instance of 11.2.0.4:
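The snap_redo package itself isn’t listed here; as a rough stand-in (an assumption about its behaviour, not the original code), the same pre/post deltas can be captured with something like:

```sql
rem     Hypothetical stand-in for the snap_redo package: snapshot the
rem     redo-related statistics from v$sysstat, run the test, then
rem     report the deltas.

create table redo_snap as
select name, value from v$sysstat where name like 'redo%';

rem     ... run the insert here ...

select  s.name, s.value - r.value delta
from    v$sysstat s, redo_snap r
where   s.name = r.name
and     s.value != r.value;

drop table redo_snap purge;
```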


Name                                                                     Value
----                                                                     -----
messages sent                                                               11
messages received                                                           11
calls to kcmgcs                                                            313
calls to kcmgas                                                             37
calls to get snapshot scn: kcmgss                                           74
redo entries                                                               769
redo size                                                            1,317,008
redo wastage                                                             3,888
redo writes                                                                 11
redo blocks written                                                      2,664
redo write time                                                             10
redo blocks checksummed by FG (exclusive)                                2,242
redo ordering marks                                                          1
redo subscn max counts                                                       1
redo synch time                                                              7
redo synch time (usec)                                                  88,875
redo synch time overhead (usec)                                          1,810
redo synch time overhead count (<2 msec)                                    11
redo synch writes                                                           11
redo write info find                                                        11
undo change vector size                                                261,136
rollback changes - undo records applied                                      2
IMU undo allocation size                                                17,184


Name                                                                     Value
----                                                                     -----
messages sent                                                                8
messages received                                                            8
calls to kcmgcs                                                            222
calls to kcmgas                                                             56
calls to get snapshot scn: kcmgss                                           52
redo entries                                                            20,409
redo size                                                            5,606,872
redo buffer allocation retries                                               1
redo wastage                                                             1,248
redo writes                                                                  6
redo blocks written                                                     11,324
redo write time                                                             26
redo blocks checksummed by FG (exclusive)                                  571
redo ordering marks                                                         32
redo subscn max counts                                                       1
redo synch time                                                              6
redo synch time (usec)                                                  60,230
redo synch time overhead (usec)                                            159
redo synch time overhead count (<2 msec)                                     1
redo synch writes                                                            1
redo write info find                                                         1
undo change vector size                                              1,590,520
IMU undo allocation size                                                   144

Notice, particularly, the great change in the number of redo entries and the total redo size when the character columns are defined as varchar2(4000). Note, too, that the number of redo entries is roughly “2 * number of rows inserted”: for each row that’s one entry for the row itself and one for the index entry. You can check the redo log content by dumping the log file, of course (and Nikolay did), or you can take my word for it that Oracle is doing the equivalent of single row processing in the varchar2(4000) case and array processing in the varchar2(40) case.

When Oracle calculates that the defined row length (not the actual data length) is larger than the block size it falls back to single row processing; this can increase your redo generation significantly, and since the rate at which you can pump out redo is the ultimate limit on the rate at which you can load data, this could have a significant impact on your data loading times. Declaring character columns as varchar2(4000) “just in case” is a bad idea.
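If you want to check an existing schema for this threat, a rough query might look like the following (a sketch only — it ignores row overhead, the number of row pieces, and multi-byte charactersets, and the schema name is hypothetical):

```sql
rem     Sketch: find tables whose defined row length exceeds the block size.
rem     Compare the total against your db_block_size (8192 here).

select  owner, table_name, sum(data_length) defined_row_length
from    dba_tab_columns
where   owner = 'TEST_USER'
group by
        owner, table_name
having  sum(data_length) > 8192
order by
        defined_row_length desc;
```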


19 Comments »

  1. Hi Jonathan,

    thanks for bringing this to the community’s attention! I still remember my frustration when I spent a lot of time moving a significant amount of code from row-by-row to bulk processing only to see some ridiculously small performance gains. I’m sure there are many people out there in a similar position, and this information can help them save some time (and nerves).

    Best regards,
    Nikolay

    Comment by savvinov — December 23, 2014 @ 11:29 am GMT Dec 23,2014 | Reply

    • Nikolay,

      Thanks for pursuing the issue with Oracle and getting a resolution.

      It’s an interesting example of a feature that has probably been around for a very long time but never been properly noticed because the impact has never been sufficiently apparent to prompt someone to look into it. (A bit like the ASSM / pctfree / chained rows bug that Steve Karam came across a few years ago.)

      Although the single-row/array argument has never been made in the “bigger blocks are better” canon, perhaps it was actually an (undocumented) observation of this anomaly many years ago that first prompted someone to create the legend.

      Comment by Jonathan Lewis — December 23, 2014 @ 11:52 am GMT Dec 23,2014 | Reply

      • Interesting. So what’s your position on the optimal database block size — do you think that except in some rare cases the default 8k is adequate or do you have some arguments in favor or smaller/larger block sizes for some typical configurations?

        Comment by savvinov — December 23, 2014 @ 12:07 pm GMT Dec 23,2014 | Reply

        • I would pick the default block size for the operating system, which would probably be 8KB in all cases nowadays, as the database block size unless you could give me a fairly convincing demonstration that an alternative was better.

          On very rare occasions I would consider a different block size for specific options. I have had a couple of cases where it seemed to make sense to use 16KB for an IOT (for performance reasons), and a case where 4KB for a LOB segment (for space saving reasons) was a good idea.

          Comment by Jonathan Lewis — December 23, 2014 @ 12:28 pm GMT Dec 23,2014

  2. For two-column “why should I use the RDBMS features?” key/value-pair data models stuffed into a pair of varchar2(4000) columns, this is indeed a (corner) case for using a bigger block size. I haven’t tested whether 10K block sizes still work, but that was indeed a size I used circa 1992/3 for this very purpose of avoiding multi-block rows (though I hadn’t discovered the slow-by-slow effect, and it may not even have been there circa 7.1), handling an interesting case for a customer.

    E-biz was demonstrably faster too, as makes sense if you have tens of varchar2(500) “attribute_n” columns and most of them are born and remain forever NULL, so actual rows tend to fit in a single larger block. None of them would even be close to fitting under the “I’m going to go row-by-row if an individual row can exceed a block size” rule.

    Nikolay: Have you tested if this row-by-row processing is the effect with a single long or a single (possibly out of band) more modern very long type (like a LOB)? Do we know when (version wise) it came into play?

    JL: I particularly like your sly migration from “optimal” database block size to “default” database block size in answer to Nikolay’s question. Thus you underscore that the default should be the sweet spot for the natural group of the physical resources being used (following the OS, perhaps further down the stack), while only provable (and likely rare) cases should vary from the default. Since the advent of multiple database block sizes within one database, that is the real and far more easily testable question than varying the default database block size and comparing throughput in aggregate.

    Comment by rsiz — December 23, 2014 @ 1:25 pm GMT Dec 23,2014 | Reply

    • I haven’t had a chance to conduct an in-depth analysis for inserts (or other DML) involving LOB’s. It would be interesting to look at that, but a thorough study would require a lot of time because of multiple possible combinations of parameters (in-row storage on or off, chunk size etc.). I’m a little bit constrained in terms of time right now so I’m not sure if I’d be able to test this behavior with LOBs any time soon.

      I don’t know in which version this behavior first appeared. I vaguely recall that Jonathan mentioned something about the history of this behavior in the thread to my original post on multi-row DML and block size (http://savvinov.com/2014/08/19/block-size-and-multirow-dml-peformance/), so you may be able to find some relevant information in there.

      Comment by savvinov — December 23, 2014 @ 2:38 pm GMT Dec 23,2014 | Reply

      • Regarding the effect, I’ve just run the test on 8.1.7.4 and it shows the same behaviour.

        I have to agree about the availability of time – there are plenty of things that I could test, but I won’t test until I have a project for which the test is important. Knowing that something needs to be tested, though, is half the battle in doing the right thing.

        Comment by Jonathan Lewis — December 23, 2014 @ 3:43 pm GMT Dec 23,2014 | Reply

  3. This may not be an issue when using Direct Path Insert (APPEND or PARALLEL Insert). I have a large number of tables where the maximum row size is very much more than the block size and I use SQLLoader Direct Path and INSERT PARALLEL operations.

    A simple test case :

    drop table hkc_test_large_row_size purge;
    
    create table hkc_test_large_row_size (id_column number, data_col_1 varchar2(4000), data_col_2 varchar2(4000), data_col_3 varchar2(4000));
    create unique index hkc_tlrs_undx on hkc_test_large_row_size(id_column);
    
    alter session enable parallel dml;
    
    select distinct sid from v$mystat;
    
    pause Get the redo stats
    
    insert /*+ APPEND */ into hkc_test_large_row_size
    select rownum, to_char(rownum), to_char(rownum), to_char(rownum)
    from dual
    connect by level < 1001
    /
    
    pause Get the redo stats
    commit;
    
    alter table hkc_test_large_row_size parallel 4;
    
    insert /*+ PARALLEL (MANUAL) */ into hkc_test_large_row_size
    select rownum+2000, to_char(rownum), to_char(rownum), to_char(rownum)
    from dual
    connect by level < 1001
    /
    
    pause Get the redo stats
    
    commit;

    Comment by Hemant K Chitale — December 24, 2014 @ 1:52 am GMT Dec 24,2014 | Reply

    • Hemant,

      Thanks for the comment and sample code.

      I think this would be because Oracle logs blocks rather than rows when doing direct path.

      A follow-on question for this example would then be how well does Oracle fill the blocks. (I mention this only because there used to be an oddity with direct path where the number of rows per block didn’t match the number of rows you’d get from a standard insert – and I haven’t ever checked to see if it has been fixed.)
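      (For reference, one way to check the fill is to count rows per block via the rowid – a sketch only, reusing Hemant’s table name:)

```sql
rem     Sketch: count how many rows ended up in each block after the
rem     direct path insert.

select  rows_per_block, count(*) blocks
from    (
        select  dbms_rowid.rowid_block_number(rowid) block_no,
                count(*) rows_per_block
        from    hkc_test_large_row_size
        group by
                dbms_rowid.rowid_block_number(rowid)
        )
group by
        rows_per_block
order by
        rows_per_block;
```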

      Comment by Jonathan Lewis — December 24, 2014 @ 11:18 am GMT Dec 24,2014 | Reply

  4. Jonathan,
    one additional question: do you have a hypothesis why this change in redo generation mode takes place? Since your test on 8.1.7.4 shows the same behaviour this is apparently not a result of newer optimizations (private redo for example).

    Comment by Martin Preiss — December 24, 2014 @ 10:44 am GMT Dec 24,2014 | Reply

    • Martin,

      My first guess would be that Oracle is anticipating an endless stream of chained rows, which would require some special form of single row processing to handle the chained component.

      Anyone running a data warehouse that uses tables with more than 255 columns might want to check if there’s a similar switch from array processing to single row processing as the column count for a table definition crosses the boundary. (I may have done that check in the past, but if I did I don’t recall on which version or what result I saw.)
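      (A sketch of how such a wide table might be generated for the test – the table and column names are hypothetical:)

```sql
rem     Sketch: generate a 300-column table to test the > 255 column
rem     boundary, then repeat the redo snapshot test against it.

declare
        m_sql varchar2(32767) := 'create table t_300 (c001 number';
begin
        for i in 2..300 loop
                m_sql := m_sql || ', c' || to_char(i,'fm000') || ' number';
        end loop;
        m_sql := m_sql || ')';
        execute immediate m_sql;
end;
/
```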

      Comment by Jonathan Lewis — December 24, 2014 @ 11:22 am GMT Dec 24,2014 | Reply

      • Jonathan,
        just checked your assumption about tables with multiple row pieces, and it seems you are right: for a table with 200 columns I got redo log entries with multi-row entries (slot[n]) and fewer than 1000 redo entries (in v$sesstat) for 1000 inserted rows. For a table with 300 columns (and two row pieces) the redo log entries are created per row, and there are more than 2000 redo entries (in v$sesstat) for the 1000 rows. The behaviour was unchanged when I inserted values into only the first and the last column (which are located in different row pieces for the table with 300 columns) – and even an insert that targeted only one row piece (the first column) resulted in more than 1000 redo entries.

        Comment by Martin Preiss — December 25, 2014 @ 8:47 pm GMT Dec 25,2014 | Reply

      • Jonathan,
        Re your remark “Anyone running a data warehouse that uses tables with more than 255 columns might want to check if there’s a similar switch from array processing to single row processing” …

        See MOS documents 9373758.8 and 1382613.1. I haven’t got around to testing this.

        Comment by Hemant K Chitale — December 26, 2014 @ 6:03 am GMT Dec 26,2014 | Reply

        • Hemant,

          Thanks for the reference – another variation to watch out for.
          The original bug report (9373758) is a nice warning of the scale of the chaining threat (on Exadata in particular).

          Comment by Jonathan Lewis — January 3, 2015 @ 1:05 pm GMT Jan 3,2015

  5. Oracle Data Miner has a number of built-in data types with “just in case” VARCHAR2(4000) columns, like DM_NESTED_NUMERICAL, DM_NESTED_CATEGORICAL. I just did several tests with nested tables based on 1) these built-in data types and 2) “custom” data types that use varchar2(40) rather than ODM’s default varchar2(4000). No difference in redo generation for nested tables at least – most likely because nested tables can’t use direct path anyway. Thanks for bringing this to attention.

    Comment by dautkhanov — December 26, 2014 @ 10:09 pm GMT Dec 26,2014 | Reply

    • dautkhanov,

      Thanks for the comment.

      If you do write your results up in a blog please come back and add a link.

      It would be interesting to see if there are any variations in generating nested table data that do optimize – I think I’ve seen a couple of hints used internally for nested tables on export and import that might be there to fiddle the system.

      Comment by Jonathan Lewis — January 3, 2015 @ 1:08 pm GMT Jan 3,2015 | Reply

  6. To put some context around this, since we are the developers in question: in short, this is not “just in case”.

    We are a Data Warehouse application that accepts data from multiple sources, but our main source (90%) is a completely unconstrained, multi-value, PICK database. The source application can, and very frequently does, misplace columns so that completely unexpected data can arrive in the wrong column or even the wrong file. It can be “manhandled” so that very large text values can appear in a single character field. These are tip-of-the-iceberg examples.

    We have no influence on how the source system operates and there is no appetite to make significant changes to the source application that attempts to constrain the data. This is the way it is.

    We looked at a number of ways to alleviate the problems in an attempt to allow the data in: pre-processing the data to capture the bad data early; capturing faulty data at the point of entry using an exceptions clause; letting it fail and managing the dependencies. It simply wasn’t simple, and the DWH had to be operational in a very short time frame. A team of 4 people developed an Enterprise DWH from a complete standing start in 9 months. This is a large global company with a tiny IT budget.

    The data was of such poor quality that sidelining faulty rows caused a significant skew in the results. Typically we observed that the bad data was confined to non-numeric attributes – hence if we could get the data in we were likely to get value from it, and that has proved to be the case. Data Quality is now a valuable dimension of our data, and has enabled us to quantify and isolate many problems in the source systems whilst maintaining DWH up time.

    Almost all of our operations are NOLOGGING parallel inserts or merges. Our Indexed Data is constrained as correctly as we can as it is rarely, if ever, valuable to allow faulty primary key data to persist and the PICK application does a good job of maintaining Primary Key integrity.

    Our tables are extremely lightly indexed in our relational Staging Schemas, we always want to force set based processing on our large datasets. It seems that we are not witnessing any “row by row” processing and I am not aware of any redo issues, certainly in the 3 years since we have been operational this has never been mentioned.

    In our Star Schemas for data publishing we are heavily indexed using Bitmap and Bitmap Join indexes on surrogate keys with Dimensions and Hierarchies defined, a comprehensive hierarchical Materialized View strategy with trusted rewrite forced. There are no null values in indexed columns, rather a surrogate key pointing to a “No Data” dimension. Hence we take advantage of all the Star related performance features and the ability to skip unrequired tables that may have been included in a Business Objects report.

    The Client reporting application, Business Objects, does not suffer from any issues due to the Oracle meta data being defined as 4k.

    I have written all of this here because, although the original question was raised on Oracle-l, many of the replies were simply confrontational (terms like “Duh veloper” and “Grow Up” don’t really move the conversation forwards). Nobody wondered why we had done it or tried to find out. Had they understood the reality of the data, business pressures, SLA’s etc. then a more rounded solution could have been explored.

    I understand and genuinely appreciate the descriptions and examples of the potential technical issues. We don’t appear to be suffering enough (or at all) to justify a change in strategy at the moment. Everything works. The business is happy.

    The original questioner suggested that we double the size of fields to allow for potential growth. As described here, “growth” isn’t the issue – faulty data is. I just don’t understand the point of trying to describe data via metadata in a data warehouse. An OLTP data entry system, yes. An OLAP reporting system, no.

    Comment by Dave Noble — January 5, 2015 @ 3:02 pm GMT Jan 5,2015 | Reply

  7. […] text). Today I just read a tweet from Jonathan Lewis, directing me to one of his older blog posts – “Just in case”. I try to read every blog post from Jonathan, but sometimes it’s hard to keep up. This one I […]

    Pingback by jcon.no: Oracle blog - One of the things you just need to know … — March 17, 2016 @ 2:22 pm GMT Mar 17,2016 | Reply

