Oracle Scratchpad

May 20, 2022

Quiz Night

Filed under: Infrastructure,Oracle,Problem Solving,Statistics — Jonathan Lewis @ 6:09 pm BST May 20,2022

It’s a long time since I’ve done a quiz night – but here’s something that fooled me (briefly) when it appeared on the Oracle Developers’ Forum. Here’s a table definition – and I’m not going to make it easy by giving you a “create table” statement – but it’s just a simple heap table:

sql> desc interr_skuplannparam
 name                             null?    type
 -------------------------------- -------- -------------------
 atpdur                                    number(38)
 depdmdopt                                 number(38)
 externalskusw                             number(1)
 firstreplendate                           date
 lastfrzstart                              date
 lastplanstart                             date
 plandur                                   number(38)
 planleadtime                              number(38)
 planleadtimerule                          number(38)
 planshipfrzdur                            number(38)
 restrictdur                               number(38)
 allocbatchsw                              number(1)
 cmpfirmdur                                number(38)
 custservicelevel                          float(126)
 maxchangefactor                           float(126)
 mfgleadtime                               number(38)
 recschedrcptsdur                          number(38)
 cpppriority                               number(38)
 cpplocksw                                 number(1)
 criticalmaterialsw                        number(1)
 aggexcesssupplyrule                       number(38)
 aggundersupplyrule                        number(38)
 bufferleadtime                            number(38)
 maxoh                                     float(126)
 maxcovdur                                 number(38)
 drpcovdur                                 number(38)
 drpfrzdur                                 number(38)
 drprule                                   number(38)
 drptimefencedate                          date
 drptimefencedur                           number(38)
 incdrpqty                                 float(126)
 mindrpqty                                 float(126)
 mpscovdur                                 number(38)
 mfgfrzdur                                 number(38)
 mpsrule                                   number(38)
 mpstimefencedate                          date
 mpstimefencedur                           number(38)
 incmpsqty                                 float(126)
 minmpsqty                                 float(126)
 shrinkagefactor                           number(38)
 item                                      varchar2(50 char)
 loc                                       varchar2(50 char)
 expdate                                   date
 atprule                                   number(38)
 prodcal                                   varchar2(50 char)
 prodstartdate                             date
 prodstopdate                              date
 orderingcost                              float(126)
 holdingcost                               float(126)
 eoq                                       float(126)
 ff_trigger_control                        number(38)
 workingcal                                varchar2(50 char)
 lookaheaddur                              number
 orderpointrule                            number
 orderskudetailsw                          number(1)
 supsdmindmdcovdur                         number(38)
 orderpointminrule                         number(38)
 orderpointminqty                          float(126)
 orderpointmindur                          number(38)
 orderuptolevelmaxrule                     number(38)
 orderuptolevelmaxqty                      float(126)
 orderuptolevelmaxdur                      number(38)
 aggskurule                                number(38)
 fwdbuymaxdur                              number(38)
 costuom                                   number(38)
 cumleadtimedur                            number(38)
 cumleadtimeadjdur                         number(38)
 cumleadtimerule                           number(38)
 roundingfactor                            float(126)
 limitplanarrivpublishsw                   number(1)
 limitplanarrivpublishdur                  number
 maxohrule                                 number(1)
 integration_stamp                         date
 integration_jobid                not null varchar2(32 char)
 error_str                                 varchar2(2000 char)
 error_stamp                               date

The column integration_jobid (the single “not null” column) has been defined with the default value of “INT_JOB”, which takes 7 bytes to store. What’s the result of the query at the end of this little script:

truncate table interr_skuplannparam;

insert into interr_skuplannparam (atpdur) 
select   0 
from     all_objects 
where    rownum <= 10000
/

commit;

execute dbms_stats.gather_table_stats(user,'interr_skuplannparam')

select  avg_row_len 
from    user_tables
where   table_name = 'INTERR_SKUPLANNPARAM' -- added later
;

Hint: the value zero is represented internally as a single byte holding the value 0x80 (decimal 128).

Answer

It’s fairly common knowledge that Oracle includes a “length byte” in the estimates for average column length and average row length. It’s also fairly well known that “trailing nulls take no space” in a row so don’t need a length byte. There are a couple of “less common knowledge” details to add, though:

  1. any column that is null in every row gets a zero for the avg_col_len even if it isn’t a “trailing null”.
  2. but point 1 doesn’t apply to date columns that are always null, they get a 1 for the avg_col_len even the column is a “trailing null”.. The same is true for the various timestamp and interval types.
  3. for columns that hold at least one value the avg_col_len is the average over all rows of the actual space used by that column’s data, rounded up, plus 1 for the length byte.
  4. the avg_row_len is not the sum(avg_col_len) it is based on the average of the summed column lengths for each row, plus the count of the length bytes recorded.
  5. User defined type, LOBs, varray types etc. introduce all sorts of other strange effects. (But that’s probably “more common” knowledge.

So what does that mean in my example where there’s a declared not null column near the end of the row, with two trailing columns and with every column except the first and the non-null column set to null for every single row in the table? The easy option is to create the model and show you the results of querying user_tab_cols.

break on report 
compute sum of avg_col_len on report

select  avg_row_len 
from    user_tables
where   table_name = 'INTERR_SKUPLANNPARAM'
;

select 
        column_name, data_type, avg_col_len 
from
         user_tab_cols 
where 
        table_name = 'INTERR_SKUPLANNPARAM'
and     avg_col_len != 0
order by
        column_id
;


============

AVG_ROW_LEN
-----------
         20

1 row selected.


 COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME              DATA_TYPE                 AVG_COL_LEN
---------- ----------------- ------------------------ ------------------------- -----------
         1                 1 ATPDUR                   NUMBER                              2
         4                 4 FIRSTREPLENDATE          DATE                                1
         5                 5 LASTFRZSTART             DATE                                1
         6                 6 LASTPLANSTART            DATE                                1
        29                29 DRPTIMEFENCEDATE         DATE                                1
        36                36 MPSTIMEFENCEDATE         DATE                                1
        43                43 EXPDATE                  DATE                                1
        46                46 PRODSTARTDATE            DATE                                1
        47                47 PRODSTOPDATE             DATE                                1
        73                73 INTEGRATION_STAMP        DATE                                1
        74                74 INTEGRATION_JOBID        VARCHAR2                            8
        76                76 ERROR_STAMP              DATE                                1
                                                                                -----------
sum                                                                                      20

12 rows selected.


My query of user_tab_cols orders by column_id, technically it should order by segment_column_id to show the physical ordering in the data segment to allow for all the strange effects you can get in more complex scenarios, but in this very simple case the two values are the same.

As you can see, every date type (including the trailing error_stamp) has an avg_col_len of 1, even though all the dates are null in every row. Column atdpur has avg_col_len = 2, which is 1 byte for storing zero plus a length byte and integration_job_id has avg_col_len = 8, which is 7 bytes for storing ‘INT_JOB’ plus a length byte.

In this case where every single row is identical there are no rounding effects due to the calculation of average column length (the column data stored is the same in every row) so the avg_row_len = sum(avg_col_len).

Conclusion

This was an extreme (though real) production) case. I wrote this note because someone was asking why there was such a massive difference between their estimate of the space needed for a table and the actual space taken by the table.

There are many reasons why this could happen, of course, but in their case their estimate was based on multiplying the avg_row_len by the number of rows in the table – and they had a large number of columns that were always null but still required space for the length byte, so the avg_row_len was about half the actual stored space for the average row. (And then the avg_row_len doesn’t allow for the row overhead and row directory entry, which total another 5 bytes per row).

Lagniappe

It wasn’t part of the quiz, but once you’ve got a simple model you can always start playing with it to find out a little more, and to check how well your guesswork holds up as you make the scenario more complicated. For example, let’s update a few columns in just one row of the table, gather stats again, and report the row and column lengths.

update interr_skuplannparam
set 
        shrinkagefactor = 1234567890,
        item            = 'xxx',
        expdate         = sysdate
where
        rownum = 1
/

commit;

execute dbms_stats.gather_table_stats(user,'interr_skuplannparam')

select  avg_row_len 
from    user_tables
where   table_name = 'INTERR_SKUPLANNPARAM'
;

select 
        column_id, segment_column_id, column_name, data_type, avg_col_len 
from
        user_tab_cols 
where 
        table_name = 'INTERR_SKUPLANNPARAM'
and     avg_col_len != 0
order by
        column_id
;

============================

AVG_ROW_LEN
-----------
         22

1 row selected.


 COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME              DATA_TYPE                 AVG_COL_LEN
---------- ----------------- ------------------------ ------------------------- -----------
         1                 1 ATPDUR                   NUMBER                              2
         4                 4 FIRSTREPLENDATE          DATE                                1
         5                 5 LASTFRZSTART             DATE                                1
         6                 6 LASTPLANSTART            DATE                                1
        29                29 DRPTIMEFENCEDATE         DATE                                1
        36                36 MPSTIMEFENCEDATE         DATE                                1
        40                40 SHRINKAGEFACTOR          NUMBER                              2
        41                41 ITEM                     VARCHAR2                            2
        43                43 EXPDATE                  DATE                                2
        46                46 PRODSTARTDATE            DATE                                1
        47                47 PRODSTOPDATE             DATE                                1
        73                73 INTEGRATION_STAMP        DATE                                1
        74                74 INTEGRATION_JOBID        VARCHAR2                            8
        76                76 ERROR_STAMP              DATE                                1
                                                                                -----------
sum                                                                                      25


The total of the avg_col_len has gone up from 20 to 25 – this is two bytes each for the shrinkage_factor and item columns (a tiny average for the stored data, plus 1 for a length byte), and one extra byte for the expdate column (a tiny average for the stored data). All three values rounded up from “1 and a bit” to 2.

The avg_row_len, however, has gone up by only 2 – which I am going to assume is the two newlength bytes, ,and with no allowance for the impact of the one row in 10,000 that is now a few bytes longer. It looks as if the rounding rules for the row length may be different from the rounding (up) rules for the column length.

8 Comments »

  1. OK, I’ll bite (if only to kick off some better answers). There is going to be a null indicator for the first 76 columns as Oracle puts one in for every column up to the last column populated. But I suspect it does not store the value of the default for column 77 as it is a waste of time – Oracle knows it is the default if it is mandatory but empty. So I’ll say the length is 77.
    However, I do wonder if the float(126) columns are a curve ball…

    Comment by mwidlake — May 20, 2022 @ 6:41 pm BST May 20,2022 | Reply

  2. Martin,

    Thanks for playing and not cheating.
    I’ll give you three out of 10 for identifying points that might have been relevant.
    Wrong answer, though.

    Regards
    Jonathan Lewis

    Comment by Jonathan Lewis — May 20, 2022 @ 6:54 pm BST May 20,2022 | Reply

  3. The result of the last query cannot be determined with the given information – unless interr_skuplannparam is the only table in the current schema. If it is the only table, the expected result is 1 (row contains not null values) + 1 (size of Oracle’s representation of 0) + (7+1) (length of the not-null column in bytes plus one byte to store the string’s length) + 10*1 (because there are 10 date columns) = 20.

    Comment by Rene Nyffenegger — May 20, 2022 @ 6:56 pm BST May 20,2022 | Reply

    • Rene,

      Apologies for the late reply. Your analysis is correct, and I’m just adding a few notes to the article to expand on it (and adding a where clause to the query).

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — May 31, 2022 @ 10:04 am BST May 31,2022 | Reply

  4. Hi Jonathan,
    Good one, as always!
    (Of course, I did cheat: couldn’t resist putting my assumptions to a couple of quick tests–so I won’t spoil anyone’s fun here by commenting further.)
    Very helpful in any case. Thanks!
    Regards,

    Comment by rvo — May 21, 2022 @ 11:07 am BST May 21,2022 | Reply

    • rvo,

      Thanks for the comment – and keeping quiet about the answer – though I hope you got the same result as Rene.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — May 31, 2022 @ 10:05 am BST May 31,2022 | Reply

  5. […] Quiz night 37 (May 2022): why avg_row_len may give you a very bad estimate of the space needed by your table data. […]

    Pingback by Quiz Catalogue | Oracle Scratchpad — May 31, 2022 @ 12:52 pm BST May 31,2022 | Reply

  6. […] Quiz – what’s the row length (May 2022): avg_row_len * num_rows isn’t always a good approximation to the total space your data needs – even when it’s simple Oracle types. […]

    Pingback by Fragmentation catalogue | Oracle Scratchpad — May 31, 2022 @ 6:14 pm BST May 31,2022 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.

%d bloggers like this: