It’s a long time since I’ve done a quiz night – but here’s something that fooled me (briefly) when it appeared on the Oracle Developers’ Forum. Here’s a table definition – and I’m not going to make it easy by giving you a “create table” statement – but it’s just a simple heap table:
sql> desc interr_skuplannparam
name null? type
-------------------------------- -------- -------------------
atpdur number(38)
depdmdopt number(38)
externalskusw number(1)
firstreplendate date
lastfrzstart date
lastplanstart date
plandur number(38)
planleadtime number(38)
planleadtimerule number(38)
planshipfrzdur number(38)
restrictdur number(38)
allocbatchsw number(1)
cmpfirmdur number(38)
custservicelevel float(126)
maxchangefactor float(126)
mfgleadtime number(38)
recschedrcptsdur number(38)
cpppriority number(38)
cpplocksw number(1)
criticalmaterialsw number(1)
aggexcesssupplyrule number(38)
aggundersupplyrule number(38)
bufferleadtime number(38)
maxoh float(126)
maxcovdur number(38)
drpcovdur number(38)
drpfrzdur number(38)
drprule number(38)
drptimefencedate date
drptimefencedur number(38)
incdrpqty float(126)
mindrpqty float(126)
mpscovdur number(38)
mfgfrzdur number(38)
mpsrule number(38)
mpstimefencedate date
mpstimefencedur number(38)
incmpsqty float(126)
minmpsqty float(126)
shrinkagefactor number(38)
item varchar2(50 char)
loc varchar2(50 char)
expdate date
atprule number(38)
prodcal varchar2(50 char)
prodstartdate date
prodstopdate date
orderingcost float(126)
holdingcost float(126)
eoq float(126)
ff_trigger_control number(38)
workingcal varchar2(50 char)
lookaheaddur number
orderpointrule number
orderskudetailsw number(1)
supsdmindmdcovdur number(38)
orderpointminrule number(38)
orderpointminqty float(126)
orderpointmindur number(38)
orderuptolevelmaxrule number(38)
orderuptolevelmaxqty float(126)
orderuptolevelmaxdur number(38)
aggskurule number(38)
fwdbuymaxdur number(38)
costuom number(38)
cumleadtimedur number(38)
cumleadtimeadjdur number(38)
cumleadtimerule number(38)
roundingfactor float(126)
limitplanarrivpublishsw number(1)
limitplanarrivpublishdur number
maxohrule number(1)
integration_stamp date
integration_jobid not null varchar2(32 char)
error_str varchar2(2000 char)
error_stamp date
The column integration_jobid (the single “not null” column) has been defined with the default value of “INT_JOB”, which takes 7 bytes to store. What’s the result of the query at the end of this little script:
truncate table interr_skuplannparam;
insert into interr_skuplannparam (atpdur)
select 0
from all_objects
where rownum <= 10000
/
commit;
execute dbms_stats.gather_table_stats(user,'interr_skuplannparam')
select avg_row_len
from user_tables
where table_name = 'INTERR_SKUPLANNPARAM' -- added later
;
Hint: the value zero is represented internally as a single byte holding the value 0x80 (decimal 128).
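If you want to check that hint (and the 7 bytes needed for ‘INT_JOB’) for yourself, the dump() function reports the internal representation of a value:

select dump(0) zero_rep, dump('INT_JOB') jobid_rep from dual;

The first expression should report Typ=2 Len=1: 128, and the second should report a length of 7 followed by the seven character codes.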
Answer
It’s fairly common knowledge that Oracle includes a “length byte” in the estimates for average column length and average row length. It’s also fairly well known that “trailing nulls take no space” in a row so don’t need a length byte. There are a couple of “less common knowledge” details to add, though:
- any column that is null in every row gets a zero for the avg_col_len even if it isn’t a “trailing null”.
- but point 1 doesn’t apply to date columns that are always null: they get a 1 for the avg_col_len even if the column is a “trailing null”. The same is true for the various timestamp and interval types. (There’s a small demo of points 1 and 2 just after this list.)
- for columns that hold at least one value the avg_col_len is the average over all rows of the actual space used by that column’s data, rounded up, plus 1 for the length byte.
- the avg_row_len is not the sum(avg_col_len); it is based on the average of the summed column lengths for each row, plus the count of the length bytes recorded.
- user-defined types, LOBs, varray types etc. introduce all sorts of other strange effects. (But that’s probably “more common” knowledge.)
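Here’s a tiny model to demonstrate points 1 and 2 (the table t1 and its column names are just ones I’ve picked for the demo). If those points are correct, an all-null number column should report an avg_col_len of 0 while an all-null date column should report 1:

create table t1 (n1 number, n2 number, d1 date);

insert into t1
select rownum, null, null
from all_objects
where rownum <= 100
/

commit;

execute dbms_stats.gather_table_stats(user,'t1')

select column_name, data_type, avg_col_len
from user_tab_cols
where table_name = 'T1'
order by column_id
;

Column n2 should show zero, and d1 should show 1 even though it is the trailing column and is null in every row.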
So what does that mean in my example where there’s a declared not null column near the end of the row, with two trailing columns and with every column except the first and the non-null column set to null for every single row in the table? The easy option is to create the model and show you the results of querying user_tab_cols.
break on report
compute sum of avg_col_len on report
select avg_row_len
from user_tables
where table_name = 'INTERR_SKUPLANNPARAM'
;
select
column_id, segment_column_id, column_name, data_type, avg_col_len
from
user_tab_cols
where
table_name = 'INTERR_SKUPLANNPARAM'
and avg_col_len != 0
order by
column_id
;
============
AVG_ROW_LEN
-----------
20
1 row selected.
COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME DATA_TYPE AVG_COL_LEN
---------- ----------------- ------------------------ ------------------------- -----------
1 1 ATPDUR NUMBER 2
4 4 FIRSTREPLENDATE DATE 1
5 5 LASTFRZSTART DATE 1
6 6 LASTPLANSTART DATE 1
29 29 DRPTIMEFENCEDATE DATE 1
36 36 MPSTIMEFENCEDATE DATE 1
43 43 EXPDATE DATE 1
46 46 PRODSTARTDATE DATE 1
47 47 PRODSTOPDATE DATE 1
73 73 INTEGRATION_STAMP DATE 1
74 74 INTEGRATION_JOBID VARCHAR2 8
76 76 ERROR_STAMP DATE 1
-----------
sum 20
12 rows selected.
My query of user_tab_cols orders by column_id; technically it should order by segment_column_id to show the physical ordering in the data segment and allow for all the strange effects you can get in more complex scenarios, but in this very simple case the two values are the same.
As you can see, every date type (including the trailing error_stamp) has an avg_col_len of 1, even though all the dates are null in every row. Column atpdur has avg_col_len = 2, which is 1 byte for storing zero plus a length byte, and integration_jobid has avg_col_len = 8, which is 7 bytes for storing ‘INT_JOB’ plus a length byte.
In this case, where every single row is identical, there are no rounding effects in the calculation of average column length (the column data stored is the same in every row), so avg_row_len = sum(avg_col_len).
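As a cross-check you can reproduce both figures directly from the data: vsize() reports the number of bytes Oracle stores for a value (excluding the length byte), so adding 1 to its average should match the avg_col_len:

select
1 + avg(vsize(atpdur)) atpdur_check,
1 + avg(vsize(integration_jobid)) jobid_check
from interr_skuplannparam
;

Since every row holds a zero and the string ‘INT_JOB’, this should report 2 and 8 respectively.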
Conclusion
This was an extreme (though real) production case. I wrote this note because someone was asking why there was such a massive difference between their estimate of the space needed for a table and the actual space taken by the table.
There are many reasons why this could happen, of course, but in their case the estimate was based on multiplying the avg_row_len by the number of rows in the table – and they had a large number of columns that were always null but appeared ahead of a populated column, so each of those columns still needed a one-byte marker in the stored row while contributing little or nothing to the avg_row_len. As a result the avg_row_len was about half the actual stored space for the average row. (And then the avg_row_len doesn’t allow for the row overhead and row directory entry, which total another 5 bytes per row.)
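If you’re going to use avg_row_len for a space estimate it’s worth a sanity check against the space the segment actually uses. The following is only a rough comparison – bytes/num_rows includes block overheads, pctfree and any free space below the high water mark, so it overstates the per-row figure a little – but it will show up an avg_row_len that is badly adrift:

select
t.num_rows, t.avg_row_len,
s.blocks, s.bytes,
round(s.bytes / t.num_rows, 1) approx_bytes_per_row
from user_tables t, user_segments s
where t.table_name = 'INTERR_SKUPLANNPARAM'
and s.segment_name = t.table_name
;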
Lagniappe
It wasn’t part of the quiz, but once you’ve got a simple model you can always start playing with it to find out a little more, and to check how well your guesswork holds up as you make the scenario more complicated. For example, let’s update a few columns in just one row of the table, gather stats again, and report the row and column lengths.
update interr_skuplannparam
set
shrinkagefactor = 1234567890,
item = 'xxx',
expdate = sysdate
where
rownum = 1
/
commit;
execute dbms_stats.gather_table_stats(user,'interr_skuplannparam')
select avg_row_len
from user_tables
where table_name = 'INTERR_SKUPLANNPARAM'
;
select
column_id, segment_column_id, column_name, data_type, avg_col_len
from
user_tab_cols
where
table_name = 'INTERR_SKUPLANNPARAM'
and avg_col_len != 0
order by
column_id
;
============================
AVG_ROW_LEN
-----------
22
1 row selected.
COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME DATA_TYPE AVG_COL_LEN
---------- ----------------- ------------------------ ------------------------- -----------
1 1 ATPDUR NUMBER 2
4 4 FIRSTREPLENDATE DATE 1
5 5 LASTFRZSTART DATE 1
6 6 LASTPLANSTART DATE 1
29 29 DRPTIMEFENCEDATE DATE 1
36 36 MPSTIMEFENCEDATE DATE 1
40 40 SHRINKAGEFACTOR NUMBER 2
41 41 ITEM VARCHAR2 2
43 43 EXPDATE DATE 2
46 46 PRODSTARTDATE DATE 1
47 47 PRODSTOPDATE DATE 1
73 73 INTEGRATION_STAMP DATE 1
74 74 INTEGRATION_JOBID VARCHAR2 8
76 76 ERROR_STAMP DATE 1
-----------
sum 25
The total of the avg_col_len has gone up from 20 to 25 – this is two bytes each for the shrinkagefactor and item columns (a tiny average for the stored data, rounded up, plus 1 for a length byte), and one extra byte for the expdate column (a tiny average for the stored data). All three values were rounded up from “1 and a bit” to 2.
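Again, vsize() shows where those extra bytes come from for the single updated row:

select
vsize(1234567890) shrinkagefactor_bytes,
vsize('xxx') item_bytes,
vsize(sysdate) expdate_bytes
from dual
;

This should report 6, 3 and 7 respectively – averaged over 10,000 rows each of these is the “tiny average” that rounds up to 1 before the length byte is added.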
The avg_row_len, however, has gone up by only 2 – which I am going to assume is the two new length bytes, with no allowance for the impact of the one row in 10,000 that is now a few bytes longer. It looks as if the rounding rules for the row length may be different from the rounding (up) rules for the column length.
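One way to test that guess (just a sketch – I haven’t taken it any further here) is to repeat the update for a much larger fraction of the rows so that the extra data can no longer vanish into the rounding, then gather stats and compare avg_row_len with the sum of the avg_col_len again:

update interr_skuplannparam
set
shrinkagefactor = 1234567890,
item = 'xxx',
expdate = sysdate
where
rownum <= 5000
/

commit;

execute dbms_stats.gather_table_stats(user,'interr_skuplannparam')

select avg_row_len
from user_tables
where table_name = 'INTERR_SKUPLANNPARAM'
;

If the two figures now track each other closely, the small difference above really was just a rounding effect rather than a difference in what the two statistics measure.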