Oracle Scratchpad

October 1, 2012

Row sizes 2

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 6:34 am BST Oct 1,2012

In an earlier post I showed you how you could generate SQL to analyze the distribution of row sizes in a table. In the introduction to the code I made a comment about how it failed to “allow for nulls at the end of rows”. A feature of Oracle storage that isn’t commonly known is that a set of consecutive null columns at the end of a row takes up no storage space, while any null column that is followed by a non-null column takes up one byte (holding the value 0xFF) so that Oracle can “count its way” through the null columns to the next non-null column. Consider this example:

create table null_demo(
	n0 number,
	n1 number,
	n2 number,
	n3 number,
	n4 number,
	n5 number,
	n6 number,
	n7 number,
	n8 number,
	n9 number
);

SQL> insert into null_demo(n0, n9) values (0,0);

1 row created.

SQL> insert into null_demo(n0, n1) values (0,0);

1 row created.

SQL> commit;

The space used by the first row in this table is 15 bytes and the space used by the second is 7 bytes (ignoring the row-directory bytes), as we can see from this partial dump of the one block containing data (note the “tl” entry):

tab 0, row 0, @0x1f91
tl: 15 fb: --H-FL-- lb: 0x1  cc: 10
col  0: [ 1]  80
col  1: *NULL*
col  2: *NULL*
col  3: *NULL*
col  4: *NULL*
col  5: *NULL*
col  6: *NULL*
col  7: *NULL*
col  8: *NULL*
col  9: [ 1]  80
tab 0, row 1, @0x1f8a
tl: 7 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 1]  80
col  1: [ 1]  80

The difference in length is due to the eight bytes (all holding 0xFF) which tell Oracle about the eight null columns between n0 and n9 in the first row.
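If you want to reproduce a dump like this, here’s a minimal sketch of one way to do it – the file and block numbers come from the first query, and the dump ends up in a trace file in your session’s trace directory:

select
	dbms_rowid.rowid_relative_fno(rowid)	file_no,
	dbms_rowid.rowid_block_number(rowid)	block_no
from
	null_demo
;

alter system checkpoint;				-- get the current version of the block onto disc
alter system dump datafile &m_file block &m_block;	-- supply the file and block numbers reported above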

Although it would be silly in this particular case to worry about saving a tiny number of bytes, we could, in principle, re-arrange the column order of this table to save some space – if the columns were in the order (n0, n1, n9, n2, n3, n4, n5, n6, n7, n8) the length of the second row would be unchanged, but the total length of the first row would drop from 15 to 8.
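If you wanted to do that re-arrangement, a simple CTAS naming the columns in the new order would be enough – a minimal sketch (the new table name is just for illustration):

create table null_demo_2
as
select
	n0, n1, n9, n2, n3, n4, n5, n6, n7, n8
from
	null_demo
;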

Now, in general, it’s not worth messing about with column ordering to save space (although I’ll tell you about one special case later on), but I have seen examples, usually relating to data warehouses, where millions of rows in a table have literally dozens of consecutive null columns; in fact, at one site I came across a table where the data content averaged about 120 bytes per row and the null columns accounted for about 180 bytes per row of 0xFF “marker” bytes. If you think one or two of your tables fall into this category, can you prove it, can you work out how much space you could save, and what the best column ordering might be? The answer is yes – using code similar to the row-sizing code I showed you in the previous posting.

We simply need to construct a piece of SQL that selects every column in the table in the right order, converts nulls to spaces and non-null columns to a single character such as an asterisk or hyphen, then concatenates the results to create a string representing the pattern of data in the row; then we aggregate and count the different patterns. Here’s the code – as before, it’s only going to work for simple heap tables with the basic data types:

declare
	m_string	varchar2(32000) := 'select null ';
	m_cursor	sys_refcursor;

	m_length	number;
	m_pattern	varchar2(1000);
	m_ct		number;

begin
	for r in (
		select column_name, data_type
		from	user_tab_columns
		where	table_name = upper('&m_table')		-- choose your TABLE
		and	data_type not in (
				'LONG',
				'LONG RAW',
				'CLOB',
				'NCLOB',
				'BLOB'
			)
		order by
			column_id
	) loop
		m_string := m_string ||
				' || decode (' ||
				r.column_name ||
				',null,'' '',''-'') ' ||	-- nulls turn into spaces, non-null to hyphen
				chr(10)
		;

	end loop;

	m_string :=
			'select length(rtrim(pattern)) length, Pattern, count(*) ct from (' ||
			m_string ||
			' pattern from &m_table ) group by pattern order by length(rtrim(pattern)), pattern'
	;	-- trim trailing spaces, then count and sort

--	dbms_output.put_line(m_string);

	dbms_output.put_line(
		rpad('Length',8) ||
		rpad('Pattern',40)   ||
		rpad('Count',12)
	);

	dbms_output.put_line(
		rpad('-',7,'-') || ' ' ||
		rpad('-',39,'-') || ' ' ||
		rpad('-',11,'-')
	);

	open m_cursor for m_string;
	loop
		fetch m_cursor
		into
			m_length, m_pattern, m_ct
		;

		exit when m_cursor%notfound;

		dbms_output.put_line(
			lpad(to_char(m_length,'FM999,999'),7) || ' ' ||
			rpad(m_pattern,39)   || ' ' ||
			lpad(to_char(m_ct,'FM999,999,999'),11)
		);

	end loop;

	close m_cursor;

end;
/
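To run the script, enable dbms_output and supply a value for the &m_table substitution variable up front (with a single “&” the script would otherwise prompt you for each of its two occurrences) – something like this, assuming the script has been saved as row_patterns.sql (a made-up name):

set serveroutput on size 1000000
define m_table = null_demo

@row_patterns.sql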

Here’s a sample of the output – it’s the obj$ table from a typical 11.2.0.3 install:

Length  Pattern                                 Count
------- --------------------------------------- -----------
     18 - --- -----  - ---                           52,369
     18 - --- -----  -----                            1,860
     18 - ---------  - ---                                3
     18 ----- -----  - ---                            4,209
     18 ----- -----  -----                               70
     18 -----------  - ---                              262

Immediately you can see that (for THIS system) we could save about 2 * 58,000 bytes if we moved columns 12 and 13 (remoteowner and linkname) so that they appear after the current column 18 (spare3). So let’s do a “create table as select” to copy the columns in the new order.
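Here’s a sketch of that CTAS – I’m assuming the 11.2 column list for obj$, running as a suitably privileged user, and the name of the copy is just for illustration:

create table obj_copy_1
as
select
	obj#, dataobj#, owner#, name, namespace, subname,
	type#, ctime, mtime, stime, status, flags,
	oid$, spare1, spare2, spare3,
	remoteowner, linkname,		-- moved from positions 12 and 13
	spare4, spare5, spare6
from
	sys.obj$
;

And here’s what the result looks like: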

Length  Pattern                                 Count
------- --------------------------------------- -----------
     16 - --- ------ ---                             52,369
     16 - --- ----------                              1,860
     16 - ---------- ---                                  3
     16 ----- ------ ---                              4,210
     16 ----- ----------                                 70
     16 ------------ ---                                262

Looking at the patterns with the largest counts, the next columns to move look like the new columns 2 (dataobj#), 6 (subname), and 13 (oid$). We’ll put them after the new column 16 (spare3). There are three “popular” patterns: two of them have nulls in all three places, but the third has a non-null at column 2, so column 2 is the best choice to appear earliest in the re-arrangement, and after that the order of subname and oid$ doesn’t really matter. Creating another copy of the table, this is the next pattern we get:

Length  Pattern                                 Count
------- --------------------------------------- -----------
     13 -------------                                52,369
     14 --------------                                4,211
     15 ------------- -                               1,860
     15 ---------------                                  70
     16 -------------  -                                  3
     16 -------------- -                                262

That’s another (52,000 * 3) + (4,200 * 2) + 1,860 bytes saved – roughly 166KB in total – and I don’t think I can do anything to improve things further.
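For reference, here’s a sketch of what that second copy might have looked like – the column order is inferred from the patterns above (oid$ seems to have ended up just before subname), so treat it as an illustration rather than the exact statement used:

create table obj_copy_2
as
select
	obj#, owner#, name, namespace, type#,
	ctime, mtime, stime, status, flags,
	spare1, spare2, spare3,
	dataobj#, oid$, subname,	-- moved from the (new) positions 2, 13 and 6
	remoteowner, linkname,
	spare4, spare5, spare6
from
	obj_copy_1
;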

There are two important points to note here: (a) it really wasn’t worth the effort for such a small table and (b) if I start creating lots of partitioned objects I’ll have to review the situation because I’ve moved the (frequently null) subname towards the end of the row and partitioned objects populate the subname column.

Footnote

I said there was a special case where you might have to think about column ordering – it’s the case where you add columns to a table, particularly a table where you’ve previously thought about null values and column ordering. Imagine adding (last_update_date, last_update_user) columns to a table because an audit requirement has suddenly been dumped in your lap – if the new columns are mandatory, the space impact they have may be far greater than the space needed to hold the column values themselves.

Imagine doing this to my modified obj$ table above. There are 21 columns in the table and, in most rows, the last 8 of them are null. If I add two non-null columns to the table, that means an extra (possibly unexpected) 8 bytes added to most rows – the trailing nulls that previously took no space now each need a 0xFF marker byte. It might be the difference between adding the data safely and making 10% of my rows migrate. Physical changes to data structures always come with odd little side effects – it’s nice if you can think about them in advance.
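For the sake of illustration, the DDL might look something like this – the table name is made up, and on a populated table you’d need the defaults to satisfy the not null constraints:

alter table audit_demo add (
	last_update_date	date		default sysdate	not null,
	last_update_user	varchar2(30)	default user	not null
);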

Warning

Although it is possible to minimise space usage by getting a sensible column ordering at the outset, it is also possible to waste space when you rebuild a table, and the potential saving from column re-ordering may be completely wiped out by the space you lose simply because you’ve done a rebuild. So be careful – make sure you read this elderly post (which echoes a comment from Practical Oracle 8i) before doing ANY table rebuild.

7 Comments »

  1. Very nice article. We have done a similar analysis on our data warehouse and carried out a column re-ordering exercise.
    We saved 1.2TB of space (tables + indexes) from one of the data marts. We arrived at the new column order using a different strategy, where we got the number of nulls in each column and then did a CTAS of the original table with the new column order.

    Here is the cut-down version of the steps we have performed.

    select  /*+ parallel(x,16) */
    count(decode(C1,null,1,null)) C1,
    count(decode(C2,null,1,null)) C2,
    count(decode(C3,null,1,null)) C3,
    count(decode(C4,null,1,null)) C4,
    count(decode(C5,null,1,null)) C5,
    count(decode(C6,null,1,null)) C6,
    count(decode(C7,null,1,null)) C7,
    count(decode(C8,null,1,null)) C8,
    count(decode(C9,null,1,null)) C9
    from
    T1 x
    /
    

    We ordered the resulting counts in ascending order to arrive at the new column order (we used Excel to transpose the output and plot some graphs etc.).

    Column Name	Number of Nulls
    C1		0
    C8		0
    C7		4763
    C4		4763
    C9		5230377
    C6		11645479
    C3		88483772
    C5		252383865
    C2		2028039667
    

    Then we CTAS’ed the table with the new column order.

    The tables are huge, and one of the tables, with 978 columns, has gone down by 42% in size.

    Ananda

    Comment by raova — October 1, 2012 @ 10:58 am BST Oct 1,2012 | Reply

  2. An alternative to the approach in the previous comment: if you were to run the following query and use the result to set the new column order, that would be just as good – provided you have gathered statistics on the table. Then again, who doesn’t gather stats on their tables nowadays? :)

      SELECT table_name, column_name, column_id, num_nulls, num_distinct
        FROM all_tab_columns
       WHERE owner = :owner
         AND table_name = :table_name
    ORDER BY num_nulls, column_id
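
    If you want to turn that straight into a select list for the CTAS, something like this (using 11.2’s listagg) would do:

        SELECT listagg(column_name, ', ')
                   within group (order by num_nulls, column_id) new_column_order
          FROM all_tab_columns
         WHERE owner = :owner
           AND table_name = :table_name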
    

    Comment by guest — October 1, 2012 @ 3:01 pm BST Oct 1,2012 | Reply

  3. May be true, provided the stats gather was done with 100% sample size to get an accurate value for num_nulls.

    Comment by raova — October 1, 2012 @ 3:10 pm BST Oct 1,2012 | Reply

  4. The “optimizer lady” said auto_sample_size is as accurate as 100%. Regardless, in this particular case, 95%+ accuracy is good enough for most people.

    Comment by guest — October 1, 2012 @ 5:28 pm BST Oct 1,2012 | Reply

  5. Table compression will reduce such a list of null columns to about one byte for all of them.

    Comment by Bernard — October 2, 2012 @ 2:50 pm BST Oct 2,2012 | Reply

  6. 10.2.0.3 has this 255-column restriction on compression, so we haven’t tried that option.

    Comment by raova — October 2, 2012 @ 3:26 pm BST Oct 2,2012 | Reply

