Oracle Scratchpad

November 28, 2022

Hakan Factor

Filed under: Infrastructure,Oracle,Performance — Jonathan Lewis @ 3:14 pm GMT Nov 28,2022

There’s a question on the MOSC forum (needs an account) at present that started with the performance of the datapump API over a database link but moved on to the topic of how to handle a scenario that I’ve described in the past: a table where rows are initially short and eventually become much longer, and a requirement comes up to rebuild the table.

In this case the OP has to use datapump (selecting truncate as the “action on existence”) to copy the table data from one place to another rather than doing the more common ‘alter table move’ variant of rebuilding the table.

The underlying problem in this case is that:

  • the table has 84 columns made up of (pk_col1, pk_col2, flag, change_date) plus 20 groups of 4 “value” columns.
  • rows are inserted with just the four main columns and the first group of four values.
  • over time each subsequent group of 4 values in a row is updated in a separate statement.

We haven’t been given numbers, but a row probably ends up taking about 10 times the space it started with – and if that’s the case you would normally need to set the table’s pctfree to something like 90 to avoid getting a lot of migrated rows in the table. But that’s not the whole of the story.
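As a rough check on the “something like 90” figure, here’s a minimal Python sketch of the arithmetic. It uses a deliberately simple model (my assumption, not Oracle’s actual space-management code): a block filled with short rows to (100 - pctfree)% must still have room for those same rows once they reach full size. The function name is hypothetical, and block overhead, row headers and the row directory are ignored.

```python
def min_pctfree(initial_bytes: int, final_bytes: int) -> int:
    """Smallest whole-number pctfree that lets rows inserted at initial_bytes
    grow to final_bytes without migrating, under the simple model:
        (100 - pctfree) * final_bytes / initial_bytes <= 100
    i.e. pctfree >= 100 * (final_bytes - initial_bytes) / final_bytes.
    Ignores block overhead, row headers and the row directory."""
    if initial_bytes <= 0 or final_bytes < initial_bytes:
        raise ValueError("expected 0 < initial_bytes <= final_bytes")
    growth = final_bytes - initial_bytes
    # Ceiling division in integer arithmetic avoids float rounding surprises.
    return -(-100 * growth // final_bytes)


# A row that grows to about 10 times its starting size:
print(min_pctfree(100, 1000))  # 90
# A row that only triples in size:
print(min_pctfree(100, 300))   # 67
```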

Things to go wrong

If you don’t set pctfree to 90 you get lots of migrated rows. If you then do an export (expdp) in direct_path mode, expdp will do a large number of single block reads following the migrated rows, and Oracle won’t cache the follow-on blocks, so you may re-read them several times in the course of reading one block in the direct path tablescan. (For cached reads the codepath for a tablescan will simply ignore the “head pointer” to a migrated row because it “knows” that it will find the whole row in some other block eventually.)

Update: the OP was using the PL/SQL API to control the import, and there didn’t seem to be an option for avoiding the direct path select at the opposite end of the link. In fact this was an omission in the PL/SQL Packages reference: there is a command line option access_method=insert_as_select, and this is available in the API through a call to:

BEGIN
        SYS.DBMS_DATAPUMP.SET_PARAMETER(
                handle => :JOBHNDL,
                name   => 'data_access_method',
                value  => 'INSERT_AS_SELECT'
        );
END;
/

My thanks to Daniel Overby Hansen for pointing this out and arranging for an update to the documentation for 23c.

If you do set pctfree to 90 then, when you rebuild the table (or recreate it with pctfree set to 90), you end up with a much larger table with lots of blocks that are only 10% used because most of the rows are now big and aren’t going to grow any more.

Best strategy – the Hakan factor.

Work out how many rows in their final state will fit into a block and recreate the table telling Oracle that that’s the maximum number of rows it’s allowed to put in a block. (You could also set pctfree to zero at the same time to minimise the chance of Oracle inserting fewer rows than your target.)
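If you know the final row length, a first-cut estimate of the target is simple division. The sketch below is only that. The block overhead and per-row overhead figures are hypothetical placeholders (my assumption for the per-row cost is 2 bytes for a row-directory entry plus a 3-byte row header), which is one reason the empirical approach in Part 1 is the safer way to get the number.

```python
def estimate_rows_per_block(block_bytes: int, block_overhead: int,
                            row_bytes: int, per_row_overhead: int) -> int:
    """First-cut estimate of how many full-size rows fit in one block with
    pctfree 0. All overhead figures are caller-supplied assumptions."""
    usable = block_bytes - block_overhead
    return usable // (row_bytes + per_row_overhead)


# Hypothetical figures: 8KB block, ~100 bytes of block overhead,
# 420-byte full-size rows, 5 bytes of per-row overhead.
print(estimate_rows_per_block(8192, 100, 420, 5))  # 19
```

The unrounded answer here is about 19.04 rows. An estimate that lands this close to a boundary is the kind of case where real rows of slightly varying length give you a mix of 18s and 19s.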

The devil, of course, is in the detail. Part of the devilry comes from a bug that was fixed as long ago as 10.2.0.1. Part comes from the fact that Oracle doesn’t give us a documented API to set the magic number – we have to find a way to teach Oracle about the number or hack the data dictionary. Part, inevitably, comes from the fact that when you’re playing dirty tricks with undocumented (or barely documented) mechanisms you ought to set up some test cases to check that the latest version of Oracle behaves the same way as your previous versions.

Part 1 – Teaching Oracle.

You may know your data so well that you can immediately say how many “full-length” rows should fit in a block. If you can’t do this, you could simply create a copy of the original table structure with a pctfree of zero, then copy into it a few hundred rows from the original table using a predicate to limit the selected rows to ones that would not be updated any further. For example (using the table definition supplied by the OP) you might say:

create table test_tab_clone 
pctfree 0 
as 
select  * 
from    test_tab 
where   rownum = 0
/

insert into test_tab_clone 
select  * 
from    test_tab 
where   rownum <= 400 
and     fourthvalue19 is not null
/

commit
/

I’m assuming in this case that column “fourthvalue19” will be non-null only if the whole of the 19th set of values is populated and all the other sets of values are populated. From the OP’s perspective there may be a more sensible way of identifying fully populated rows. You do need to ensure that the table has at least one full block, otherwise some odd things can happen when you try to set the Hakan factor.

Once you’ve got a small table of full size rows a simple analysis of rows per block is the next step:

select
        rows_starting_in_block,
        count(*)        blocks
from
        (
        select
                dbms_rowid.rowid_relative_fno(rowid),
                dbms_rowid.rowid_block_number(rowid),
                count(*)                                rows_starting_in_block
        from
                test_tab_clone
        group by
                dbms_rowid.rowid_relative_fno(rowid),
                dbms_rowid.rowid_block_number(rowid)
        )
group by
        rows_starting_in_block
order by
        rows_starting_in_block
/

ROWS_STARTING_IN_BLOCK     BLOCKS
---------------------- ----------
                     3          1
                    18         22
                    19          1
                       ----------
sum                            24

Looking at these results I can see that there’s a slight variation in the number of rows that could be crammed into a block – and one block which holds the last few rows of my insert statement which I can ignore. In a more realistic case you might need to tweak the selection predicate to make sure that you’ve picked only full-size rows; or you might simply need to decide that you’ve got a choice of two or three possible values for the Hakan factor and see what the results are from using them.
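The query above does two levels of aggregation. It counts rows per (relative file, block) pair, then counts blocks per row count. The following Python sketch does the same thing so you can see the shape of the calculation. The rowid pairs are made-up sample data chosen to reproduce the histogram above, and the function name is mine.

```python
from collections import Counter


def rows_per_block_histogram(rowids):
    """rowids: iterable of (relative_fno, block_number) pairs, one per row.
    Returns {rows_starting_in_block: blocks}, like the query's output."""
    rows_in_block = Counter(rowids)                      # inner GROUP BY file, block
    blocks_per_count = Counter(rows_in_block.values())   # outer GROUP BY row count
    return dict(sorted(blocks_per_count.items()))        # ORDER BY row count


# Made-up sample: 22 blocks of 18 rows, one of 19, one part-filled block of 3.
sample = [(5, b) for b in range(100, 122) for _ in range(18)]
sample += [(5, 122)] * 19 + [(5, 123)] * 3

hist = rows_per_block_histogram(sample)
print(hist)                 # {3: 1, 18: 22, 19: 1}
print(sum(hist.values()))   # 24
```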

With the figures above I’d be strongly inclined to set a Hakan factor of 18. That does mean I might be “wasting” roughly 1/19th of a block in the relatively rare cases where a 19th row would have fitted, but it’s better than setting the Hakan factor to 19 and finding that roughly 1 row in every 19 migrates in the 22 blocks out of 23 where I should have restricted the number of rows per block to 18. The choice is not always that straightforward.

So here’s how we now “train” Oracle, then test that it learned the lesson:

truncate table test_tab_clone;
insert into test_tab_clone select * from test_tab where rownum <= 18;
alter table test_tab_clone minimize records_per_block;

truncate table test_tab_clone;
insert into test_tab_clone select * from test_tab where rownum <= 10000;

start rowid_count test_tab_clone

ROWS_STARTING_IN_BLOCK     BLOCKS
---------------------- ----------
                    10          1
                    18        555
                       ----------
sum                           556

In the first three statements I’ve emptied the table, inserted 18 rows (I ought to check they all went into the same block, really) and set the Hakan factor.

Once the Hakan factor is set I’ve emptied the table again then populated it with the “full” data set. In fact for demo purposes I’ve copied exactly 10,000 rows so that we can see that every block (except, we can safely assume, the last one written to) has acquired exactly 18 rows.
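The 556 blocks are what the arithmetic predicts if the limit of 18, rather than free space, is what stops each block filling. At 18 rows per block, 10,000 rows gives 555 full blocks with 10 rows left over for the last one. Here is a trivial Python check. The function name is hypothetical, and it assumes a single session filling one block completely before moving to the next.

```python
def expected_layout(total_rows: int, hakan: int):
    """Rows-per-block histogram and block count, assuming every block is
    filled to exactly the Hakan factor before the next block is used."""
    full_blocks, leftover = divmod(total_rows, hakan)
    hist = {hakan: full_blocks}
    if leftover:
        hist[leftover] = 1
    return dict(sorted(hist.items())), full_blocks + (1 if leftover else 0)


print(expected_layout(10_000, 18))  # ({10: 1, 18: 555}, 556)
```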

Part 2 – applying the strategy

It’s often easy to sketch out something that looks as if it’s exactly what you need, but there are always peripheral considerations that might cause problems, and an important part of examining a problem is to consider the overheads and penalties. How, for example, is our OP going to apply the method in production?

There are two problems:

  • It’s a large table, and we’re cloning it because we can’t hack into the data dictionary to modify the table directly. What are the side effects?
  • We want the imported export to acquire the same Hakan factor. Do we have to take any special action?

The import is the simpler problem to consider since it’s not open-ended. As far as impdp is concerned we could import “data_only” or “include_metadata”, and the “table_exists_action” could be either replace or truncate, so there are only 4 combinations to investigate.

The bad news is that none of the options behaves nicely – impdp (tested on 19.11.0.0) seems to import the data then execute the “minimize records_per_block” command when really it should transfer the Hakan factor before importing the data. So it seems to be necessary to go through the same convoluted steps at least once to precreate a target table with the desired Hakan factor and thereafter use only the truncate option for the import if you want to make the target behave in every way like the source. (Even then you will need to watch out for extreme cases if the export holds fewer rows than the value you’ve set for the Hakan factor – with the special case that if the exported table is empty the attempt by the import to set the Hakan factor raises error “ORA-28603: statement not permitted on empty tables”.)
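The order matters because “minimize records_per_block” doesn’t take a number. It sets the limit to the largest number of rows it finds in any block at the moment it runs. Here is a minimal model of that behaviour in Python. The names are hypothetical, and EmptyTableError simply stands in for ORA-28603.

```python
class EmptyTableError(Exception):
    """Stands in for ORA-28603: statement not permitted on empty tables."""


def minimize_records_per_block(rows_per_block: list) -> int:
    """Model of 'alter table ... minimize records_per_block': the new limit
    is the largest number of rows currently found in any one block."""
    if not any(rows_per_block):
        raise EmptyTableError("ORA-28603: statement not permitted on empty tables")
    return max(rows_per_block)


# Data loaded first into a pctfree 0 table with no limit (assumed): full-size
# rows pack by space, so the odd block of 19 sets the limit to 19, not 18.
print(minimize_records_per_block([18] * 22 + [19, 3]))  # 19
# A small export: only 5 rows, all in one block, so the limit becomes 5.
print(minimize_records_per_block([5]))                  # 5
# An empty export: the statement is refused.
try:
    minimize_records_per_block([])
except EmptyTableError as err:
    print(err)  # ORA-28603: statement not permitted on empty tables
```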

Let’s get back to the side effects of our cloning exercise on the source table. We’ve created a copy of the original data with a suitable Hakan factor, so blocks holding “completed” rows are full and blocks holding “in-transit” rows have enough space to grow to their “completed” size. There are no migrated rows, and we don’t expect to see migrated rows in the future. But it’s not the right table, and to ensure we had a complete copy we would have had to stop all processing of the source table.

Could we have avoided the stoppage? Maybe we could use the dbms_redefinition package – the OP is running Standard Edition so can’t do online redefinition any other way – and use the Hakan hack mechanism on the “interim” table immediately after creating it.

If we find that the online redefinition mechanism generates too much undo and redo we’ll have to use the blocking method – but then we have to do some table renaming and worry about PL/SQL packages becoming invalid, and foreign key constraints, synonyms, views etc. being associated with the wrong table.

So even though we can sketch out an outline strategy, there are still plenty of details to worry about around the edges. To a large degree this is because Oracle has not yet made the Hakan factor a “proper” property of a table that you can explicitly set in a “move” or “create table” operation. There is a function embedded in the executable (kkdxFixTableHAKAN) that looks as if it should set the Hakan factor, and there is presumably some piece of code that sets the Hakan factor when you execute a call to “create table for exchange”, but it would be nice if there were an API that was visible to DBAs.

Summary

If you have a table where rows grow significantly over their lifetime, you ought to ensure that you’ve set a suitable pctfree for the table. But if you anticipate copying or moving the table at any time, there’s no way to pick a pctfree that is good for all stages of the data’s lifetime.

There is a feature that you can impose on the data to avoid the problems of extreme change in row lengths, and it’s fairly straightforward to impose on a single table. However, there is no API available to manipulate the feature directly. If you don’t anticipate the need during the initial design stage, applying the feature after the event can be an irritating and resource-intensive operation.

Footnote

For those not familiar with it, the Hakan Factor was introduced by Oracle to allow a little extra efficiency in the compression and use of bitmap indexes. If Oracle has information about the largest number of rows that can appear in any block in a table it can minimise the number of bits needed per block (space saving) and avoid having to expand and compare unnecessarily long sequences of zero bits when comparing entries across bitmap indexes. Given their intended use it should come as no surprise that you can’t call “minimize records_per_block” for a table that has an existing bitmap index.
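As a rough illustration of the space argument, suppose (my simplification) that an uncompressed bitmap reserves one bit per possible row slot in every block. Its length per key value is then the number of blocks × the rows-per-block limit. The 736 below is an assumed default limit for an 8KB block, used purely for comparison. Real bitmap indexes are compressed, so actual savings will differ from the raw ratio.

```python
def uncompressed_bitmap_bits(table_blocks: int, rows_per_block_limit: int) -> int:
    """Bits per key value in an uncompressed bitmap that allots a fixed number
    of bit positions (the rows-per-block limit) to every block in the table."""
    return table_blocks * rows_per_block_limit


BLOCKS = 556            # block count from the demo table above
ASSUMED_DEFAULT = 736   # assumed default limit for an 8KB block (illustrative)

with_hakan = uncompressed_bitmap_bits(BLOCKS, 18)
without = uncompressed_bitmap_bits(BLOCKS, ASSUMED_DEFAULT)
print(with_hakan, without)  # 10008 409216
```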
