Oracle Scratchpad

May 29, 2021

Partitioning existing data

Filed under: Infrastructure, Oracle — Jonathan Lewis @ 10:35 am BST May 29, 2021

I started drafting this note 3 years ago and never got round to finishing it, but I rediscovered it today while looking for something I had written about the “alter table … modify” command options that you can use to convert a non-partitioned table into a partitioned table, or change the partitioning of an already partitioned table (which may need extra preparatory work). I’ve left the opening comments and list of thoughts in place, but there are two particular details I want to look at more closely because it’s possible that they may turn into relatively commonly occurring issues that will stop you using this mechanism.

So let’s go back three years.


When someone asks the question, as happened on the ODC (OTN) database forum a little while ago [ed: March 2018]: “How do I partition a 90GB table?” there’s really only one correct (though somewhat cruel) answer: “If you have to ask the question, you probably shouldn’t.”

There are so many possible side effects to partitioning that if you’ve managed to work out what column(s) to partition on, what partitioning strategy is appropriate, what partitioning maintenance operations you will have to handle, what problems you might have with referential integrity, and how you’re going to re-engineer the indexes then the question of how to do the physical re-arrangement is trivial by comparison. Let’s assume we have done all the other homework, though, and consider possibilities.

  • Would partitioning “from now on” be acceptable, or do you have to partition the past?
  • Is there down-time for the operation, or does it have to be done online?
  • Whether online or “offline”, you will need a lot of space somewhere to duplicate the data (and, possibly, the indexes).
  • Could you benefit from different storage attributes for different (e.g. aged) partitions: pctfree 0, basic compression?
  • Could you take advantage of new features as you partition (partial indexes, attribute clustering)?
  • Have you allowed time for re-creating indexes on the copy?
  • Will you need, or could you benefit from, changes to indexes – local / global / removing the partitioning column?
  • Do you need to allow time for re-validating referential integrity (both ways, perhaps)?
  • Do you have a library of scripts that could handle all the referential integrity, privileges, etc.?
  • Do you have a standby, and what are the implications for refreshing it or keeping it in sync?
  • Do you have any columns marked unused that you would like to see dropped?
  • Do you have any columns that were “fast add default” columns that could be cleaned up?

Today (May 2021) I had a reason to consider the last two points in that list and how well the options in “alter table XXX modify” for partitioning might handle them. I’ll take the unused columns first because their implementation is the more commonly known.

Unused Columns

When you set a column unused it gets flagged and renamed in the data dictionary, but any data stored in that column stays in place (invisibly) in the segment until you subsequently drop the column – and dropping a column is a very expensive thing to do. Dropping multiple columns is even worse, as Oracle drops each column for each row as a separate pair of undo and redo vectors. I wrote an article about this for Simpletalk a few years ago – the figures in the article came from an instance running 12.1.0.2, but the behaviour is the same in 19.11.0.0.
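
For reference, the contrast between the two commands looks like this – just a minimal sketch against a hypothetical table t1 with a column col2 (the checkpoint clause is optional, and simply limits the volume of undo held at any one moment by committing every N rows):

-- quick and cheap: a data dictionary change only, the column data stays in the segment
alter table t1 set unused column col2;

-- hideously slow and expensive: every row has to be rewritten to remove the column data
alter table t1 drop unused columns checkpoint 1000;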

So marking a column unused is quick and cheap, while dropping it is hideously slow and expensive. However, it would be good to have a mechanism for getting rid of the column without the awful overheads, because if you don’t then the day will come when someone accidentally issues a “drop column” when they really meant “set unused” – and your production system will effectively stop until the whole operation is complete.

This is where I thought the 12.2 option for converting a non-partitioned table into a partitioned table online – or even the simple “alter table XXX move online” – might help. These commands copy every single row in the table from a source copy to a destination copy (dealing with all the indexes etc.), so they could (in principle) rewrite the rows without the unused columns and then fix up the data dictionary at the end of the move/modify.

They don’t.

They get a little way towards cleaning the table up: all the unused columns are set to null as the rows are copied, so you get the space back – and it’s more likely to be usable and safe than it would be with the “drop column” command – but the column is still there, with an obscure name in the data dictionary telling you when it was marked unused (e.g. SYS_C00002_21052719:24:01$ was once the second column in the table, marked unused at 19:24:01 on 27th May 2021). So the table is still exposed to the risk of someone coming along and issuing a system-threatening “drop column”. And, of course, the column is still sitting there contributing to the 255 “real column” limit at which a row has to start splitting into multiple row-pieces.
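
If you want to reproduce the test, the key commands would look something like this (a sketch, not a definitive script – the range/interval clause is an arbitrary choice that assumes a numeric id column, and the exact SYS_C… name you see will vary):

-- 12.2 onwards: convert a non-partitioned table to a partitioned table, online
alter table t1 modify
        partition by range (id) interval (1000)
        (partition p0 values less than (1))
        online
;

-- the renamed "unused" column is still visible in the dictionary after the rewrite
select
        column_name, segment_column_id, internal_column_id
from
        user_tab_cols
where
        table_name = 'T1'
;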

Fast Add Nullable (FAN) Default

What about tables where you’ve added a column with a default value? In 11g Oracle introduced a mechanism that stored the default value in the data dictionary if the column was declared as not null, which meant it didn’t have to update every row in the table. In 12c the mechanism was extended to include nullable columns by adding an extra (system-generated) column to the table to flag rows that were inserted, or had the default value updated, after the column was added. (Each bit in this extra column corresponds to one FAN column in the table.)

Again you might think that “alter table … move” and “alter table … modify” would be able to tidy things up by filling in the default value as they rewrite the table. And, just as we’ve seen with unused columns, they nearly do the job – but they still leave the spare (and now apparently redundant) mapping column in place, with all the infrastructure that would allow Oracle to work out whether or not a null meant null.

But there’s more to this than just a little bit of infrastructure, because the optimizer and run-time engine carry on doing all the work that they did when the mapping column was needed. Here’s a little code to demonstrate the point:


rem
rem     Script:         fast_default_add_col.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             18.3.0.0
rem             12.2.0.1
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1
where
        rownum <= 1e3   -- > comment to avoid WordPress format issue
;

alter table t1 add (def_val varchar2(10) default 'XXX' /* not null */);

column data_default format a20

select
        column_name, column_id, segment_column_id, internal_column_id, avg_col_len, data_default
from
        user_tab_cols
where
        table_name = 'T1'
order by
        column_id
;

insert into t1(id, v1, padding, def_val) values(-1, 'x', 'x', 'xz');
insert into t1(id, v1, padding, def_val) values(-2, 'x', 'x', null);
update t1 set def_val = 'yy' where id = 1;
commit;

column sys_nc00004$ format a32

select distinct sys_nc00004$, count(*) 
from t1 
group by sys_nc00004$
;

I’ve created a table with 1,000 rows, added a (nullable) column with a default value, then inserted two rows and updated the new column of a pre-existing row. I’ve run a query to show that there’s a “spare” system-generated column which is a real (stored in the segment) column, then reported the aggregate information about this system-generated column. Here are the results of the two queries:

COLUMN_NAME           COLUMN_ID SEGMENT_COLUMN_ID INTERNAL_COLUMN_ID AVG_COL_LEN DATA_DEFAULT
-------------------- ---------- ----------------- ------------------ ----------- --------------------
ID                            1                 1                  1           4
V1                            2                 2                  2          11
PADDING                       3                 3                  3         101
DEF_VAL                       4                 5                  5           4 'XXX' /* not null */
SYS_NC00004$                                    4                  4           0


SYS_NC00004$                       COUNT(*)
-------------------------------- ----------
01                                        3
                                        999

If you cared to query t1 where sys_nc00004$ is not null you’d see that the three rows from the aggregate report were the two I’d inserted plus the one I’d updated.
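
For example, a quick confirmation (this query isn’t in the original script, and references the system-generated column name from my test – yours may differ):

select
        id, def_val, sys_nc00004$
from
        t1
where
        sys_nc00004$ is not null
order by
        id
;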

So let’s run a couple of queries against the data and check their execution plans – first a simple aggregate query on the def_val column:


set serveroutput off

set feedback only
select def_val, count(*) from t1 group by def_val;
set feedback on
select * from table(dbms_xplan.display_cursor(format=>'+projection'));

Plan hash value: 136660032

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     5 (100)|          |
|   1 |  HASH GROUP BY     |      |     1 |     4 |     5  (20)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  1000 |  4000 |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00004$",0)),NULL,NVL("DEF_VAL
       ",'XXX'),'0',NVL("DEF_VAL",'XXX'),'1',"DEF_VAL")[10], COUNT(*)[22]
   2 - DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00004$",0)),NULL,NVL("DEF_VAL
       ",'XXX'),'0',NVL("DEF_VAL",'XXX'),'1',"DEF_VAL")[10]

The Column Projection Information tells us that Oracle is not just looking at the def_val column to produce a result for this query. The optimizer has used the data dictionary information to construct a decode() (not case!) expression that will be evaluated for every row. The expression checks the bit of the system-generated column that is relevant to the def_val column and uses it to decide whether to report the def_val column itself or to report the default value from the data dictionary. (It is possible to get the same result using a decode() with just two components, but I think the Oracle developer may have decided that the longer expression would be decoding “most likely first”.)

The same complicated expression appears as the projection for operation 1, but I don’t think this means Oracle is actually re-evaluating the expression – after all, def_val isn’t being projected upwards from operation 2, so it isn’t available for slotting into the expression.
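
If you want to watch the mapping logic in isolation you can borrow the optimizer’s expression and run it by hand – bear in mind that sys_op_vecbit() is an internal, undocumented function, so this is a demonstration only, not something to build into production code:

select
        id,
        def_val,
        sys_op_vecbit(sys_nc00004$,0)           bit_0,
        -- the optimizer's expression, copied from the projection above
        decode(
                to_char(sys_op_vecbit(sys_nc00004$,0)),
                null, nvl(def_val,'XXX'),
                '0' , nvl(def_val,'XXX'),
                '1' , def_val
        )                                       resolved_def_val
from
        t1
where
        id in (-2, -1, 1, 2)
order by
        id
;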

Here’s another example that might be used – find all the rows where null was explicitly supplied:

set feedback only
select * from t1 where def_val is null;
set feedback on
select * from table(dbms_xplan.display_cursor(format=>'+projection'));

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     4 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |   116 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00004$",0)),NULL,NVL("
              DEF_VAL",'XXX'),'0',NVL("DEF_VAL",'XXX'),'1',"DEF_VAL") IS NULL)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "T1"."ID"[NUMBER,22], "T1"."V1"[VARCHAR2,40],
       "T1"."PADDING"[VARCHAR2,100], DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00004$
       ",0)),NULL,NVL("DEF_VAL",'XXX'),'0',NVL("DEF_VAL",'XXX'),'1',"DEF_VAL")[
       10]

As you can see, the same decode() expression appears in both the Predicate Information and as a column projection. Again, this probably doesn’t mean it has to be evaluated twice per row but there’s clearly a CPU overhead for taking advantage of the fast add default mechanism – though it might be possible to find cases where this is offset by the improved data packing due to shorter rows.

Eventually you might decide to tidy things up a bit with a simple “alter table … move” (or, as for the rest of this note, “alter table … modify”), so what changes if you do this?

The actual default value will get written into the row, but apart from that nothing changes – the mapping column stays exactly where it is, none of its bits are altered, and (possibly most importantly) the complex decode() expression doesn’t go away. It’s another case where there seems to be an opportunity for the code to do a more thorough job.
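
In other words, if you repeat the earlier dictionary and aggregate queries after the move you should see the same mapping column and the same bit counts (a sketch – remember that “move online” for heap tables needs 12.2, and the mapping column name will vary):

alter table t1 move online;

-- the mapping column is still present in the dictionary
select
        column_name, segment_column_id, internal_column_id
from
        user_tab_cols
where
        table_name = 'T1'
;

-- and its bits are unchanged
select
        sys_nc00004$, count(*)
from
        t1
group by
        sys_nc00004$
;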

tl;dr

From 12c onwards, when “alter table … move” and “alter table … modify” are going to rewrite the entire table, you might expect them to tidy up all the mess introduced by setting columns unused or introducing nullable columns with default values. On the plus side they do clean the data, but they don’t clean the metadata as thoroughly as they could, and this leaves processing overheads continuing into the future.

Footnote

You will often hear the warning “the execution plans from Explain Plan may be telling lies”. The new mechanism for adding columns with default values gives it a lovely opportunity to do so. Here’s what explain plan does with the last query above:


explain plan for
select * from t1 where def_val is null;

select * from table(dbms_xplan.display(format=>'projection'));

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   116 |     4   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |   116 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("DEF_VAL" IS NULL)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "T1"."ID"[NUMBER,22], "T1"."V1"[VARCHAR2,40],
       "T1"."PADDING"[VARCHAR2,100], "DEF_VAL"[VARCHAR2,10]

