Oracle Scratchpad

March 15, 2018

Keeping Intervals

Filed under: 12c,Oracle,Partitioning — Jonathan Lewis @ 8:03 am BST Mar 15,2018

I’ve recently been reminded of a blog post I wrote a couple of years ago that discussed the issue of running into the hard limit of 2^20 -1 as the number of segments for a (composite) partitioned table – a problem that could arise in a relatively short time if you used a large number of hash subpartitions in an interval/hash composite partitioned table (you get about 2 years and 10 months of daily partitions at 1,024 subpartitions per day, for example).

A natural follow-on from that article is to think through a strategy for dropping old partitions sufficiently early that you don’t hit the limit as new partitions are created. This, of course, pretty much defeats the point of interval partitioning – instead of planning to add partitions “just in time” you now have to eliminate them “just in time”. Amongst other issues, we’re going to find that interval partitioning manages to re-introduce a problem with range partitioning that Oracle got rid of in Oracle 10g.

So let’s test the obvious option: drop the oldest partition(s) in time to keep head-room for new partitions; for convenience we’ll start with a simple interval partitioned table with a few pre-declared range partitions and a few automatically generated interval partitions. All the examples here were run under 12.1.0.2:


rem
rem     Script:         pt_merge.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2018
rem

create table t1(id, v1, padding)
partition by range (id) interval (1e4)
(
        partition p10000 values less than (1e4),
        partition p20000 values less than (2e4),
        partition p30000 values less than (3e4),
        partition p40000 values less than (4e4),
        partition p50000 values less than (5e4)
)
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;


begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

SEGMENT_NAME              PARTITION_NAME         HEADER_BLOCK     BLOCKS
------------------------- ---------------------- ------------ ----------
T1                        P10000                          128        256
T1                        P20000                          384        256
T1                        P30000                          640        256
T1                        P40000                          896        256
T1                        P50000                         1152        256
T1                        SYS_P69838                     1408        256
T1                        SYS_P69839                     1664        256
T1                        SYS_P69840                     1920        256
T1                        SYS_P69841                     2176        256
T1                        SYS_P69842                     2432        256
T1                        SYS_P69843                     2688        128

11 rows selected.


I’ve created 100,000 rows and since the partitions I’ve pre-declared have an (unreachable) upper bound of only 50,000 Oracle will have added a further 6 partitions to the table to hold the data for values up to 110,000 (with just one row in the last partition). For testing purposes I’ve created the table in an otherwise empty tablespace so when I check the block address of each segment I can see the location (and size) of the segments so far. So here’s the list of names and locations:

SEGMENT_NAME              PARTITION_NAME         HEADER_BLOCK     BLOCKS
------------------------- ---------------------- ------------ ----------
T1                        P10000                          128        256
T1                        P20000                          384        256
T1                        P30000                          640        256
T1                        P40000                          896        256
T1                        P50000                         1152        256
T1                        SYS_P69838                     1408        256
T1                        SYS_P69839                     1664        256
T1                        SYS_P69840                     1920        256
T1                        SYS_P69841                     2176        256
T1                        SYS_P69842                     2432        256
T1                        SYS_P69843                     2688        128

11 rows selected.

No surprises so far. So let’s pretend we know the dreaded ORA-14299 or ORA-14300 will be arriving soon and try to drop the first 5 partitions to keep the partition count below the limit. Here’s a cut-n-paste from an SQL*Plus session that tries to do that one partition at a time:

SQL> alter table t1 drop partition p10000;

Table altered.

SQL> alter table t1 drop partition p20000;

Table altered.

SQL> alter table t1 drop partition p30000;

Table altered.

SQL> alter table t1 drop partition p40000;

Table altered.

SQL> alter table t1 drop partition p50000;
alter table t1 drop partition p50000
                              *
ERROR at line 1:
ORA-14758: Last partition in the range section cannot be dropped

We can’t drop partition p50000 – it’s the highest partition that wasn’t created automatically, and we have to leave an “anchor” partition in place for interval partitioning to work from. By querying user_tab_partitions we can even see that this partition is flagged a little differently from the others:


select
        partition_name, interval, high_value 
from
        user_tab_partitions
where
        table_name = 'T1'
order by
        partition_position
;


PARTITION_NAME         INT HIGH_VALUE
---------------------- --- --------------------------
P50000                 NO  5e4
SYS_P69844             YES 60000
SYS_P69845             YES 70000
SYS_P69846             YES 80000
SYS_P69847             YES 90000
SYS_P69848             YES 100000
SYS_P69849             YES 110000

7 rows selected.

So, at first sight, we’re stuck. If we’re dropping old partitions we will eventually get to a point where there’s only one “real” range partition at the bottom and then we can’t drop any more historic partitions. There are two solutions to this problem, explained a long time ago here and here by Harald van Breederode.

Option 1

Convert the interval partitioned table to a range partitioned table and back again, and if you know the interval (and you can always look it up in the data dictionary) there’s a quick and dirty way of doing that. Here’s a cut-n-paste demonstrating the method and effect:


SQL> alter table t1 set interval (10000);

1Table altered.

SQL> select partition_name, interval, high_value from user_tab_partitions where table_name = 'T1' order by partition_position ; 

PARTITION_NAME         INT HIGH_VALUE
---------------------- --- --------------------------
P10000                 NO  1e4
P20000                 NO  2e4
P30000                 NO  3e4
P40000                 NO  4e4
P50000                 NO  5e4
SYS_P69850             NO  60000
SYS_P69851             NO  70000
SYS_P69852             NO  80000
SYS_P69853             NO  90000
SYS_P69854             NO  100000
SYS_P69855             NO  110000

11 rows selected.

SQL> select table_name, partitioning_type, interval from user_part_tables;

TABLE_NAME           PARTITION INTERVAL
-------------------- --------- --------------------
T1                   RANGE     1E4

1 row selected.

Every single partition has just become a range-based partition, but the table is still interval partitioned. This is a tidy solution, but there’s one obvious, generic, drawback to the method.  The “theory” of interval partitioning is that you don’t have to pre-create partitions in anticipation of the data arriving – so what will happen if a (possibly bad) row arrives weeks ahead of schedule and you find that Oracle has created (say) partition 85,001 with a gap of 12,000 partitions between the current high partition and the new one. If you use this “convert to range and back” trick then you’ll have a single partition covering the entire range where you were expecting (eventually) to have 12,000 partitions. Every time you convert from interval to range and back you’d better have code that checks if there are any gaps first, and then does loads of “split partition” –  or comes up with some other strategy – to address the side effects.

Option 2

When you’ve got just one range partition left, merge the bottom two partitions – this makes the next partition up a range partition without affecting any other partitions. After recreating the original table and dropping the first 4 partitions this is how things go:


SQL> alter table t1 drop partition p50000;
alter table t1 drop partition p50000
                              *
ERROR at line 1:
ORA-14758: Last partition in the range section cannot be dropped


SQL> alter table t1 merge partitions for (45000), for (55000) into partition p_low;

Table altered.

SQL> select partition_name, interval, high_value from user_tab_partitions where table_name = 'T1' order by partition_position;

PARTITION_NAME         INTERVAL             HIGH_VALUE
---------------------- -------------------- --------------------------
P_LOW                  NO                   60000
SYS_P69863             YES                  70000
SYS_P69864             YES                  80000
SYS_P69865             YES                  90000
SYS_P69866             YES                  100000
SYS_P69867             YES                  110000

6 rows selected.

Is this too good to be true ? Of course it is, but you may have to pause for a moment to think why. When you merge two partitions Oracle copies the contents of the two segments into a new segment – always; even if one of the two segments is empty. When you do a “split partition” Oracle runs a check to see if the split would leave all the data in a single segment and if it would then Oracle doesn’t do any copying but simply plays clever games in the data dictionary – unfortunately Oracle doesn’t use the same sort of trick to optimise a merge.

So the merge partition mechanism carries less risk than the “interval/range/interval”, but you either pay the cost of the merge or you carefully code the mechanism so that the bottom two partitions are always empty when you merge: for example you might always leave the bottom (range) partition empty and use your scheduled code to truncate (or exchange out) the lowest interval partition, then do the merge.

The good news

When you upgrade to 12.2.0.1 you can drop the lowest partition – and Oracle will simply turn the lowest interval partition currently in existence into a range partition. (That may be a bit of a nuisance if there’s a gap between the range partition and the current lowest interval partition.)

The Bad News

It doesn’t really matter which strategy you use to deal with this problem (even if you’ve upgraded to 12.2) – you still pay one other penalty for both mechanisms. And that’s the bit which re-introduces a problem that last existed in 9i.

Ask youself “How does Oracle know which interval a partition is for and what the limit is on the partitioning key ?” Then look at the data dictionary, or maybe build a very simple model and trace what happens when you use either of the methods above – but in your model create a significant number or partitions first. I’m going to take the data dictionary method – starting from the point where I’ve created and populated the table. Again this is cut-n-paste, and do note that I switch to the sys account after creating the table:


SQL> select object_id, object_name, subobject_name from user_objects;

 OBJECT_ID OBJECT_NAME          SUBOBJECT_NAME
---------- -------------------- ----------------------
    185164 T1
    185165 T1                   P10000
    185166 T1                   P20000
    185167 T1                   P30000
    185168 T1                   P40000
    185169 T1                   P50000
    185170 T1                   SYS_P69868
    185171 T1                   SYS_P69869
    185172 T1                   SYS_P69870
    185173 T1                   SYS_P69871
    185174 T1                   SYS_P69872
    185175 T1                   SYS_P69873

12 rows selected.

SQL> connect / as sysdba
Connected.

SQL> select obj#, dataobj#, part# from tabpart$ where bo# = 185164 order by part#;

      OBJ#   DATAOBJ#      PART#
---------- ---------- ----------
    185165     185165         10
    185166     185166         20
    185167     185167         30
    185168     185168         40
    185169     185169         50
    185170     185170 2147483648
    185171     185171 2147483649
    185172     185172 2147483650
    185173     185173 2147483651
    185174     185174 2147483652
    185175     185175 2147483653

11 rows selected.

I’ve queried user_objects to find the object_id of the table then used that as the “base object number” (bo#) to query tabpart$, which holds the table partition definitions. Note how there are 5 partitions where the partition number goes up 10 at a time, and 6 where it goes up one at a time. Prior to 10g (and interval partitions, of course) the stored partition number would increase in steps of 1 but if you wanted to do a split, merge or drop partition (and the last of the three was the most significant one) every single partition position about the split/merge/drop point would have to be renumbered, and that was done by a single row update to the data dictionary to keep the numbering intact. The steps of 10 were introduced in 10g to deal with the inherent performance problems – particularly the shared pool catastrophe that this could cause.

The steps of 1 for interval partitions allows Oracle to keep track (easily) of what high_value each partition partition represents, and the highest legal partition. Try inserting the values 1,000,000 into the table and re-run the query against tabpart$ and you’ll see Oracle adding part# = 2147483743. So what do you think is going to happen if you try to apply the two mechanisms ?

If you do the interval/range/interval switch every interval part# will be renumbered so to follow the “increment by 10” pattern. If you drop partitions p10000 to p40000 nothing happens to the existing part# values until you get to the command to merge p50000 with the next partition up and then you see this:


SQL> alter table test_user.t1 merge partitions for (45000), for (55000) into partition p_low;

Table altered.

SQL> select obj#, dataobj#, part# from tabpart$ where bo# = 185164 order by part#;

      OBJ#   DATAOBJ#      PART#
---------- ---------- ----------
    185177     185177         10
    185171     185171 2147483648
    185172     185172 2147483649
    185173     185173 2147483650
    185174     185174 2147483651
    185175     185175 2147483652
    185176     185176 2147483742

7 rows selected.


The newly merged partition is a new object, of course, so has a completely new obj# and dataobj#, and it’s been given the part# of 10 (the lowest value for a clean range-partitioned object). Every single interval partition has had its part# decreased by one. The lowest possible interval partition is always given the part# of 2147483648 (0x80000000) and the partition numbering increments by 1 from there onwards. (The numbering gets a little more subtle when you have composite partitioning but a similar approach takes place in tabcompart$).

Pause for thought – if you’re thinking of creating an interval partitioned table that could get close to a running level of 1 million partitions and you start to get rid of old partitions in any version of Oracle then each “drop/merge” partition will update about 1 million rows in the data dictionary – and that’s assuming you don’t have any local indexes that will need to be renumbered in the same way!

Here’s a critical part of the output from tkprof when I recreated the table with 1,000,000 rows – which means 101 partitions – and created a local index on it, before dropping the first 4 partitions and then enabled tracing just before merging the bottom interval partition with the anchor range partition.


update indpart$ set dataobj# = :1, part# = :2, flags = :3, ts# = :4, file# =
  :5, block# = :6, pctfree$ = :7, initrans = :8, maxtrans = :9, analyzetime =
  :10, samplesize = :11, rowcnt = :12, blevel = :13, leafcnt = :14, distkey =
  :15, lblkkey = :16, dblkkey = :17, clufac = :18, pctthres$ = :19
where
 obj# = :20


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       94      0.00       0.00          0          0          0           0
Execute     94      0.00       0.01          0         94        480          94
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      188      0.01       0.01          0         94        480          94


update tabpart$ set dataobj# = :1, part# = :2, ts# = :3, file# = :4, block# =
  :5, pctfree$ = :6, pctused$ = :7, initrans = :8, maxtrans = :9, flags = :10,
   analyzetime = :11, samplesize = :12, rowcnt = :13, blkcnt = :14, empcnt =
  :15, avgspc = :16, chncnt = :17, avgrln = :18
where
 obj# = :19


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       94      0.00       0.00          0          0          0           0
Execute     94      0.00       0.00          0        188        489          94
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      188      0.00       0.00          0        188        489          94

That’s not a lot of work for my little example with less than 100 partitions – but when you’ve got a million of them, with a handful of indexes, and the partitions have been created over time that’s going to turn into a lot of work that’s going to disrupt the shared pool for a long time, generate a lot of redo, and do a lot of disk reads and writes.

So be cautious with interval partitioning – even in 12.2 (and 18.1, possibly) the ease of use may disappear if you realise too late that you’re going to get into a cycle of partition maintenance.

Footnote for composite partitioning – the limits of 2^20-1 segments (hence subpartitions) still applies, but the necessary update is relevant only at the partition level, not at the subpartition level. The objects updated are tabcompart$ and indcompart$.

Update (included for ironic effect)

The day I posted this note my “Oracle Support Hot Topics” email with a report of the following bug:

Bug 19294302 : DBMS_REDEFINITION DOES NOT WORK WITH INTERVAL PARTITIONS

This was reported for 11.2.0.4, fixed in 12.2. The rediscovery information is:

ORA-14024 during copy_table_dep when the interim table is interval partitioned.

The problem arises if you change a table from simple range partitioned to range with interval – so might be relevant if you have a strategy of doing the interval/range/interval trick.

 

 

1 Comment »

  1. […] in this example because that’s just another complication to add to the comparison and, as I reported a few days ago, introduces another massive inefficiency on data […]

    Pingback by Reference Costs | Oracle Scratchpad — March 19, 2018 @ 7:44 am BST Mar 19,2018 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.