Updated (see end of article) for 21.3
I’ve recently been reminded of a blog post I wrote a couple of years ago that discussed the issue of running into the hard limit of 2^20 - 1 as the number of segments for a (composite) partitioned table – a problem that could arise in a relatively short time if you used a large number of hash subpartitions in an interval/hash composite partitioned table (you get about 2 years and 10 months of daily partitions at 1,024 subpartitions per day, for example).
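For anyone who wants to put numbers to that claim, here’s a minimal sketch (the table name, columns and starting date are purely illustrative, not from the original test) of the sort of interval/hash definition that runs into the limit, with the arithmetic in the comments:
rem
rem     Hypothetical illustration only: one daily partition,
rem     1,024 hash subpartitions per partition.
rem
create table t_ih (d date, n number, v1 varchar2(10))
partition by range (d) interval (numtodsinterval(1,'DAY'))
subpartition by hash (n) subpartitions 1024
(
        partition p0 values less than (to_date('01-Jan-2018','dd-mon-yyyy'))
)
;
-- The segment limit is 2^20 - 1 = 1,048,575, so at 1,024 subpartitions
-- per day you can create only trunc(1048575/1024) = 1,023 daily
-- partitions, i.e. roughly 2 years and 10 months.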
A natural follow-on from that article is to think through a strategy for dropping old partitions sufficiently early that you don’t hit the limit as new partitions are created. This, of course, pretty much defeats the point of interval partitioning – instead of planning to add partitions “just in time” you now have to eliminate them “just in time”. Amongst other issues, we’re going to find that interval partitioning manages to re-introduce a problem with range partitioning that Oracle got rid of in Oracle 10g.
So let’s test the obvious option: drop the oldest partition(s) in time to keep head-room for new partitions; for convenience we’ll start with a simple interval partitioned table with a few pre-declared range partitions and a few automatically generated interval partitions. All the examples here were run under 12.1.0.2:
rem
rem Script: pt_merge.sql
rem Author: Jonathan Lewis
rem Dated: Feb 2018
rem
create table t1(id, v1, padding)
partition by range (id) interval (1e4)
(
        partition p10000 values less than (1e4),
        partition p20000 values less than (2e4),
        partition p30000 values less than (3e4),
        partition p40000 values less than (4e4),
        partition p50000 values less than (5e4)
)
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                  id,
        lpad(rownum,10,'0')     v1,
        lpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;
begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/
I’ve created 100,000 rows, and since the partitions I’ve pre-declared have a (non-inclusive) upper bound of only 50,000 Oracle will have added a further 6 partitions to the table to hold the data for values up to 110,000 (with just one row in the last partition). For testing purposes I’ve created the table in an otherwise empty tablespace so that when I check the block address of each segment I can see the location (and size) of the segments created so far. So here’s the list of names and locations:
SEGMENT_NAME PARTITION_NAME HEADER_BLOCK BLOCKS
------------------------- ---------------------- ------------ ----------
T1 P10000 128 256
T1 P20000 384 256
T1 P30000 640 256
T1 P40000 896 256
T1 P50000 1152 256
T1 SYS_P69838 1408 256
T1 SYS_P69839 1664 256
T1 SYS_P69840 1920 256
T1 SYS_P69841 2176 256
T1 SYS_P69842 2432 256
T1 SYS_P69843 2688 128
11 rows selected.
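For reference, the listing came from a query along the following lines (a sketch; the original script isn’t shown in the article, and this version assumes you have select access on dba_segments):
select
        segment_name, partition_name, header_block, blocks
from
        dba_segments
where
        owner        = user
and     segment_name = 'T1'
order by
        header_block
;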
No surprises so far. So let’s pretend we know the dreaded ORA-14299 or ORA-14300 will be arriving soon and try to drop the first 5 partitions to keep the partition count below the limit. Here’s a cut-n-paste from an SQL*Plus session that tries to do that one partition at a time:
SQL> alter table t1 drop partition p10000;
Table altered.
SQL> alter table t1 drop partition p20000;
Table altered.
SQL> alter table t1 drop partition p30000;
Table altered.
SQL> alter table t1 drop partition p40000;
Table altered.
SQL> alter table t1 drop partition p50000;
alter table t1 drop partition p50000
*
ERROR at line 1:
ORA-14758: Last partition in the range section cannot be dropped
We can’t drop partition p50000 – it’s the highest partition that wasn’t created automatically, and we have to leave an “anchor” partition in place for interval partitioning to work from. By querying user_tab_partitions we can even see that this partition is flagged a little differently from the others:
select
        partition_name, interval, high_value
from
        user_tab_partitions
where
        table_name = 'T1'
order by
        partition_position
;
PARTITION_NAME INT HIGH_VALUE
---------------------- --- --------------------------
P50000 NO 5e4
SYS_P69844 YES 60000
SYS_P69845 YES 70000
SYS_P69846 YES 80000
SYS_P69847 YES 90000
SYS_P69848 YES 100000
SYS_P69849 YES 110000
7 rows selected.
So, at first sight, we’re stuck. If we’re dropping old partitions we will eventually get to a point where there’s only one “real” range partition at the bottom and then we can’t drop any more historic partitions. There are two solutions to this problem, explained a long time ago here and here by Harald van Breederode.
Option 1
Convert the interval partitioned table to a range partitioned table and back again, and if you know the interval (and you can always look it up in the data dictionary) there’s a quick and dirty way of doing that. Here’s a cut-n-paste demonstrating the method and effect:
SQL> alter table t1 set interval (10000);
Table altered.
SQL> select partition_name, interval, high_value from user_tab_partitions where table_name = 'T1' order by partition_position ;
PARTITION_NAME INT HIGH_VALUE
---------------------- --- --------------------------
P10000 NO 1e4
P20000 NO 2e4
P30000 NO 3e4
P40000 NO 4e4
P50000 NO 5e4
SYS_P69850 NO 60000
SYS_P69851 NO 70000
SYS_P69852 NO 80000
SYS_P69853 NO 90000
SYS_P69854 NO 100000
SYS_P69855 NO 110000
11 rows selected.
SQL> select table_name, partitioning_type, interval from user_part_tables;
TABLE_NAME PARTITION INTERVAL
-------------------- --------- --------------------
T1 RANGE 1E4
1 row selected.
Every single partition has just become a range-based partition, but the table is still interval partitioned. This is a tidy solution, but there’s one obvious, generic drawback to the method. The “theory” of interval partitioning is that you don’t have to pre-create partitions in anticipation of the data arriving – so what happens if a (possibly bad) row arrives weeks ahead of schedule and Oracle creates (say) partition 85,001, leaving a gap of 12,000 partitions between the previous high partition and the new one? If you then use this “convert to range and back” trick you’ll have a single partition covering the entire range where you were expecting (eventually) to have 12,000 partitions. So every time you convert from interval to range and back you’d better have code that checks for gaps first (a sketch follows), and then does loads of “split partition” – or comes up with some other strategy – to address the side effects.
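Here’s a minimal sketch of such a gap check for this table, assuming a numeric partitioning key and assuming that the interval and high_value strings convert cleanly to numbers. It relies on the fact that PL/SQL will implicitly convert the long column high_value to a varchar2 as it fetches (remember to set serveroutput on):
declare
        v_interval      number;
        v_prev          number := null;
        v_high          number;
begin
        select  to_number(interval)
        into    v_interval
        from    user_part_tables
        where   table_name = 'T1';

        for r in (
                select  high_value
                from    user_tab_partitions
                where   table_name = 'T1'
                order by
                        partition_position
        ) loop
                -- high_value is a long, but PL/SQL lets us treat it as text
                v_high := to_number(r.high_value);
                if v_prev is not null and v_high - v_prev > v_interval then
                        dbms_output.put_line(
                                ((v_high - v_prev) / v_interval - 1) ||
                                ' partition(s) missing between high values ' ||
                                v_prev || ' and ' || v_high
                        );
                end if;
                v_prev := v_high;
        end loop;
end;
/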
Option 2
When you’ve got just one range partition left, merge the bottom two partitions – this makes the next partition up a range partition without affecting any other partitions. After recreating the original table and dropping the first 4 partitions this is how things go:
SQL> alter table t1 drop partition p50000;
alter table t1 drop partition p50000
*
ERROR at line 1:
ORA-14758: Last partition in the range section cannot be dropped
SQL> alter table t1 merge partitions for (45000), for (55000) into partition p_low;
Table altered.
SQL> select partition_name, interval, high_value from user_tab_partitions where table_name = 'T1' order by partition_position;
PARTITION_NAME INTERVAL HIGH_VALUE
---------------------- -------------------- --------------------------
P_LOW NO 60000
SYS_P69863 YES 70000
SYS_P69864 YES 80000
SYS_P69865 YES 90000
SYS_P69866 YES 100000
SYS_P69867 YES 110000
6 rows selected.
Is this too good to be true? Of course it is, but you may have to pause for a moment to think why. When you merge two partitions Oracle copies the contents of the two segments into a new segment – always, even if one of the two segments is empty. When you do a “split partition” Oracle runs a check to see whether the split would leave all the data in a single segment, and if it would then Oracle doesn’t do any copying but simply plays clever games in the data dictionary – unfortunately Oracle doesn’t use the same sort of trick to optimise a merge.
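If you want to see this for yourself, compare the data_object_id of the partitions before and after a merge where one of the two segments is empty (a sketch): the merged partition will show a brand new data_object_id, telling you that a new segment really was created.
select
        subobject_name, object_id, data_object_id
from
        user_objects
where
        object_name = 'T1'
and     subobject_name is not null
order by
        subobject_name
;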
So the merge partition mechanism carries less risk than the interval/range/interval switch, but you either pay the cost of the merge or you carefully code the mechanism so that the bottom two partitions are always empty when you merge: for example you might always leave the bottom (range) partition empty and use your scheduled code to truncate (or exchange out) the lowest interval partition, then do the merge.
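As a sketch of that housekeeping cycle (using the key values from the example above, and the “for (value)” syntax so the code doesn’t depend on system-generated partition names):
alter table t1 truncate partition for (55000);    -- empty the lowest interval partition

alter table t1 merge partitions for (45000), for (55000) into partition p_low;

-- Both source partitions are now empty, so the copy costs almost nothing,
-- and p_low becomes the new (empty) anchor, ready for the next pass.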
The good news
When you upgrade to 12.2.0.1 you can drop that last remaining range partition – and Oracle will simply turn the lowest interval partition currently in existence into a range partition. (That may be a bit of a nuisance if there’s a gap between the range partition and the current lowest interval partition.)
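A sketch of the sequence on 12.2 (partition names will, of course, differ from run to run):
alter table t1 drop partition p50000;

select
        partition_name, interval, high_value
from
        user_tab_partitions
where
        table_name = 'T1'
order by
        partition_position
;
-- expect the lowest surviving partition to report INTERVAL = 'NO'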
The bad news
It doesn’t really matter which strategy you use to deal with this problem (even if you’ve upgraded to 12.2) – you still pay one other penalty for both mechanisms. And that’s the bit which re-introduces a problem that last existed in 9i.
Ask yourself “How does Oracle know which interval a partition is for, and what the limit is on the partitioning key?” Then look at the data dictionary, or maybe build a very simple model and trace what happens when you use either of the methods above – but in your model create a significant number of partitions first. I’m going to take the data dictionary method – starting from the point where I’ve created and populated the table. Again this is cut-n-paste, and do note that I switch to the sys account after creating the table:
SQL> select object_id, object_name, subobject_name from user_objects;
OBJECT_ID OBJECT_NAME SUBOBJECT_NAME
---------- -------------------- ----------------------
185164 T1
185165 T1 P10000
185166 T1 P20000
185167 T1 P30000
185168 T1 P40000
185169 T1 P50000
185170 T1 SYS_P69868
185171 T1 SYS_P69869
185172 T1 SYS_P69870
185173 T1 SYS_P69871
185174 T1 SYS_P69872
185175 T1 SYS_P69873
12 rows selected.
SQL> connect / as sysdba
Connected.
SQL> select obj#, dataobj#, part# from tabpart$ where bo# = 185164 order by part#;
OBJ# DATAOBJ# PART#
---------- ---------- ----------
185165 185165 10
185166 185166 20
185167 185167 30
185168 185168 40
185169 185169 50
185170 185170 2147483648
185171 185171 2147483649
185172 185172 2147483650
185173 185173 2147483651
185174 185174 2147483652
185175 185175 2147483653
11 rows selected.
I’ve queried user_objects to find the object_id of the table, then used that as the “base object number” (bo#) to query tabpart$, which holds the table partition definitions. Note how there are 5 partitions where the partition number goes up 10 at a time, and 6 where it goes up 1 at a time. Prior to 10g (and interval partitions, of course) the stored partition numbers would increase in steps of 1, and if you wanted to do a split, merge or drop partition (and the last of the three was the most significant one) every single partition position above the split/merge/drop point would have to be renumbered – one row updated in the data dictionary for each partition to keep the numbering intact. The steps of 10 were introduced in 10g to deal with the inherent performance problems – particularly the shared pool catastrophe that this renumbering could cause.
The steps of 1 for interval partitions allow Oracle to keep track (easily) of which high_value each partition represents, and of the highest legal partition. Try inserting the value 1,000,000 into the table and re-running the query against tabpart$ and you’ll see Oracle adding part# = 2147483743. So what do you think is going to happen if you try to apply the two mechanisms?
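If you want to check the part# behaviour for yourself before reading on, the test is just this (a sketch; the padding values simply match the table definition, and your bo# will be whatever object_id you saw for T1):
insert into t1 values (1e6, lpad(1e6,10,'0'), lpad('x',100,'x'));
commit;

-- then, as sys:
select obj#, dataobj#, part# from tabpart$ where bo# = 185164 order by part#;

-- 2147483743 = 2147483648 + 95: the value 1,000,000 falls into the 96th
-- possible interval above the anchor (high value 1,010,000).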
If you do the interval/range/interval switch every interval part# will be renumbered to follow the “increment by 10” pattern. If you drop partitions p10000 to p40000 nothing happens to the existing part# values until you get to the command to merge p50000 with the next partition up, and then you see this:
SQL> alter table test_user.t1 merge partitions for (45000), for (55000) into partition p_low;
Table altered.
SQL> select obj#, dataobj#, part# from tabpart$ where bo# = 185164 order by part#;
OBJ# DATAOBJ# PART#
---------- ---------- ----------
185177 185177 10
185171 185171 2147483648
185172 185172 2147483649
185173 185173 2147483650
185174 185174 2147483651
185175 185175 2147483652
185176 185176 2147483742
7 rows selected.
The newly merged partition is a new object, of course, so has a completely new obj# and dataobj#, and it’s been given the part# of 10 (the lowest value for a clean range-partitioned object). Every single interval partition has had its part# decreased by one. The lowest possible interval partition is always given the part# of 2147483648 (0x80000000) and the partition numbering increments by 1 from there onwards. (The numbering gets a little more subtle when you have composite partitioning but a similar approach takes place in tabcompart$).
Pause for thought – if you’re thinking of creating an interval partitioned table that could get close to a running level of 1 million partitions, and you start to get rid of old partitions in any version of Oracle, then each partition drop/merge will update about 1 million rows in the data dictionary – and that’s assuming you don’t have any local indexes that will need to be renumbered in the same way!
Here’s a critical part of the output from tkprof when I recreated the table with 1,000,000 rows – which means 101 partitions – created a local index on it, dropped the first 4 partitions, and then enabled tracing just before merging the bottom interval partition with the anchor range partition.
update indpart$ set dataobj# = :1, part# = :2, flags = :3, ts# = :4, file# =
:5, block# = :6, pctfree$ = :7, initrans = :8, maxtrans = :9, analyzetime =
:10, samplesize = :11, rowcnt = :12, blevel = :13, leafcnt = :14, distkey =
:15, lblkkey = :16, dblkkey = :17, clufac = :18, pctthres$ = :19
where
obj# = :20
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 94 0.00 0.00 0 0 0 0
Execute 94 0.00 0.01 0 94 480 94
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 188 0.01 0.01 0 94 480 94
update tabpart$ set dataobj# = :1, part# = :2, ts# = :3, file# = :4, block# =
:5, pctfree$ = :6, pctused$ = :7, initrans = :8, maxtrans = :9, flags = :10,
analyzetime = :11, samplesize = :12, rowcnt = :13, blkcnt = :14, empcnt =
:15, avgspc = :16, chncnt = :17, avgrln = :18
where
obj# = :19
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 94 0.00 0.00 0 0 0 0
Execute 94 0.00 0.00 0 188 489 94
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 188 0.00 0.00 0 188 489 94
That’s not a lot of work for my little example with less than 100 partitions – but when you’ve got a million of them, with a handful of indexes, and the partitions have been created over time that’s going to turn into a lot of work that’s going to disrupt the shared pool for a long time, generate a lot of redo, and do a lot of disk reads and writes.
So be cautious with interval partitioning – even in 12.2 (and 18.1, possibly) the ease of use may disappear if you realise too late that you’re going to get into a cycle of partition maintenance.
Footnote for composite partitioning – the limit of 2^20 - 1 segments (hence subpartitions) still applies, but the necessary update is relevant only at the partition level, not at the subpartition level. The objects updated are tabcompart$ and indcompart$.
Update (included for ironic effect)
The day I posted this note my “Oracle Support Hot Topics” email arrived with a report of the following bug:
Bug 19294302 : DBMS_REDEFINITION DOES NOT WORK WITH INTERVAL PARTITIONS
This was reported for 11.2.0.4, fixed in 12.2. The rediscovery information is:
ORA-14024 during copy_table_dep when the interim table is interval partitioned.
The problem arises if you change a table from simple range partitioned to range with interval – so it might be relevant if you have a strategy of doing the interval/range/interval trick.
Update (Nov 2021)
I’ve just run through the test script again on 21.3.0.0, and the key feature of dropping the “anchor” range partition (i.e. the highest partition that is not an interval partition) is that the lowest interval partition becomes the new anchor (which means it changes to a range partition) and every partition above it has its internal part number (tabpart$.part#) updated.
You probably won’t have an enormous number of intervals above the anchor (even though there could, in principle, be roughly 1 million), but with even just a hundred or so partitions (hourly for a week, say, or daily for a year) this could produce a massive stutter in the library cache as cursors were invalidated and sessions stalled waiting to re-optimize their queries.