Interval partitioning is a popular strategy for partitioning date-based data. It’s an enhanced variant of range partitioning that allows you to define a starting partition and an interval that should be used to derive the high values for all subsequent partitions – and Oracle doesn’t even have to create intervening partitions if you insert data that goes far beyond the current partition, it automatically creates exactly the right partition (with the correct high_value and correctly inferred lower boundary) for the incoming data and behaves as if the intervening partitions will become available when they’re needed at some later point in time. So no need for DBAs to work out actual partition high_values, no need to ensure that all the partitions you need out into the future have been pre-created, no more batch processes crashing with Oracle error ORA-14400: inserted partition key does not map to any partition.
But there’s a surprising difference between traditional range partitioning and the newer interval partitioning that will increase CPU usage in some cases and may (though I haven’t yet investigated this in sufficient detail to create an example) lead to variations in execution plans.
To demonstrate the difference I’m going to create two tables with the same structure and content then run the same query against them and show you the resulting execution plans. Here’s the code to create and populate the tables:
rem
rem Script: interval_or_range.sql
rem Author: Jonathan Lewis
rem Dated: Oct 2020
rem
rem Last tested
rem 19.3.0.0
rem
create table t_interval(
order_date date not null,
order_id number(10,0) not null,
small_vc varchar2(10),
padding varchar2(100)
)
partition by range (order_date)
interval (numtoyminterval(1,'MONTH'))
subpartition by hash (order_id) subpartitions 4
(
partition start_p1 values less than (to_date('01-Jan-2020','dd-mon-yyyy'))
)
;
create table t_range(
order_date date not null,
order_id number(10,0) not null,
small_vc varchar2(10),
padding varchar2(100)
)
partition by range (order_date)
subpartition by hash (order_id) subpartitions 4
(
partition start_p1 values less than (to_date('01-Jan-2020','dd-mon-yyyy')),
partition start_p2 values less than (to_date('01-Feb-2020','dd-mon-yyyy')),
partition start_p3 values less than (to_date('01-Mar-2020','dd-mon-yyyy')),
partition start_p4 values less than (to_date('01-Apr-2020','dd-mon-yyyy')),
partition start_p5 values less than (to_date('01-May-2020','dd-mon-yyyy')),
partition start_p6 values less than (to_date('01-Jun-2020','dd-mon-yyyy')),
partition start_p7 values less than (to_date('01-Jul-2020','dd-mon-yyyy')),
partition start_p8 values less than (to_date('01-Aug-2020','dd-mon-yyyy')),
partition start_p9 values less than (to_date('01-Sep-2020','dd-mon-yyyy'))
)
;
insert into t_range select
to_date('01-Jan-2020','dd-mon-yyyy') + rownum,
rownum,
lpad(rownum,10,'0'),
lpad('x',100,'x')
from
all_objects
where
rownum <= 240 -- > comment to avoid wordpress format issue
;
insert into t_interval select
to_date('01-Jan-2020','dd-mon-yyyy') + rownum,
rownum,
lpad(rownum,10,'0'),
lpad('x',100,'x')
from
all_objects
where
rownum <= 240 -- > comment to avoid wordpress format issue
;
commit;
begin
dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T_RANGE',
method_opt => 'for all columns size 1'
);
dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T_INTERVAL',
method_opt => 'for all columns size 1'
);
end;
/
I’ve used composite partitioned tables in this example but the same anomaly appears with simple partitioning – which you can test by commenting out the “subpartion by ..” lines. The two tables have partitions defined to hold a month’s data. I’ve inserted a few rows into most of the partitions, and ensured that I haven’t accidentally attempted to insert data that falls outside the legal range of the table with the predefined partitions .
To show that the tables are nominally identical here’s the list of partitions with their high values and number of rows:
column table_name format a15
column partition_name format a15
column high_value format a80
break on table_name skip 1
select
table_name, partition_name, num_rows, high_value
from
user_tab_partitions
where
table_name in ('T_INTERVAL','T_RANGE')
order by
table_name, partition_name
;
ABLE_NAME PARTITION_NAME NUM_ROWS HIGH_VALUE
--------------- --------------- ---------- --------------------------------------------------------------------------------
T_INTERVAL START_P1 0 TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P10722 30 TO_DATE(' 2020-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P10723 29 TO_DATE(' 2020-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P10724 31 TO_DATE(' 2020-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P10725 30 TO_DATE(' 2020-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P10726 31 TO_DATE(' 2020-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P10727 30 TO_DATE(' 2020-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P10728 31 TO_DATE(' 2020-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P10729 28 TO_DATE(' 2020-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
T_RANGE START_P1 0 TO_DATE(' 2020-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
START_P2 30 TO_DATE(' 2020-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
START_P3 29 TO_DATE(' 2020-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
START_P4 31 TO_DATE(' 2020-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
START_P5 30 TO_DATE(' 2020-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
START_P6 31 TO_DATE(' 2020-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
START_P7 30 TO_DATE(' 2020-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
START_P8 31 TO_DATE(' 2020-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
START_P9 28 TO_DATE(' 2020-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
18 rows selected.
With this setup we can now run a simple query against the two tables using a where clause that is an exact match for the partition definition for the July data:
set serveroutput off
select
count(*)
from
t_range
where order_date >= date '2020-07-01' -- to_date('01-Jul-2020','dd-mon-yyyy')
and order_date < date '2020-08-01' -- to_date('01-Aug-2020','dd-mon-yyyy')
/
select * from table(dbms_xplan.display_cursor);
select
count(*)
from
t_interval
where order_date >= date '2020-07-01' -- to_date('01-Jul-2020','dd-mon-yyyy')
and order_date < date '2020-08-01' -- to_date('01-Aug-2020','dd-mon-yyyy')
/
select * from table(dbms_xplan.display_cursor);
I’ve left in two versions of the date predicates – the “ANSI” style, and a strictly formatted “to_char()” style that is still the approached used most commonly in Oracle systems. The effect on the execution plans is the same for both variations of the date declaration, and here are the two execution plans – reported from an instance of 19.3.0.0:
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 130 (100)| | | |
| 1 | SORT AGGREGATE | | 1 | 8 | | | | |
| 2 | PARTITION RANGE SINGLE| | 31 | 248 | 130 (3)| 00:00:01 | 8 | 8 |
| 3 | TABLE ACCESS FULL | T_RANGE | 31 | 248 | 130 (3)| 00:00:01 | 8 | 8 |
---------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 130 (100)| | | |
| 1 | SORT AGGREGATE | | 1 | 8 | | | | |
| 2 | PARTITION RANGE SINGLE| | 31 | 248 | 130 (3)| 00:00:01 | 8 | 8 |
|* 3 | TABLE ACCESS FULL | T_INTERVAL | 31 | 248 | 130 (3)| 00:00:01 | 8 | 8 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(("ORDER_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"ORDER_DATE"<TO_DATE(' 2020-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
It’s quite possible that you won’t notice the difference these plans, even though I’ve put them directly one below the other; and it’s very likely that most people wouldn’t think about the difference if they didn’t have the two plans available at the same time to compare.
If you can’t spot the difference it’s because I’ve deliberately arranged them in a way that might fool you into not looking carefully enough.
If you’ve spotted the difference it’s probably because you’ve examined the Predicate Information section. There isn’t one for the the t_range example but there is one for the t_interval example – and it was a little sneaky of me to make it easy for you to assume that the one set of predicates I showed was common to the two plans.
Since the plans have been pulled from memory (v$sql_plan) it seems likely that they are truthful and the query based on the interval partitioning is actually checking every row in the partition against two date values. In my example that’s not going to make much difference to performance, but in a datawarehouse with millions of rows per partition there’s scope for a noticeable increase in CPU between the two queries – especially since the block reads are likely to be direct path, bypassing much of the usual (CPU -intensive) buffer cache activity.
Interestingly when I checked the 10053 trace files for the two queries the CPU cost for the pair was identical. Normally you expect to see at least a little CPU cost (even if it’s only tens of units out of millions) for checking a predicate. This raises the question – does the optimizer allow a cost for the t_range table for an event that isn’t going to happen, or is it reporting an event that isn’t going to happen for the t_interval table. [Edit: easy enough to check by doing the test on a simple heap table that clones the data from that one partition, and checking the calculated cost with and without the predicate]
There’s a further ramification to this anomaly, relating to the question on the Oracle-L list server that prompted the investigation. What happens on Exadata where the predicates can be offloaded to storage?
The original question wasn’t actually about the appearance (or not) of the predicates, it was about an unexpected cardinality estimate for a query involving two equi-partitioned tables, so this note has gone completely off-topic from the question; but another strange detail about the predicates showed up when I suggested the creation of a column group on the join columns. A (redundant) predicate that had been offloaded to storage stopped being offloaded; here, from the original posting with the original tables, are the two different Predicate Information sections that appeared – the first without the column group, the second after the column group had been created and its stats collected:
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("ORDER_DETL"."ORDR_RQST_NUMBER"="ORDER_REQST"."ORDR_RQST_NUMBER" AND
"ORDER_DETL"."ORDR_RQST_DATE"="ORDER_REQST"."ORDR_RQST_DATE")
6 - storage("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
filter("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
8 - storage("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
filter("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
Predicate Information (identified by operation id):
---------------------------------------------------
4 -
access("ORDER_DETL"."ORDR_RQST_NUMBER"="ORDER_REQST"."ORDR_RQST_NUMBER" AND
"ORDER_DETL"."ORDR_RQST_DATE"="ORDER_REQST"."ORDR_RQST_DATE")
6 - storage("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"ORDER_REQST"."ORDR_RQST_DATE"<TO_DATE(' 2020-08-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
filter("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"ORDER_REQST"."ORDR_RQST_DATE"<TO_DATE(' 2020-08-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
8 - filter("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"ORDER_DETL"."ORDR_RQST_DATE"<TO_DATE(' 2020-08-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
Notice how the storage() predicate that appears at operation 8 in the first set has disappeared from the second even though operation 6 manages to retain its storage() predicate throughout.
I’ve modelled a further example of odd behaviour using two pairs of tables – one pair using range/hash partitioning the other interval/hash partitioning. I won’t reproduce it here but the problem of redundant predicates appearing and then disappearing gets even stranger.
I haven’t yet produced an example where the unexpected predicate behaviour has affected the costs or cardinalities – but I’ve only spent a couple of hours playing around with well-formed examples: it’s possible that in badly formed examples (e.g. with statistical inconsistencies) the side effect could mean that two notionally identical queries produce different numbers and different plans because they end up with different predicates in the final transformed query.
Footnote
Following an email from David Kurtz, it occurred to me that I should have made it clear that the disappearance of predicates on the partition key is expected behaviour when the predicates are clearly synchronised with the partition boundaries. The behaviour for the interval partitioning is the oddity, the behaviour for the “normal” range partitioning is the standard.
Comments and related questions are welcome.