Interval Partition Problem

February 21, 2018

Interval Partition Problem

Filed under: Bugs,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 8:40 am GMT Feb 21,2018

Assume you’ve got a huge temporary tablespace, there’s plenty of space in your data tablespace, you’ve got a very boring, simple table you want to copy and partition, and no-one and nothing is using the system. Would you really expect a (fairly) ordinary “create table t2 {interval partitioning clause etc.} as select * from t1” to end with an Oracle error “ORA-01652: unable to extend temp segment by 128 in tablespace TEMP” . That’s the temporary tablespace that’s out of space, not the target tablespace for the copy.

Here’s a sample data set (tested on 11.2.0.4 and 12.1.0.2) to demonstrate the surprise – you’ll need about 900MB of space in your default tablespace by the time the entire model has run to completion (if it runs to completion):

rem
rem     Script:         pt_interval_threat_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2018
rem

column today new_value m_today
select to_char(sysdate,'dd-mon-yyyy') today from dual;

create table t1
as
with g as (
        select rownum id
        from dual
        connect by level <= 2e3
)
select
        rownum id,
        trunc(sysdate) + g2.id  created,
        rpad('x',50)            padding
from
        g g1,
        g g2
where
        rownum <= 4e6 --> comment to avoid WordPress format issue
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

I’ve created a table of 4 million rows, covering 2,000 dates out into the future starting from sysdate+1 (tomorrow). As you can see there’s nothing in the slightest bit interesting, unusual, or exciting about the data types and content of the table.

I said my “create table as select” was fairly ordinary – but it’s actually a tiny bit out off the beaten track because it’s going to create an interval partitioned copy of this table.


execute snap_my_stats.start_snap

create table t2
partition by range(created)
interval(numtodsinterval(7, 'day'))
(
        partition p_start       values less than (to_date('&m_today','dd-mon-yyyy'))
)
storage(initial 1M)
nologging
as
select
        *
from
        t1
;

set serveroutput on
execute snap_my_stats.end_snap

I’ve created the table as a range-partitioned table with an interval() declared. Conveniently I need only mention the partitioning column by name in the declaration, rather than listing all the columns with their types, and I’ve only specified a single starting partition. Since the interval is 7 days and the data spans 2,000 days I’m going to end up with nearly 290 partitions added.

There’s no guarantee that you will see the ORA-01652 error when you run this test – the data size is rather small and your machine may have sufficient other resources to hide the problem even when you’re looking for it – but the person who reported the problem on one of the Oracle database forums was copying a table of 2.5 Billion rows using about 200 GB of storage, so size is probably important, hence the 4 million rows as a starting point on my small system.

Of course, hitting an ORA-01652 on TEMP when doing a simple “create as select” is such an unlikely sounding error that you don’t necessarily have to see it actually happen; all you need to see (at least as a starting point in a small model) is TEMP being used unexpectedly so, for my first test (on 11.2.0.4), I’ve included some code to calculate and report changes in the session stats – that’s the calls to the package snap_my_stats. Here are some of the more interesting results:


---------------------------------
Session stats - 20-Feb 16:58:24
Interval:-  14 seconds
---------------------------------
Name                                                                     Value
----                                                                     -----
table scan rows gotten                                               4,000,004
table scan blocks gotten                                                38,741

session pga memory max                                             181,338,112

sorts (rows)                                                         2,238,833

physical reads direct temporary tablespace                              23,313
physical writes direct temporary tablespace                             23,313

The first couple of numbers show the 4,000,000 rows being scanned from 38,741 table blocks – and that’s not a surprise. But for a simple copy the incremental 181MB of PGA memory we’ve acquired is a little surprising, though less so when we see that we’ve sorted 2.2M rows, and then ended up spilling 23,313 blocks to the temporary tablespace. But why are we sorting anything – what are those rows ?

My first thought was that there was a bug in some recursive SQL that was trying to define or identify dynamically created partitions, or maybe something in the space management code trying to find free space, so the obvious step was to enable extended tracing and look for any recursive statements that were running a large number of times or doing a lot of work. There weren’t any – and the trace file (particularly the detailed wait events) suggested the problem really was purely to do with the CTAS itself; so I ran the code again enabling events 10032 and 10033 (the sort traces) and found the following:


---- Sort Statistics ------------------------------
Initial runs                              1
Input records                             2140000
Output records                            2140000
Disk blocks 1st pass                      22292
Total disk blocks used                    22294
Total number of comparisons performed     0
Temp segments allocated                   1
Extents allocated                         175
Uses version 1 sort
Uses asynchronous IO

One single operation had resulted in Oracle sorting 2.14 million rows (but not making any comparisons!) – and the only table in the entire system with enough rows to do that was my source table! Oracle seems to be “sorting” (but possibly just buffering) a large fraction of the data for no obvious reason before inserting it.

Why, and why only 2.14M out of 4M ?
Does it do the same on 12.1.0.2 (yes), what about 12.2.0.1 (no – but I still saw a large PGA allocation, possibly stats gathering).
- but see Update March 2022
Is there any clue about this on MoS (yes Bug 17655392 – though that one is erroneously, I think, flagged as “closed not a bug”)
Is there a workaround ? (Yes – I think so).

Playing around and trying to work out what’s happening the obvious pointers are the large memory allocation and the “incomplete” spill to disc – what would happen if I fiddled around with workarea sizing – switching it to manual, say, or setting the pga_aggregate_target to a low value. At one point I got results showing 19M rows (that’s not a typo, it really was close to 5 times the number of rows in the table) sorted with a couple of hundred thousand blocks of TEMP used – the 10033 trace showed 9 consecutive passes (that I can’t explain) as the code executed from which I’ve extract the row counts, temp blocks used, and number of comparisons made:


Input records                             3988000
Total disk blocks used                    41544
Total number of comparisons performed     0

Input records                             3554000
Total disk blocks used                    37023
Total number of comparisons performed     0

Input records                             3120000
Total disk blocks used                    32502
Total number of comparisons performed     0

Input records                             2672000
Total disk blocks used                    27836
Total number of comparisons performed     0

Input records                             2224000
Total disk blocks used                    23169
Total number of comparisons performed     0

Input records                             1762000
Total disk blocks used                    18357
Total number of comparisons performed     0

Input records                             1300000
Total disk blocks used                    13544
Total number of comparisons performed     0

Input records                             838000
Total disk blocks used                    8732
Total number of comparisons performed     0

Input records                             376000
Total disk blocks used                    3919
Total number of comparisons performed     0

There really doesn’t seem to be any good reason why Oracle should do any sorting of the data (and maybe it wasn’t given the total number of comparisons performed in this case) – except, perhaps, to allow it to do bulk inserts into each partition in turn or, possibly, to avoid creating an entire new partition at exactly the moment it finds just the first row that needs to go into a new partition. Thinking along these lines I decided to pre-create all the necessary partitions just in case this made any difference – the code is at the end of the blog note. Another idea was to create the table empty (with, and without, pre-created partitions), then do an “insert /*+ append */” of the data.

Nothing changed (much – though the number of rows sorted kept varying).

And then — it all started working perfectly with virtually no rows reported sorted and no I/O to the temporary tablespace !

Fortunately I thought of looking at v$memory_resize_ops and found that the automatic memory management had switched a lot of memory to the PGA, allowing Oracle to do whatever it needed to do completely in memory without reporting any sorting (and a couple of checks on v$process_memory showed that the amount of memory assigned to category “Other” – rather than “SQL”, interestingly – had exceeded the current value of the pga_aggregate_target). A quick re-start of the instance fixed that “internal workaround”.

Still struggling with finding a reasonable workaround I decided to see if the same anomaly would appear if the table were range partitioned but didn’t have an interval clause. This meant I had to precreate all the necessary partitions, of course – which I did by starting with an interval partitioned table, letting Oracle figure out which partitions to create, then disabling the interval feature – again, see the code at the end of this note.

The results: no rows sorted on the insert, no writes to temp. Unless it’s just a question of needing even more data to reproduce the problem with simple range partitioned tables, it looks as if there’s a problem somewhere in the code for interval partitioned tables and all you have to do to work around it is precreate loads of partitions, disable intervals, load, then re-enable the intervals.

Footnote:

Here’s the “quick and dirty” code I used to generate the t2 table with precreated partitions:


create table t2
partition by range(created)
interval(numtodsinterval(7, 'day'))
(
        partition p_start values less than (to_date('&m_today','dd-mon-yyyy'))
)
storage(initial 1M)
nologging
monitoring
as
select
        *
from
        t1
where
        rownum <= 0
;


<<expand>>
declare
        m_max_date      date;
begin
        select  max(created)
        into    expand.m_max_date
        from    t1
        ;

        <<loop>>
        for i in 1..expand.m_max_date - trunc(sysdate) loop
                dbms_output.put(
                        to_char(trunc(sysdate) + loop.i,'dd-mon-yyyy') || chr(9)
                );
                execute immediate
                        'lock table t2 partition for ('''  ||
                        to_char(trunc(sysdate) + loop.i,'dd-mon-yyyy') ||
                        ''') in exclusive mode'
                ;
        end loop;
        dbms_output.new_line();
end;
/

prompt  ========================
prompt  How to disable intervals
prompt  ========================

alter table t2 set interval();

The code causes partitions to be created by locking the relevant partition for each date between the minimum and maximum dates in the t1 table – locking the partition is enough to create it if it doesn’t already exists. The code is a little wasteful since it locks each partition 7 times as we walk through the dates, but it’s only a quick demo for a model and for copying a very large table wastage would probably be very small compared to the work of doing the actual data copy. Obviously one could be more sophisticated and limit the code to locking and creating only the partitions needed, and only locking them once each.

Update (28th Feb 2018)

In comment #2 below Alexander Chervinskiy supplies another workaround. If your load is going to go to one partition, and you know which one it is, and you write your SQL specifying the target partition you do not do a massive sort – e.g. if you know you data is for 10th March 2018 then the following SQL would behave nicely:


insert
         /*+ append */
 into
        t2 partition for ('10-Mar-2018')
select
        *
from
        t1
/

Note, particularly, the use of the “for ({value})” clause that makes it possible to identify the correct partition without knowing its name.

Update (March 2022)

Responding to a question about interval partitioning that appeared recently on the Oracle database forum I pointed to this article as a warning about the threat of a possible large sort or large PGA allocation taking place on an “insert /*+ append */” to an interval partitioned table. Since the question related to version 19.14 I re-ran the test case on the 19.11 that I had in a VM on my laptop, but increased the rowcount from 4,000,000 to 8,000.000 – which required a tablespace of roughly 1.5GB.

The results showed the large PGA allocation that I had noted in 12.2, but they also showed a large number of rows sorted – which suggests that I had been lucky enough (or unlucky enough, depending on your point of view) to have enough memory in the 12c instance that the operation that resulted in sorting in 11.2 and 12.1 had completed in memory in 12.2.

Here are the interesting session activity stats from 19.11:


Name                                                     Value
----                                                     -----
table scan rows gotten                               8,001,263

physical writes                                         92,447
physical writes direct                                  92,447
physical writes direct temporary tablespace             14,610

physical reads                                          92,189
physical reads direct                                   92,185
physical reads direct temporary tablespace              14,610

table scan blocks gotten                                77,604
db block gets direct                                    77,837

sorts (rows)                                         1,226,792
session pga memory max                             757,963,920

“physical reads direct” – “physical reads direct temporary tablespace” = 77,575
“physical writes direct” – “physical writes direct temporary tablespace” = 77,837

You won’t be too surprised to hear that the number of blocks of t2 holding rows was 77,837 or that the number of blocks holding rows in t1 was 77,575. The 10032, 10033 and 10046 traces indicate that all the writes and reads on temp are for a single one-pass sort of the 1.2M rows, and dumping the PGA heap while the process is running shows a large fraction (590MB) of the memory as freeable “callheap”, with a further 150MB marked as free. I still don’t know what’s going on in this memory – I did wonder briefly it it might be something to do with gathering partition-level statistics (at 430 partitions 757MB of memory no longer seems unreasonable), but if the statistics are being gathered they’re not being written to the partition metadata.

Comments (15)

15 Comments »

Hello,

expand.m_max_date, where does EXPAND come from?

Thank you,
Frank

Comment by Frank Gordon — February 21, 2018 @ 4:14 pm GMT Feb 21,2018 | Reply
- Frank,
  
  Thanks for the comment.
  
  “expand” and “loop” (which appears as “loop.i”) are pl/sql labels for the anonymous block and the for loop respectively.
  
  They are labels that should appear in double angle brackets (e.g. <<expand>>) but WordPress has taken the brackets as introducing some html tags and eliminated them. I didn’t notice they had disappeared when I published the note; I’m working on getting them to stay visible.
  
  Update: I seem to have got them in place, but I don’t know if whether they’ll stay there if I have to edit the piece in the future.
  
  Comment by Jonathan Lewis — February 21, 2018 @ 4:44 pm GMT Feb 21,2018 | Reply
We have seen similar behaviour on our production, when there was temp consumption on a simple batch insert of a big batch into not yet existing interval partition. The solution we found is to create a partition before inserting lots of data into it. The partitioning is still interval, so to create a partition we do a dummy insert, and roll it back.

Comment by Alexander Chervinskiy — February 22, 2018 @ 11:51 am GMT Feb 22,2018 | Reply
- Alexander,
  
  That’s interesting as it doesn’t appear (at first sight) to agree with the tests I’ve done. Which version ?
  
  Is all the data going to be loaded into a single partition each time, and was part of your workaround to write code that references the specific partition (possible using the “for {value}” syntax). Using a single identified partition on an interval partitioned table is not a test I’ve tried yet.
  
  Comment by Jonathan Lewis — February 22, 2018 @ 12:02 pm GMT Feb 22,2018 | Reply
  - It was 11.2.0.4, and yes, all the data was going to be inserted into one partition, current business date.
    
    Comment by Alexander Chervinskiy — February 25, 2018 @ 12:16 pm GMT Feb 25,2018 | Reply
    - Alexander,
      
      Thanks for the feedback.
      
      Comment by Jonathan Lewis — February 28, 2018 @ 10:18 am GMT Feb 28,2018
Would dbms_redefinition in your case return ORA-1652? I guess it does the same CTAS?

Comment by stee1rat — February 23, 2018 @ 4:11 pm GMT Feb 23,2018 | Reply
- Stee1rat,
  
  Tha last time I looked at dbms_redefinition the intial copy of the table was run as an: “insert /*+ append */ … ” so I assume it would run into the same problem.
  
  Comment by Jonathan Lewis — February 28, 2018 @ 10:17 am GMT Feb 28,2018 | Reply
Hi,

In my opinion the sorting has something to do with the maximum number of partitions per table/index limit (and ORA-14299 or similar)… It probably went to search for max date in data (sorting) and calculate the number of all the interval partitions from p_start to last partition to be created.

Regards,
Goran

Comment by Goran Matahlija — March 9, 2018 @ 2:58 pm GMT Mar 9,2018 | Reply
Goran,

Something along the lines of: ” … except, perhaps, to allow it to do bulk inserts into each partition in turn or, possibly, to avoid creating an entire new partition at exactly the moment it finds just the first row that needs to go into a new partition”

It would certainly make sense to find the maximum value to be used in case it exceeded the limit for which the intervals could be created, but a max() on its own doesn’t require a sort – so sorting to collate all the data for each partition seems more likely.

A test of that hypothesis would be to generate a random set of data and see if it was sorted by the time it had been inserted.

UPDATE: I just did a quick check with randomly generated dates, and the data wasn’t sorted in the final table.

Comment by Jonathan Lewis — March 9, 2018 @ 3:24 pm GMT Mar 9,2018 | Reply
[…] just been reminded of a problem I discovered a few months after writing this note that might be relevant. If you load data into an interval […]

Pingback by 12.2 Partitions | Oracle Scratchpad — February 14, 2020 @ 8:57 am GMT Feb 14,2020 | Reply
I suppose, I finally found a good case for STEP clause of 21c FOR loop :)
for i in 1..m_max_date – trunc(sysdate) by 7 loop …

Comment by Viacheslav Andzhich — June 19, 2021 @ 10:43 am BST Jun 19,2021 | Reply
[…] Interval Partitioning load threat (Feb 2018): a big insert as select, create as select to load/create an interval partitioned table can produce a MASSIVE sort. […]

Pingback by Partitioning Catalogue | Oracle Scratchpad — August 10, 2022 @ 12:45 pm BST Aug 10,2022 | Reply
[…] Interval Partitioning load threat (Feb 2018): a big insert as select, create as select to load/create an interval partitioned table can produce a MASSIVE sort. […]

Pingback by Troubleshooting catalogue | Oracle Scratchpad — August 10, 2022 @ 12:46 pm BST Aug 10,2022 | Reply
[…] is to issue a “lock table … partition for () …” statement (See footnote to this blog note). I did wonder if the attempt to explain a plan that needed a non-existent partition had actually […]

Pingback by Missing Partition | Oracle Scratchpad — March 12, 2024 @ 12:21 pm GMT Mar 12,2024 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome. Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Oracle Scratchpad

February 21, 2018