Assume you’ve got a huge temporary tablespace, there’s plenty of space in your data tablespace, you’ve got a very boring, simple table you want to copy and partition, and no-one and nothing is using the system. Would you really expect a (fairly) ordinary “create table t2 {interval partitioning clause etc.} as select * from t1” to end with the Oracle error “ORA-01652: unable to extend temp segment by 128 in tablespace TEMP”? That’s the temporary tablespace that’s out of space, not the target tablespace for the copy.
Here’s a sample data set (tested on 11.2.0.4 and 12.1.0.2) to demonstrate the surprise – you’ll need about 900MB of space in your default tablespace by the time the entire model has run to completion (if it runs to completion):
rem
rem     Script:         pt_interval_threat_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2018
rem

column today new_value m_today
select to_char(sysdate,'dd-mon-yyyy') today from dual;

create table t1
as
with g as (
        select rownum id
        from dual
        connect by level <= 2e3
)
select
        rownum                  id,
        trunc(sysdate) + g2.id  created,
        rpad('x',50)            padding
from
        g g1,
        g g2
where
        rownum <= 4e6 --> comment to avoid WordPress format issue
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')
I’ve created a table of 4 million rows, covering 2,000 dates out into the future starting from sysdate+1 (tomorrow). As you can see there’s nothing in the slightest bit interesting, unusual, or exciting about the data types and content of the table.
I said my “create table as select” was fairly ordinary – but it’s actually a tiny bit off the beaten track because it’s going to create an interval partitioned copy of this table.
execute snap_my_stats.start_snap

create table t2
partition by range(created)
interval(numtodsinterval(7, 'day'))
(
        partition p_start values less than (to_date('&m_today','dd-mon-yyyy'))
)
storage(initial 1M)
nologging
as
select * from t1
;

set serveroutput on
execute snap_my_stats.end_snap
I’ve created the table as a range-partitioned table with an interval() declared. Conveniently I need only mention the partitioning column by name in the declaration, rather than listing all the columns with their types, and I’ve only specified a single starting partition. Since the interval is 7 days and the data spans 2,000 days I’m going to end up with nearly 290 partitions added.
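As a quick sanity check after the CTAS you can count the partitions the interval clause actually generated (assuming you’re connected as the owning schema, so user_tab_partitions is the right view):

```sql
select  count(*)
from    user_tab_partitions
where   table_name = 'T2'
;
```

With 2,000 days of data and a 7-day interval, plus the p_start partition, this should report something in the region of 287.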
There’s no guarantee that you will see the ORA-01652 error when you run this test – the data size is rather small and your machine may have sufficient other resources to hide the problem even when you’re looking for it – but the person who reported the problem on one of the Oracle database forums was copying a table of 2.5 Billion rows using about 200 GB of storage, so size is probably important, hence the 4 million rows as a starting point on my small system.
Of course, hitting an ORA-01652 on TEMP when doing a simple “create as select” is such an unlikely sounding error that you don’t necessarily have to see it actually happen; all you need to see (at least as a starting point in a small model) is TEMP being used unexpectedly so, for my first test (on 11.2.0.4), I’ve included some code to calculate and report changes in the session stats – that’s the calls to the package snap_my_stats. Here are some of the more interesting results:
---------------------------------
Session stats - 20-Feb 16:58:24
Interval:-  14 seconds
---------------------------------
Name                                                 Value
----                                                 -----
table scan rows gotten                           4,000,004
table scan blocks gotten                            38,741
session pga memory max                         181,338,112
sorts (rows)                                     2,238,833
physical reads direct temporary tablespace          23,313
physical writes direct temporary tablespace         23,313
The first couple of numbers show the 4,000,000 rows being scanned from 38,741 table blocks – and that’s not a surprise. But for a simple copy the incremental 181MB of PGA memory we’ve acquired is a little surprising, though less so when we see that we’ve sorted 2.2M rows, and then ended up spilling 23,313 blocks to the temporary tablespace. But why are we sorting anything – what are those rows?
My first thought was that there was a bug in some recursive SQL that was trying to define or identify dynamically created partitions, or maybe something in the space management code trying to find free space, so the obvious step was to enable extended tracing and look for any recursive statements that were running a large number of times or doing a lot of work. There weren’t any – and the trace file (particularly the detailed wait events) suggested the problem really was purely to do with the CTAS itself; so I ran the code again enabling events 10032 and 10033 (the sort traces) and found the following:
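For reference, the usual way to enable the sort traces in a session is as follows (level 10 is the level I habitually use here – treat that as an assumption that may vary with version):

```sql
alter session set events '10032 trace name context forever, level 10';
alter session set events '10033 trace name context forever, level 10';

-- ... run the CTAS ...

alter session set events '10032 trace name context off';
alter session set events '10033 trace name context off';
```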
---- Sort Statistics ------------------------------
Initial runs                              1
Input records                             2140000
Output records                            2140000
Disk blocks 1st pass                      22292
Total disk blocks used                    22294
Total number of comparisons performed     0
Temp segments allocated                   1
Extents allocated                         175
Uses version 1 sort
Uses asynchronous IO
One single operation had resulted in Oracle sorting 2.14 million rows (but not making any comparisons!) – and the only table in the entire system with enough rows to do that was my source table! Oracle seems to be “sorting” (but possibly just buffering) a large fraction of the data for no obvious reason before inserting it.
- Why, and why only 2.14M out of 4M?
- Does it do the same on 12.1.0.2? (Yes.) What about 12.2.0.1? (No – but I still saw a large PGA allocation, possibly stats gathering.)
- Is there any clue about this on MoS? (Yes – Bug 17655392, though that one is, I think, erroneously flagged as “closed not a bug”.)
- Is there a workaround? (Yes – I think so.)
Playing around and trying to work out what’s happening, the obvious pointers are the large memory allocation and the “incomplete” spill to disc – so what would happen if I fiddled around with workarea sizing: switching it to manual, say, or setting the pga_aggregate_target to a low value? At one point I got results showing 19M rows (that’s not a typo, it really was close to 5 times the number of rows in the table) sorted with a couple of hundred thousand blocks of TEMP used – the 10033 trace showed 9 consecutive passes (that I can’t explain) as the code executed, from which I’ve extracted the row counts, temp blocks used, and number of comparisons made:
Input records                             3988000
Total disk blocks used                    41544
Total number of comparisons performed     0

Input records                             3554000
Total disk blocks used                    37023
Total number of comparisons performed     0

Input records                             3120000
Total disk blocks used                    32502
Total number of comparisons performed     0

Input records                             2672000
Total disk blocks used                    27836
Total number of comparisons performed     0

Input records                             2224000
Total disk blocks used                    23169
Total number of comparisons performed     0

Input records                             1762000
Total disk blocks used                    18357
Total number of comparisons performed     0

Input records                             1300000
Total disk blocks used                    13544
Total number of comparisons performed     0

Input records                             838000
Total disk blocks used                    8732
Total number of comparisons performed     0

Input records                             376000
Total disk blocks used                    3919
Total number of comparisons performed     0
There really doesn’t seem to be any good reason why Oracle should do any sorting of the data (and maybe it wasn’t given the total number of comparisons performed in this case) – except, perhaps, to allow it to do bulk inserts into each partition in turn or, possibly, to avoid creating an entire new partition at exactly the moment it finds just the first row that needs to go into a new partition. Thinking along these lines I decided to pre-create all the necessary partitions just in case this made any difference – the code is at the end of the blog note. Another idea was to create the table empty (with, and without, pre-created partitions), then do an “insert /*+ append */” of the data.
Nothing changed (much – though the number of rows sorted kept varying).
And then — it all started working perfectly with virtually no rows reported sorted and no I/O to the temporary tablespace !
Fortunately I thought of looking at v$memory_resize_ops and found that the automatic memory management had switched a lot of memory to the PGA, allowing Oracle to do whatever it needed to do completely in memory without reporting any sorting (and a couple of checks on v$process_memory showed that the amount of memory assigned to category “Other” – rather than “SQL”, interestingly – had exceeded the current value of the pga_aggregate_target). A quick re-start of the instance fixed that “internal workaround”.
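For anyone who wants to check for the same “internal workaround” on their own system, these are the sort of queries I mean (a sketch only – the column lists are trimmed, and the v$process_memory query assumes you’re checking your own session):

```sql
select  component, oper_type, initial_size, final_size, end_time
from    v$memory_resize_ops
order by
        end_time
;

select  pm.category, pm.allocated, pm.used, pm.max_allocated
from    v$process p, v$process_memory pm
where   pm.pid  = p.pid
and     p.addr  = (
                select  paddr
                from    v$session
                where   sid = sys_context('userenv','sid')
        )
;
```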
Still struggling with finding a reasonable workaround I decided to see if the same anomaly would appear if the table were range partitioned but didn’t have an interval clause. This meant I had to precreate all the necessary partitions, of course – which I did by starting with an interval partitioned table, letting Oracle figure out which partitions to create, then disabling the interval feature – again, see the code at the end of this note.
The results: no rows sorted on the insert, no writes to temp. Unless it’s just a question of needing even more data to reproduce the problem with simple range partitioned tables, it looks as if there’s a problem somewhere in the code for interval partitioned tables and all you have to do to work around it is precreate loads of partitions, disable intervals, load, then re-enable the intervals.
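In outline – a sketch of the sequence rather than production code – the workaround looks like this:

```sql
-- Create t2 as an (empty) interval partitioned table, then get Oracle
-- to create all the partitions you need, e.g. by locking them
-- (see the footnote code at the end of this note).

alter table t2 set interval();                          -- disable intervals

insert /*+ append */ into t2 select * from t1;          -- load
commit;

alter table t2 set interval(numtodsinterval(7,'day'));  -- re-enable intervals
```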
Footnote:
Here’s the “quick and dirty” code I used to generate the t2 table with precreated partitions:
create table t2
partition by range(created)
interval(numtodsinterval(7, 'day'))
(
        partition p_start values less than (to_date('&m_today','dd-mon-yyyy'))
)
storage(initial 1M)
nologging
monitoring
as
select * from t1
where rownum <= 0
;

<<expand>>
declare
        m_max_date date;
begin
        select  max(created)
        into    expand.m_max_date
        from    t1
        ;

        <<loop>>
        for i in 1..expand.m_max_date - trunc(sysdate) loop
                dbms_output.put(
                        to_char(trunc(sysdate) + loop.i,'dd-mon-yyyy') || chr(9)
                );
                execute immediate
                        'lock table t2 partition for (''' ||
                        to_char(trunc(sysdate) + loop.i,'dd-mon-yyyy') ||
                        ''') in exclusive mode'
                ;
        end loop;
        dbms_output.new_line();
end;
/

prompt  ========================
prompt  How to disable intervals
prompt  ========================

alter table t2 set interval();
The code causes partitions to be created by locking the relevant partition for each date between the minimum and maximum dates in the t1 table – locking the partition is enough to create it if it doesn’t already exist. The code is a little wasteful since it locks each partition 7 times as we walk through the dates, but it’s only a quick demo for a model, and for copying a very large table the wastage would probably be very small compared to the work of doing the actual data copy. Obviously one could be more sophisticated and limit the code to locking and creating only the partitions needed, and only locking them once each.
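Since the partitions are 7 days wide you could lock each one just once by stepping through the dates a week at a time. A sketch (before 21c there’s no step clause on the for loop, so the date is derived from the loop counter; treat the boundary arithmetic as an assumption to check against your own interval definition):

```sql
declare
        m_max_date date;
begin
        select max(created) into m_max_date from t1;

        -- one date per weekly partition, starting from tomorrow
        for i in 0 .. trunc((m_max_date - trunc(sysdate)) / 7) loop
                execute immediate
                        'lock table t2 partition for (''' ||
                        to_char(trunc(sysdate) + 7 * i + 1,'dd-mon-yyyy') ||
                        ''') in exclusive mode'
                ;
        end loop;
end;
/
```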
Update (28th Feb 2018)
In comment #2 below Alexander Chervinskiy supplies another workaround. If your load is going to go to one partition, and you know which one it is, and you write your SQL specifying the target partition, you do not do a massive sort – e.g. if you know your data is for 10th March 2018 then the following SQL would behave nicely:
insert /*+ append */
into t2 partition for ('10-Mar-2018')
select * from t1
/
Note, particularly, the use of the “for ({value})” clause that makes it possible to identify the correct partition without knowing its name.
Update (March 2022)
Responding to a question about interval partitioning that appeared recently on the Oracle database forum I pointed to this article as a warning about the threat of a possible large sort or large PGA allocation taking place on an “insert /*+ append */” to an interval partitioned table. Since the question related to version 19.14 I re-ran the test case on the 19.11 that I had in a VM on my laptop, but increased the rowcount from 4,000,000 to 8,000,000 – which required a tablespace of roughly 1.5GB.
The results showed the large PGA allocation that I had noted in 12.2, but they also showed a large number of rows sorted – which suggests that I had been lucky enough (or unlucky enough, depending on your point of view) to have enough memory in the 12c instance that the operation that resulted in sorting in 11.2 and 12.1 had completed in memory in 12.2.
Here are the interesting session activity stats from 19.11:
Name                                                 Value
----                                                 -----
table scan rows gotten                           8,001,263
physical writes                                     92,447
physical writes direct                              92,447
physical writes direct temporary tablespace         14,610
physical reads                                      92,189
physical reads direct                               92,185
physical reads direct temporary tablespace          14,610
table scan blocks gotten                            77,604
db block gets direct                                77,837
sorts (rows)                                     1,226,792
session pga memory max                         757,963,920
“physical reads direct” – “physical reads direct temporary tablespace” = 77,575
“physical writes direct” – “physical writes direct temporary tablespace” = 77,837
You won’t be too surprised to hear that the number of blocks of t2 holding rows was 77,837, or that the number of blocks holding rows in t1 was 77,575. The 10032, 10033 and 10046 traces indicate that all the writes and reads on temp are for a single one-pass sort of the 1.2M rows, and dumping the PGA heap while the process is running shows a large fraction (590MB) of the memory as freeable “callheap”, with a further 150MB marked as free. I still don’t know what’s going on in this memory – I did wonder briefly if it might be something to do with gathering partition-level statistics (at 430 partitions 757MB of memory no longer seems unreasonable), but if the statistics are being gathered they’re not being written to the partition metadata.
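If you want to check for yourself whether partition-level stats have appeared in the metadata (an assumption about where to look, not a claim about what the internal code is doing), a simple query would be:

```sql
select  partition_name, num_rows, last_analyzed
from    user_tab_partitions
where   table_name = 'T2'
order by
        partition_position
;
```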
Hello,
expand.m_max_date, where does EXPAND come from?
Thank you,
Frank
Comment by Frank Gordon — February 21, 2018 @ 4:14 pm GMT Feb 21,2018
Frank,
Thanks for the comment.
“expand” and “loop” (which appears as “loop.i”) are pl/sql labels for the anonymous block and the for loop respectively.
They are labels that should appear in double angle brackets (e.g. <<expand>>) but WordPress has taken the brackets as introducing some html tags and eliminated them. I didn’t notice they had disappeared when I published the note; I’m working on getting them to stay visible.
Update: I seem to have got them in place, but I don’t know whether they’ll stay there if I have to edit the piece in the future.
Comment by Jonathan Lewis — February 21, 2018 @ 4:44 pm GMT Feb 21,2018
We have seen similar behaviour on our production, when there was temp consumption on a simple batch insert of a big batch into not yet existing interval partition. The solution we found is to create a partition before inserting lots of data into it. The partitioning is still interval, so to create a partition we do a dummy insert, and roll it back.
Comment by Alexander Chervinskiy — February 22, 2018 @ 11:51 am GMT Feb 22,2018
Alexander,
That’s interesting as it doesn’t appear (at first sight) to agree with the tests I’ve done. Which version?
Is all the data going to be loaded into a single partition each time, and was part of your workaround to write code that references the specific partition (possibly using the “for {value}” syntax)? Using a single identified partition on an interval partitioned table is not a test I’ve tried yet.
Comment by Jonathan Lewis — February 22, 2018 @ 12:02 pm GMT Feb 22,2018
It was 11.2.0.4, and yes, all the data was going to be inserted into one partition, current business date.
Comment by Alexander Chervinskiy — February 25, 2018 @ 12:16 pm GMT Feb 25,2018
Alexander,
Thanks for the feedback.
Comment by Jonathan Lewis — February 28, 2018 @ 10:18 am GMT Feb 28,2018
Would dbms_redefinition in your case return ORA-1652? I guess it does the same CTAS?
Comment by stee1rat — February 23, 2018 @ 4:11 pm GMT Feb 23,2018
Stee1rat,
The last time I looked at dbms_redefinition the initial copy of the table was run as an “insert /*+ append */ …”, so I assume it would run into the same problem.
Comment by Jonathan Lewis — February 28, 2018 @ 10:17 am GMT Feb 28,2018
Hi,
In my opinion the sorting has something to do with the maximum number of partitions per table/index limit (and ORA-14299 or similar)… It probably went to search for max date in data (sorting) and calculate the number of all the interval partitions from p_start to last partition to be created.
Regards,
Goran
Comment by Goran Matahlija — March 9, 2018 @ 2:58 pm GMT Mar 9,2018
Goran,
Something along the lines of: ” … except, perhaps, to allow it to do bulk inserts into each partition in turn or, possibly, to avoid creating an entire new partition at exactly the moment it finds just the first row that needs to go into a new partition”
It would certainly make sense to find the maximum value to be used in case it exceeded the limit for which the intervals could be created, but a max() on its own doesn’t require a sort – so sorting to collate all the data for each partition seems more likely.
A test of that hypothesis would be to generate a random set of data and see if it was sorted by the time it had been inserted.
UPDATE: I just did a quick check with randomly generated dates, and the data wasn’t sorted in the final table.
Comment by Jonathan Lewis — March 9, 2018 @ 3:24 pm GMT Mar 9,2018
[…] just been reminded of a problem I discovered a few months after writing this note that might be relevant. If you load data into an interval […]
Pingback by 12.2 Partitions | Oracle Scratchpad — February 14, 2020 @ 8:57 am GMT Feb 14,2020
I suppose, I finally found a good case for STEP clause of 21c FOR loop :)
for i in 1..m_max_date - trunc(sysdate) by 7 loop ...
Comment by Viacheslav Andzhich — June 19, 2021 @ 10:43 am BST Jun 19,2021
[…] Interval Partitioning load threat (Feb 2018): a big insert as select, create as select to load/create an interval partitioned table can produce a MASSIVE sort. […]
Pingback by Partitioning Catalogue | Oracle Scratchpad — August 10, 2022 @ 12:45 pm BST Aug 10,2022
[…] Interval Partitioning load threat (Feb 2018): a big insert as select, create as select to load/create an interval partitioned table can produce a MASSIVE sort. […]
Pingback by Troubleshooting catalogue | Oracle Scratchpad — August 10, 2022 @ 12:46 pm BST Aug 10,2022
[…] is to issue a “lock table … partition for () …” statement (See footnote to this blog note). I did wonder if the attempt to explain a plan that needed a non-existent partition had actually […]
Pingback by Missing Partition | Oracle Scratchpad — March 12, 2024 @ 12:21 pm GMT Mar 12,2024