A question about hunting down the source of the error “ORA-01652 unable to extend temp segment by NNN in tablespace XXX” shows up on the Oracle-L mailing list or the Oracle developer community forum from time to time. In most cases the tablespace referenced is the temporary tablespace, which means the session reporting the error was probably trying to allocate some space for sorting, or doing a hash join, or instantiating a GTT (global temporary table) or a CTE (common table expression / “with” subquery). The difficulty in cases like this is that the session reporting the error might be the victim of some other session’s greed – so looking at what the session was doing won’t necessarily point you to the real problem.
Of course you then run into a further problem tracking down the source of the problem. By the time you hear the complaint (even if it’s only seconds after the error appeared) the session that had been hogging the temporary tablespace may have finished what it was doing, leaving a huge amount of free space in the temporary tablespace and suggesting (to the less experienced DBA) that there may be something wrong with the way Oracle has been accounting for space usage.
If you find yourself in this situation remember that (if you’re licensed to take advantage of it) the Active Session History (ASH) may be able to help. One of the columns in v$active_session_history is called temp_space_allocated with the slightly misleading description: “Amount of TEMP memory (in bytes) consumed by this session at the time this sample was taken”. A simple query against v$active_session_history may be enough to identify the session and SQL statement that had been holding the temporary space when the error was raised; for example:
column pga_allocated format 999,999,999,999 column temp_space_allocated format 999,999,999,999 break on session_id skip 1 on session_serial# select session_id, session_serial#, sample_id, sql_id, pga_allocated, temp_space_allocated from v$active_session_history where sample_time between sysdate - 5/(24*60) and sysdate and nvl(temp_space_allocated,0) != 0 order by session_id, sample_id /
All I’ve done for this example is query v$active_session_history for the last 5 minutes reporting a minimum of information from any rows that show temp space usage. As a minor variation on the theme you can obviously change the time range and you might want to limit the output to rows reporting more than 1MB (say) of temp space usage.
You’ll notice that I’ve also reported the pga_allocated (Description: Amount of PGA memory (in bytes) consumed by this session at the time this sample was taken) in this query; this is just a little convenience – a query that’s taking a lot of temp space will probably start by acquiring a lot of PGA memory so it’s nice to be able to see the two figures together.
There are plenty of limitations and flaws in the usefulness of this query and I’ll say something about that after showing an example of usage. Let’s start with a script to build some data before running a space-consuming query:
rem rem Script: allocate_tempspace.sql rem Author: Jonathan Lewis rem Dated: Nov 2019 rem rem Last tested rem 19.3.0.0 rem 12.2.0.1 rem create table t1 as select * from all_objects ; insert into t1 select * from t1; commit; insert into t1 select * from t1; commit; insert into t1 select * from t1; commit; insert into t1 select * from t1; commit; execute dbms_stats.gather_table_stats(null,'t1') execute dbms_lock.sleep(20) set pagesize 60 set linesize 255 set trimspool on set serveroutput off alter session set statistics_level = all; with ttemp as ( select /*+ materialize */ * from t1 ) select /*+ no_partial_join(@sel$2 t1b) no_place_group_by(@sel$2) */ t1a.object_type, max(t1a.object_name) from ttemp t1a, ttemp t1b where t1a.object_id = t1b.object_id group by t1a.object_type order by t1a.object_type ; select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
My working table t1 consists of 16 copies of the view all_objects – so close to 1 million rows in my case – and the query is hinted to avoid any of the clever transformations that the optimizer could use to reduce the workload so it’s going to do a massive hash join and aggregation to report a summary of a couple of dozen rows. Here’s the execution plan (in this case from 12.2.0.1, though the plan is the same for 19.3 with some variations in the numbers).
SQL_ID 1cwabt12zq6zb, child number 0 ------------------------------------- Plan hash value: 1682228242 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 29 |00:00:10.03 | 47413 | 21345 | 12127 | | | | | | 1 | TEMP TABLE TRANSFORMATION | | 1 | | 29 |00:00:10.03 | 47413 | 21345 | 12127 | | | | | | 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D665B_E2772D3 | 1 | | 0 |00:00:01.51 | 28915 | 0 | 9217 | 2068K| 2068K| | | | 3 | TABLE ACCESS FULL | T1 | 1 | 989K| 989K|00:00:00.24 | 19551 | 0 | 0 | | | | | | 4 | SORT GROUP BY | | 1 | 29 | 29 |00:00:08.51 | 18493 | 21345 | 2910 | 6144 | 6144 | 6144 (0)| | |* 5 | HASH JOIN | | 1 | 15M| 15M|00:00:03.93 | 18493 | 21345 | 2910 | 48M| 6400K| 65M (1)| 25600 | | 6 | VIEW | | 1 | 989K| 989K|00:00:00.36 | 9233 | 9218 | 0 | | | | | | 7 | TABLE ACCESS FULL | SYS_TEMP_0FD9D665B_E2772D3 | 1 | 989K| 989K|00:00:00.35 | 9233 | 9218 | 0 | | | | | | 8 | VIEW | | 1 | 989K| 989K|00:00:00.40 | 9257 | 9217 | 0 | | | | | | 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D665B_E2772D3 | 1 | 989K| 989K|00:00:00.39 | 9257 | 9217 | 0 | | | | | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")
Critically this plan shows us two uses of the temp space but only reports one of them as Used-Tmp. The “hash join” at operation 5 tells us that it reached 65MB of (tunable PGA) memory before going “1-pass”, eventually dumping 25,600 KB to disc. This space usage is corroborated by the 2,910 writes (which, at an 8KB block size, would be 23,280 KB). The missing Used-Tmp, however, is the space taken up by the materialized CTE. We can see that operation 2 is a “load as select” that writes 9,217 blocks to disc (subsequently read back twice – the tablescans shown in operations 7 and 9). That’s 74,000 KB of temp space that doesn’t get reported as Used-Tmp.
If we take a look at the plan from 19.3 we see different numbers, but the same “error of omission”:
SQL_ID 1cwabt12zq6zb, child number 0 ------------------------------------- Plan hash value: 1682228242 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 25 |00:00:08.15 | 34905 | 13843 | 8248 | | | | | | 1 | TEMP TABLE TRANSFORMATION | | 1 | | 25 |00:00:08.15 | 34905 | 13843 | 8248 | | | | | | 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6624_E259E68 | 1 | | 0 |00:00:01.26 | 23706 | 0 | 5593 | 2070K| 2070K| | | | 3 | TABLE ACCESS FULL | T1 | 1 | 907K| 907K|00:00:00.21 | 18024 | 0 | 0 | | | | | | 4 | SORT GROUP BY | | 1 | 25 | 25 |00:00:06.89 | 11193 | 13843 | 2655 | 6144 | 6144 | 6144 (0)| | |* 5 | HASH JOIN | | 1 | 14M| 14M|00:00:03.55 | 11193 | 13843 | 2655 | 44M| 6400K| 64M (1)| 23M| | 6 | VIEW | | 1 | 907K| 907K|00:00:00.26 | 5598 | 5594 | 0 | | | | | | 7 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6624_E259E68 | 1 | 907K| 907K|00:00:00.25 | 5598 | 5594 | 0 | | | | | | 8 | VIEW | | 1 | 907K| 907K|00:00:00.34 | 5595 | 5594 | 0 | | | | | | 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6624_E259E68 | 1 | 907K| 907K|00:00:00.33 | 5595 | 5594 | 0 | | | | | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")
With slightly fewer rows in t1 (907K vs. 989K) we write 5,593 blocks for the materialized CTE (instead of 9,217) and spill 2,655 blocks during the hash join (instead of 2,910). But again it’s only the hash join spill that is reported under Used-Tmp. Note, by the way, that the Used-Tmp in 12.2 was reported in KB when it’s reported in MB in 19.3.0.0.
Side note: comparing the number of rows created and blocks written for the CTE, it looks as if 19.3 is using the data blocks much more efficiently than 12.2. There’s no obvious reason for this (though a first guess would be that the older mechanism is to write a GTT with pctfree=10 while the new avoid any free space and transactional details) so, as ever, I now have another draft for a blog note reminding me to investigate (eventually) what differences there are in CTE storage on the upgrade. It’s something that might make a difference in a few special cases.
With the figures from the execution plans in mind we can now look at the results of the query against v$active_session_history. Conveniently the queries took a few seconds to complete, so we’re going to see several rows for each execution.
First the results from 12.2.0.1
SESSION_ID SESSION_SERIAL# SAMPLE_ID SQL_ID PGA_ALLOCATED TEMP_SPACE_ALLOCATED ---------- --------------- ---------- ------------- ---------------- -------------------- 14 22234 15306218 1cwabt12zq6zb 95,962,112 1,048,576 15306219 1cwabt12zq6zb 97,731,584 37,748,736 15306220 1cwabt12zq6zb 148,194,304 77,594,624 15306221 1cwabt12zq6zb 168,117,248 85,983,232 15306222 1cwabt12zq6zb 168,117,248 90,177,536 15306223 1cwabt12zq6zb 168,117,248 95,420,416 15306224 1cwabt12zq6zb 168,117,248 98,566,144 15306225 1cwabt12zq6zb 168,117,248 102,760,448 15306226 1cwabt12zq6zb 116,933,632 103,809,024 15306227 1cwabt12zq6zb 116,933,632 103,809,024 15306228 b66ycurnwpgud 8,602,624 1,048,576
I pointed out that we had 25,600 KB reported as Used-Tmp and roughly 74,000 KB unreported – a total of nearly 100,000 KB that is reasonably close to the 103,800,000 bytes reported by ASH. Moreover the timing of the plan (loading the CTE in the first 2 seconds) seems to agree with the growth to 77,590,000 of temp_space_allocated by the time we get to sample_id 15306220 in ASH. Then we have several seconds of slow growth as the hash join takes place and feeds its resulte up to the sort group by. At the end of the query we happen to have been lucky enough to catch one last sample just before the session had released all its temp space and ceased to be active. (Note: however, that the sql_id at that sample point was not the sql_id of our big query – and that’s a clue about one of the limitations of using ASH to find the greedy SQL.)
We see the same pattern of behaviour in 19.3.0.0:
SESSION_ID SESSION_SERIAL# SAMPLE_ID SQL_ID PGA_ALLOCATED TEMP_SPACE_ALLOCATED ---------- --------------- ---------- ------------- ---------------- -------------------- 136 42767 2217500 1cwabt12zq6zb 143,982,592 46,137,344 2217501 1cwabt12zq6zb 193,527,808 54,525,952 2217502 1cwabt12zq6zb 193,527,808 57,671,680 2217503 1cwabt12zq6zb 193,527,808 61,865,984 2217504 1cwabt12zq6zb 197,722,112 67,108,864 2217505 1cwabt12zq6zb 150,601,728 70,254,592 2217506 1cwabt12zq6zb 150,601,728 70,254,592
We start with an almost instantaneous jump to 46MB of temp_space_allocated in the first second of the query – that’s the 5,593 blocks of the CTE being materialized, then the slow growth of temp space as the hash join runs, spills to disc, and passes its data up to the sort group by. Again we can see that the peak usage was the CTE (46MB) plus the reported spill of 23MB (plus rounding errors and odd bits).
Preliminary Observations
Queries against ASH (v$active_session_history) can show us sessions that were holding space in the temporary tablespace at the moment a sample of active sessions was taken. This may allow us to identify greedy sessions that were causing other sessions to fail with ORA-01652 (unable to allocate temp segment).
We have seen that there is at least one case where we get better information about temp space allocation from ASH than we do from the variants on v$sql_plan that include the SQL Workarea information (v$sql_workarea, v$sql_workarea_active) because the space acquired during materialization of CTEs is not reported as a “tunable SQL workarea” but does appear in the ASH temp_space_allocated.
At first sight it looks as if we may be able to use the query against ASH to identify the statement (by sql_id) that was the one being run by the greedy session when it consumed all the space. As we shall see in a further article, there are various reasons why this may over-optimistic, however in many cases there’s a fair chance that when you see the same sql_id appearing in a number of consecutive rows of the report then that statement may be the thing that is responsible for the recent growth in temp space usage – and you can query v$sql to find the text and call dbms_xplan.display_cursor() to get as much execution plan information as possible.
Further questions
- When does a session release the temp_space_allocated? Will the space be held (locked) as long as the cursor is open, or can it be released when it is no longer needed? Will it be held, but releasable, even after the cursor has (from the client program’s perspective) been closed?
- Could we be fooled by a report that said a session was holding a lot of space when it didn’t need it and would have released it if the demand had appeared?
- Under what conditions might the temp_space_allocated in an ASH sample have nothing to do with the sql_id reported in the same sample?
- Are there any other cases where ASH might report temp_space_allocated when an execution plan doesn’t?
- Is temp_space_allocated only about the temporary tablespace, or could it include informatiom about other (“permanent”) tablespaces ?
Stay tuned for further analysis of the limitations of using v$active_session_history.temp_space_allocated to help identify the srouce of a space management ORA-01652 issue.
Many thanks for this post, Jonathan. I‘m looking forward on your next findings.
The simple way of evaluating temp_space_allocated from ASH I‘ve also placed in a GUI workflow like shown here:
https://rammpeter.blogspot.com/2016/03/panorama-how-to-identify-root-cause.html
Comment by Peter Ramm — December 11, 2019 @ 9:35 pm GMT Dec 11,2019 |