Oracle Scratchpad

February 4, 2020


Filed under: Bugs,Infrastructure,Oracle,undo — Jonathan Lewis @ 3:50 pm GMT Feb 4,2020

The view v$undostat is a view holding summary information about undo activity that can be used by the automatic undo mechanism to deal with optimising the undo retention time (hence undo space allocation). The view holds one row for every ten minute interval in the last 4 days (96 hours) and includes two columns called maxquerylen and maxqueryid – which tell you something about the query that was considered to be the longest running query active in the interval.

In this note I want to explain why the contents of these two columns are sometimes (possibly often) completely irrelevant despite there being a few notes on the internet about how you should investigate them to help you decide on a suitable setting for the undo_retention.

The descriptions in the 19c reference manual for these columns are as follows:

  • maxquerylen – Identifies the length of the longest query (in seconds) executed in the instance during the period. You can use this statistic to estimate the proper setting of the UNDO_RETENTION initialization parameter. The length of a query is measured from the cursor open time to the last fetch/execute time of the cursor. Only the length of those cursors that have been fetched/executed during the period are reflected in the view.
  • maxqueryid – SQL identifier of the longest running SQL statement in the period

It would be hard to misunderstand the meaning of the second column: if the first column tells us that Oracle has spotted a “longest query” then the second column gives us the sql_id so we can check v$sql to find out what it was. But what sort of queries are going to show up as the longest query in the interval?

Here’s an example from a few hours of a nearly idle instance, querying the begin and end times (formatted to show just day of month + time), with the two critical columns,

        begin_time, end_time, maxquerylen, maxqueryid
order by

--------------- ----------- ----------- -------------
04 10:50:18	04 11:00:18	      0
04 11:00:18	04 11:10:18	    356 f3yfg50ga0r8n
04 11:10:18	04 11:20:18	    883 25u1mbkcr9rnu
04 11:20:18	04 11:30:18	   1486 25u1mbkcr9rnu
04 11:30:18	04 11:40:18	   2090 25u1mbkcr9rnu
04 11:40:18	04 11:50:18	      0
04 11:50:18	04 12:00:18	   3299 25u1mbkcr9rnu
04 12:00:18	04 12:10:18	   3903 25u1mbkcr9rnu
04 12:10:18	04 12:20:18	   4507 25u1mbkcr9rnu
04 12:20:18	04 12:30:18	      0
04 12:30:18	04 12:40:18	      0
04 12:40:18	04 12:50:18	      0
04 12:50:18	04 13:00:18	      0
04 13:00:18	04 13:10:18	      0
04 13:10:18	04 13:20:18	      0
04 13:20:18	04 13:30:18	      0
04 13:30:18	04 13:37:27	   9035 25u1mbkcr9rnu

173 rows selected.

Notice, particularly, that the SQL_ID 25u1mbkcr9rnu disappears from the 11:40 interval, then re-appears at 11:50, then disappears again from 12:20 through 13:20 (lunchtime), then reappears again at 13:30. And when it reappears after an absence the query length has increased in a way that’s consistent with the gap. So it looks as if the query wasn’t running during the gap, but turns out to have been running after the gap ended. (Is this a Quantum query – does it only run if you’re looking at it?)

The explanation is in the detail of the definition: “from the cursor open time to the last fetch/execute time”. From an SQL*Plus session I “set pause on” then executed the query “select * from all_objects” and hit return a few times to get a few pages of output. Then, spread over the next couple of hours, I hit return a few more times to get a few more pages of output. Each time I hit return my session does another fetch call, and the code behind v$undostat notices that my query is still active.

I don’t know exactly how Oracle is keeping track of “active” statements because there seem to be some inconsistencies in the reporting (and I’ll comment on those in a moment) but, as a first approximation, until you close a cursor (either explicitly or implicitly) some piece of Oracle’s code registers the fact that the query might do further fetches, which means it might need to apply more undo to make current data read-consistent with the cursor’s start SCN, which means that it should take note of the time the cursor has been open because the undo retention time might need to be that long.


I said there were some inconsistencies in the reporting. I’ve noticed three anomalies – perhaps due to the extreme idleness of the instance I was using for testing.

  1. At about 12:45 I hit return a few times to get the maxquerylen and maxqueryid refreshed – but the code didn’t seem to notice that I had done a few more fetches of a long running query. So it seems to be possible for Oracle to miss the query that should be reported.
  2. At about 11:52 I hit return a few times again, and you can see that the two critical columns were refreshed, but if you do the arithmetic Oracle has attributed 600 seconds to the query – the whole of the 10 minute refresh interval, not just the time up to the moment of the last fetch that I had done in that interval.
  3. I didn’t hit return again until about 12:45 (the moment in point 1 above when the two columns didn’t refresh the way I though they should), but the columns kept updating through 12:00 and 12:10 intervals anyway before disappearing from the 12:20 interval. So it looks like queries can be reported as long running even when they haven’t been executing or fetching.

The is one last anomaly – and this relates to the reason I started looking at this columns. If you check the entry for 11:00 you’ll see that the SQL Id ‘f3yfg50ga0r8n’ has been reported as “running” for 356 seconds. But here’s what I found in v$sql for that query:

select  executions, parse_calls, fetches, end_of_fetch_count, elapsed_time, cpu_time, rows_processed, sql_text
from    v$sql
where   sql_id = 'f3yfg50ga0r8n'

---------- ----------- ---------- ------------------ ------------ ---------- -------------- ------------------------------------------
        79          79         79                  0        20487      10667             79 select obj# from obj$ where dataobj# = :1

The SQL lookes like a sys-recursive statement which, in 79 executions, has accumulated 20 milliseconds of elapsed time (rather than 356 seconds – but that difference could just be related to one or other of the anomalies I reported earlier). The key thing to note is that the value of column end_of_fetch_count is zero: this looks like a statement where Oracle has simply not bothered to fetch all the data and has then not bothered to close the cursor. As a result it’s possible that each time the statement is executed (note that parse_calls = executions, so we’re not looking at a “held” cursor) the code behind v$undostat looks back at the time the cursor was initially opened to measure the query length, rather than looking at the time the statement was re-executed.

This may go a long way to answering the question that came up on Oracle-l earlier on today as follows:

The following query (sql_id is 89w8y2pgn25yd) was recorded in v$undostat.maxqueryid in a 12.2. database during a period of a high undo usage: select ts# from sys.ts$ where ts$.online$ != 3 and bitand(flags,2048) != 2048;

from    v$undostat u
where   maxqueryid='89w8y2pgn25yd'

--------  --------  -----------  --------------  ----------
   39199      4027         1378               5     2531960

What is this query, and how does it manage to report a maximum query length of 1,378 seconds?

Just like the one above it’s a sys-recursive query; and this one appears when you query dba_tablespaces – and even though it executes once for every row it takes just fractions of a second to execute. But if you trace a query like “select tablespace_name from dba_tablespaces” you’ll see that every time the query is called the trace file will show lines for: “Parse, Exec, Fetch, Close” until the last call – which doesn’t report a “Close”.

Just like my original “select * from all_objects” there’s a query dangling with an open cursor, leaving Oracle with the opportunity to go back to the moment it was opened and report it as a “long running query”.


The maxquerylen and maxqueryid in v$undostat don’t tell you about statements that have taken a long time to change a lot of data and generate a lot of undo; they tell you about statements that might need to apply a lot of undo to see read-consistent data and therefore might become victims in a “ORA-01555: snapshot too old” event.

The statements reported may be completely irrelevant because there are various reasons (some, possibly, bug-related) why a cursor may be treated as if it opened a long time in the past when it was actually a re-use of an existing “dangling” cursor. It’s also possible that some bad client code will treat cursor in a way that does no harm to the client program but hides a more interesting query that would otherwise have been reported by these two columns.

Update Feb 2020

I don’t know why I didn’t think of searching MOS before poking around at this issue, but I’ve just done a search on “maxquerylen maxqueryid” and one of the first Knowledgebase articles to show up – dated September 2019 – was Doc ID 2005931.1: Wrongly calculated MAXQUERYLEN Causing ORA-01555 or ORA-30036 Errors.

The document reports several bugs all relating to the same issue, and all closed as duplicates of unpublished bug: 12314489: OCIRELEASESTMT() TO PUT THE CURSORS IN CURBOUND STATE WHEN PLACING THEM IN CACHE This is also closed and replaced by bug 26833932 which is also unpublished and doesn’t even have a description.

Update Dec 2020

A question on the MOS Community Forums today raised a question about SQL_ID f3yfg50ga0r8n appearing in v$active_session_history with a an sql_exec_start (22:25) that was nearly 38 minutes before the sample_time (23:03) with no other samples appearing in ASH between those two times. This doesn’t necessarily mean anything special (it could be a query which is very efficient at the database but is only asked to return a few rows to the client every few seconds), but it’s the same query that I commented on in the section on Inconsistencies.

This prompted me to do another search for the bug numbers I reported in February, and I found that something has been published fairly recently about bug 26833932 – search for the document id 26833932.8, and you’ll find “Bug 26833932 OCIStmtRelease() To Put Cursors In CURBOUND State When Placing Them In Cache” . This is reported as fixed in 18.1 with various patches available for and

Update Jan 2021

Writing up some notes on undo handling I noticed that my 19.3 v$undostat was reporting a maxqueryid of  g0t052az3rx44 that had spontaneously appeared with a maxquerylen of 42,000 seconds, but no previous appearances. The text of the query was another sys-recursive statement:

select name,intcol#,segcol#,type#,length,nvl(precision#,0),decode(type#,2,nvl(scale,-127/*MAXSB1MINAL*/),178,scale,179,scale,180,sca
le,181,scale,182,scale,183,scale,231,scale,0),null$,fixedstorage,nvl(deflength,0),default$,rowid,col#,property, nvl(charsetid,0),nvl
(charsetform,0),spare1,spare2,nvl(spare3,0), nvl(evaledition#,1),nvl(unusablebefore#,0),nvl(unusablebeginning#,0), case when (type#
in (1,8,9,96,112)) then nvl(collid, 16382) else 0 end case, nvl(collintcol#,0), nvl(acdrrescol#, 0), nvl(spare7, 0), nvl(spare9, 0),
 nvl(spare10, 0) from col$ where obj#=:1 order by intcol#

So the problem still hasn’t gone away yet.

September 10, 2019

Update restarts

Filed under: Infrastructure,Oracle,redo,undo — Jonathan Lewis @ 11:00 am BST Sep 10,2019

Somewhere I think I’ve published a note about an anomaly that’s been bugging me since at least Oracle 10g – but if it is somewhere on the Internet it’s hiding itself very well and I can’t find it, though I have managed to find a few scripts on my laptop that make a casual reference to the side effects of the problem. [Ed: a tweet from Timur Ahkmadeev has identified a conversation in the comments on an older post that could explain why I thought I’d written something about the issue.]

Anyway, I’ve decided to create some new code and write the article (all over again, maybe). The problem is an unexpected overhead that can appear when you do a simple but large update driving off a tablescan. [Ed: see Update 2021 – this is not restricted to an update by tablescan]

If you just want the conclusion and don’t feel like reading the detail, it’s this: a big update may roll itself back before it gets to the end of the tablescan and restart “invisibly”, doing a “select for update” (2nd tablescan, one row at a time) followed by the update (3rd tablescan, by array) massively increasing the redo and undo generated.

Here’s the script I’ve created to demonstrate the issue. As its header suggests the results I’ll be showing come from

rem     Script:         update_restart.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sept 2019
rem     Purpose:        
rem     Last tested 

create table t1 
pctfree 80 
with g as (
        select rownum id 
        from dual 
        connect by level <= 500 -- > comment to avoid wordpress format issue
        rownum id, 
        lpad(rownum,10) v1
from    g,g

                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'

select  blocks, num_rows
from    user_tables 
where   table_name = 'T1'

I’ve created a table with 250,000 rows (500 * 500) and gathered stats on it (the table was 3,345 blocks) as gathering stats tends to sort out any issues of block cleanout that might complicate the patterns. Having done that I’m going to do an update of every row in the table:

alter system switch logfile;
execute snap_my_stats.start_snap

update t1 set id = id;

execute snap_my_stats.end_snap
execute dump_log

The update doesn’t actually change the column value (and in 12.2 there’s an interesting change to the Oracle code that comes into play in that case) but it will still generate redo and undo for every row in the table and update ITL entries and do all the rest of the work we normally associate with an update.

Before the update I’ve issued a “switch logfile” and after the update I’ve called a procedure I wrote to do a symbolic dump of the current log file (“alter system dump logfile”) (See footnote). The calls to the snap_my_stats package allow me to see the changes in the session activity stats due to the update. A simple check of these statistics can often give you a good idea of what’s going on. So you might ask yourself what you might expect to see in the stats from this update. (As Isaac Asimov once said: The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’, but ‘That’s funny …’)

Having updated 250,000 rows by tablescan I think we might expect to see a tablescan of 250,000 rows; we might also expect to see (in the worst case) something like 250,000 redo entries although we might expect to see far fewer since Oracle can probably do array updates and create one redo change vector and one undo record per block changed rather than per row changed.

So here are some of the more interesting numbers that we actually see (though the specific results are not 100% reproducible):

Name                                                                     Value
----                                                                     -----
session logical reads                                                  270,494
db block gets                                                          263,181
consistent gets                                                          7,313
db block changes                                                       534,032
free buffer requested                                                    7,274
switch current to new buffer                                             3,534
redo entries                                                           265,128
redo size                                                           83,803,996
rollback changes - undo records applied                                  1,002
table scan rows gotten                                                 522,470
table scan blocks gotten                                                 6,881 
HSC Heap Segment Block Changes                                         264,135
Heap Segment Array Updates                                              14,135

A pair of figures that stand out as most surprising are the tablescan figures – I’ve got a table of 3,345 blocks and 250,000 rows, but my tablescans total 6,881 blocks and 522,370 rows – I seem to have scanned the table (slightly more than) twice! At the same time I’ve requested 7,274 free buffers (some of them probably for undo blocks) and “switched current to new buffer” 3,534 times – so I seem to have read every table block then cloned it in memory.

Side note: the “switched current to new buffer” is a common feature of a tablescan update and not particularly relevant to this article. I assume the intent is to “leave behind” a read-consistent copy for any other queries that might have started before the big update.

Another really strange statistic is the one that says “rollback changes – undo records applied” – I’ve used 1,002 undo records in a rollback operation of some sort. I haven’t show you the stats for “user rollbacks” or “transaction rollbacks” because, despite the undo records being applied, there were no rollbacks reported.

What about changes? We see 534,000 “db block changes” – and maybe it’s okay for that statistic to be roughly twice the number of rows we expect to change because for every row we change we have to generate some undo, so may have to modify an undo block to create an undo record – but there are still 34,000 “db block changes” too many. The number of changes is fairly consistent with the number of “redo entries”, of course, allowing for a factor of 2 because redo entries often consist of a matching pair of changes, one for the “forward change” and one for the “undo block change”. Again, though, there a bit of an excess: the 34,000 extra “db block changes” seem to have resulted in an excess 15,000 “redo entries”. And we still have to ask: “Whatever happened to array processing ?”

In fact, we do see some array processing – we did 14,135 “Heap Segment Array Updates” which, at an average of about 18 rows per array would account for our 250,000 row table – but we also did 264,135 “HSC Heap Segment Block Changes”, and here’s a (“That’s funny …”) coincidence: 264,135 – 14,135 = 250,000: the number of rows in the table. Maybe that means something, maybe not. Another funny coincidence: 264,135 (HSC Heap Segment Block Changes) + 1,002 (rollback changes – undo records applied) = 265,137 which is remarkably close to the number of “redo entries”.

If, by now, you’re not beginning to suspect that something a little odd has happened you haven’t been reading enough Agatha Christie novels, or watching enough Sherlock Holmes (or House) videos.

I don’t always … dump trace files but when I do I like to go over the top. My update generated 83MB of redo so I made sure my log files were 100MB each, and when I dumped the current log file I got a trace file that was a little over 466MB and 11M lines. Clearly I had no intention of reading it all. A critical skill in reading ANY type of trace file is knowing in advance what you ought to be looking for and working out a painless way of finding it. (Practising by slugging your way through a very small example is a good starting point.)

In this case I started by using grep to extract all the lines containing “OP:”, stripping them down to just the “OP:” followed by the redo op code, and then sorting and summarising the use of op codes. Here (with some annotations) is what I got:

grep OP or32_ora_14652.trc | sed 's/.*OP/OP/' | sed 's/ .*$//' | sort | uniq -c | sort -n

      1 OP:11.3
      2 OP:10.4
      2 OP:11.5
      3 OP:23.1
      3 OP:4.1
      4 OP:14.4
      4 OP:22.5
      5 OP:13.22
      8 OP:11.2
      8 OP:22.2
     13 OP:10.2
     25 OP:5.4
    119 OP:5.11         Mark undo as applied during rollback
    883 OP:5.6          Applying undo - NB 883 + 119 = 1002
   3776 OP:5.2          Get undo segment header
  15139 OP:11.19        Array update of table
 250000 OP:11.4         Lock table row piece
 264190 OP:5.1          Update undo block

The interesting point here is that I seem to have locked every single row in the table separately, but I’ve also done array updates on the table, and I’ve done some rolling back. So I need to do a more detailed check to see when things happened. So let’s begin by examining a few thousand lines near the start of the trace:

head -10000 or32_ora_14652.trc | grep -n OP | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//'

68:CHANGE #1  OP:11.19
214:CHANGE #2  OP:5.2
217:CHANGE #3  OP:11.19
363:CHANGE #4  OP:11.19
509:CHANGE #5  OP:5.2
512:CHANGE #6  OP:11.19
658:CHANGE #7  OP:11.19
692:CHANGE #8  OP:11.19
838:CHANGE #9  OP:11.19
984:CHANGE #10  OP:11.19
1130:CHANGE #11  OP:11.19
1276:CHANGE #12  OP:11.19
1422:CHANGE #13  OP:11.19
1568:CHANGE #14  OP:11.19
1714:CHANGE #15  OP:5.2
1717:CHANGE #16  OP:11.19
1863:CHANGE #17  OP:11.19
2009:CHANGE #18  OP:11.19
2155:CHANGE #19  OP:11.19
2301:CHANGE #20  OP:11.19
2447:CHANGE #21  OP:11.19
2593:CHANGE #22  OP:11.19
2739:CHANGE #23  OP:11.19
2885:CHANGE #24  OP:5.2
2888:CHANGE #25  OP:11.19
3034:CHANGE #26  OP:11.19
3180:CHANGE #27  OP:5.1
3336:CHANGE #28  OP:5.1
3489:CHANGE #29  OP:5.1
3642:CHANGE #30  OP:5.1
3795:CHANGE #31  OP:5.1
3836:CHANGE #32  OP:5.1
3989:CHANGE #33  OP:5.1
4142:CHANGE #34  OP:5.1
4295:CHANGE #35  OP:5.1
4448:CHANGE #36  OP:5.1
4601:CHANGE #37  OP:5.1
4754:CHANGE #38  OP:5.1
4907:CHANGE #39  OP:5.1
5060:CHANGE #40  OP:5.1
5213:CHANGE #41  OP:5.1
5366:CHANGE #42  OP:5.1
5519:CHANGE #43  OP:5.1
5672:CHANGE #44  OP:5.1
5825:CHANGE #45  OP:5.1
5978:CHANGE #46  OP:5.1
6131:CHANGE #47  OP:5.1
6284:CHANGE #48  OP:5.1
6440:CHANGE #1  OP:5.1
6593:CHANGE #2  OP:11.19
6742:CHANGE #1  OP:5.1
6895:CHANGE #2  OP:11.19
7044:CHANGE #1  OP:5.1
7197:CHANGE #2  OP:11.19
7346:CHANGE #1  OP:5.1
7499:CHANGE #2  OP:11.19
7648:CHANGE #1  OP:5.1
7801:CHANGE #2  OP:11.19
7950:CHANGE #1  OP:5.2
7953:CHANGE #2  OP:5.1
8106:CHANGE #3  OP:11.19
8255:CHANGE #1  OP:5.1
8408:CHANGE #2  OP:11.19
8557:CHANGE #1  OP:5.1
8710:CHANGE #2  OP:11.19
8859:CHANGE #1  OP:5.1
9012:CHANGE #2  OP:11.19
9161:CHANGE #1  OP:5.1
9314:CHANGE #2  OP:11.19
9463:CHANGE #1  OP:5.1
9616:CHANGE #2  OP:11.19
9765:CHANGE #1  OP:5.1
9918:CHANGE #2  OP:11.19

Everything up to line 6284 (change# 48) is one big “private redo” record where we do several “11.19 (table array updates)” followed by a matching group of “5.1 (update undo block)”, before switching to the “public redo” mechanism used by long transactions where each redo record is a matched pair of “backward” and “forward” change vectors, i.e. one 5.1 with a matching 11.19.

So we start by doing array updates without doing any single row locking – so I want to know when the array updates stop. Here’s a small but continuous piece from the next grep I ran (which produced an output totalling 15,000 lines):

grep -n "OP:11.19" or32_ora_14652.trc | sed 's/:.*OP/:-OP/' | sed 's/ .*$//' 
7461225:-OP:11.19    *** Big jump

The 11.19 opcodes were appearing roughly every 300 lines of trace, except they stopped after 2,004 occurrences then disappeared for about 7M lines before reappering and running evenly (ca. every 300 lines) to the end of the trace file. Clearly we need to know what happened somewhere around lines 447200 and 7461225. But before looking too closely, let’s find out when the “11.4 (lock row piece)” and “5.6 (Apply undo)” start to appear.

grep -n "OP:11.4" or32_ora_14652.trc | sed 's/:.*OP/:-OP/' | sed 's/ .*$//' 

... lots of evenly spread 11.4, every 28 lines

Note where the 11.4 opcodes begin and end – they fit exactly into the gap between the two sets of 11.19 opcodes. And where do the 5.6 op codes come into play ?

grep -n "OP:5.6" or32_ora_14652.trc | sed 's/:.*OP/:-OP/' | sed 's/ .*$//'

... a few hundred appearances of 5.6, common spaced by 155 lines 

That was the entire set of “5.6 Apply undo” records – a short burst of activity which finishes just before the big gap in the 11.19 opcodes . This leads to the question: “does anything difference happen around the time that the 5.6 op codes start to appear?”  Time to look at a very small extract from a very big grep output:

grep -n "OP" or32_ora_14652.trc | sed 's/ CON_ID.*OP/:-OP/' | sed 's/ ENC.*$//' 
294895:CHANGE #1:-OP:5.1
295048:CHANGE #2:-OP:11.19

295197:CHANGE #1:-OP:5.1
295322:CHANGE #2:-OP:11.19

295443:CHANGE #1:-OP:5.1
295596:CHANGE #2:-OP:11.19

295745:CHANGE #1:-OP:22.2

295753:CHANGE #1:-OP:22.5

295760:CHANGE #1:-OP:14.4
295766:CHANGE #2:-OP:22.2

295773:CHANGE #1:-OP:11.19
295919:CHANGE #2:-OP:5.6

295928:CHANGE #1:-OP:11.19
296046:CHANGE #2:-OP:5.6

296055:CHANGE #1:-OP:11.19
296201:CHANGE #2:-OP:5.6


I’ve inserted blank lines between redo records in this tiny extract from around the point where the 5.6 opcodes appear. The first few records show Oracle doing the update, then there’s a little bit of space management (Op codes in the 22 and 14 layer) and then Oracle starts rolling back. (Note: whether updating or rolling back Oracle is applying “11.19 (table array updates)” vectors, but “5.6 (undo applied)” vectors appear as change vector #2 in the redo record as we roll back while “5.1 (create undo)” vectors appear as change vector #1 as we make forward changes).

If we count the number of 11.19 opcodes up to the point where the 11.4 opcodes appear we find that we’ve used 1,002 to update the table, then another 1,002 to rollback the update.


Without going into very fine detail examining individual redo change vectors, what we’ve seen is as follows:

  • On executing the very large update the session activity statistics show the number of rows accessed by tablescan was more than twice the expected number. This obvious oddity made me look more closely and notice the surprising “rollback changes – undo records applied” which just shouldn’t have been there at all, and this led me to look more closely at the number of redo entries which averaged more than one per row when it should have been a much smaller number due to Oracle’s internal array update mechanism.
  • Extracting just the redo op codes from a dump of the redo log generated during this update we’ve extracted summary information that the work we’ve done includes: (a) locking every single row in the table separately, (b) performing a load of table array updates, (c) applying a significant number of undo records to rollback changes.
  • Digging in for just a little more detail we see (in order of activity):
    • A thousand array updates  (roughly 20,000 rows by tablescan – I did actually check a few redo vectors in detail, 20 rows was a common array size)
    • The thousand updates being rolled back (luckily a long time before we got near the end of the first tablescan)
    • 250,000 rows locked individually (250,000 rows by tablescan – the 2nd tablescan) (NB each LKR – lock row – undo record is 68 bytes of undo)
    • 13,000 array updates   (250,000 rows by tablescan – the third tablescan) – at roughly 100 bytes per row of “overhead” in the undo

I don’t know why this happens – and from memory I think it may have started happening in 10g. I have an idea it relates to changes in the global SCN as the update takes place and something (possibly related to space management updates) making the update think it has to go into “write consistent” mode as its “start SCN” gets too far away from the current SCN.

The bottom line on this, though, is that there may be cases where a large, tablescan-driven, update could actually do the tablescan three times, and generate far more undo and redo than it should. In theory I could have completed the update in about 13,000 pairs of (11.19, 5.1) change vectors. In practice this simple test produced far more work.

And it gets worse …

Update – prompted by a tweet from Tanel Poder

In one of my tests I had added the /+ monitor */ hint to see if the activity would be reported correctly. It didn’t look quite right – but what I hadn’t noticed was that the monitor reported the statement as executing 3 times (and, of course, the tablescan executing 3 times):

Global Information
 Status              :  DONE
 Instance ID         :  1
 Session             :  TEST_USER (14:13842)
 SQL ID              :  9a166bpd7455j
 SQL Execution ID    :  16777216
 Execution Started   :  09/10/2019 09:41:54
 First Refresh Time  :  09/10/2019 09:41:54
 Last Refresh Time   :  09/10/2019 09:42:00
 Duration            :  6s
 Module/Action       :  MyModule/MyAction
 Service             :  orcl
 Program             :  sqlplus@vbgeneric (TNS V1-V3)

Global Stats
| Elapsed |   Cpu   |    IO    |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes |
|    5.84 |    5.77 |     0.01 |     0.06 |     1M |   42 |  26MB |

SQL Plan Monitoring Details (Plan Hash Value=2927627013)
| Id |      Operation       | Name |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity | Activity Detail |
|    |                      |      | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |   (# samples)   |
|  0 | UPDATE STATEMENT     |      |         |      |         4 |     +3 |     3 |        0 |      |       |          |                 |
|  1 |   UPDATE             | T1   |         |      |         6 |     +1 |     3 |        0 |      |       |   100.00 | Cpu (6)         |
|  2 |    TABLE ACCESS FULL | T1   |    250K |  432 |         4 |     +3 |     3 |     698K |   42 |  26MB |          |                 |

As you see from Tanel’s tweet, he has a little script you can use to scan through the library cache to see if any (big) updates have suffered from this problem.


I couldn’t quite manage to leave this problem alone, so I’ve run the test and dumped the redo log three more times and there’s an interesting coincidence. Every time I ran the test the switch from updating to rolling back had these 4 (annotated) redo opcodes separating them:

158766:CHANGE #1:-OP:22.2        ktfbhredo - File Space Header Redo
158774:CHANGE #1:-OP:22.5        ktfbbredo - File BitMap Block Redo:
158781:CHANGE #1:-OP:14.4        kteop redo - redo operation on extent map
158787:CHANGE #2:-OP:22.2        ktfbhredo - File Space Header Redo:

The group represents some allocation of space from a file to an object – so the question is: what object gets an extent map update. Here’s the full redo change vector for the OP:14.4

REDO RECORD - Thread:1 RBA: 0x001cdd.000028a4.00d8 LEN: 0x00b0 VLD: 0x01 CON_UID: 0
SCN: 0x0b86.0db28b4c SUBSCN: 88 09/10/2019 15:06:13
CHANGE #1 CON_ID:0 TYP:0 CLS:29 AFN:2 DBA:0x00800688 OBJ:4294967295 SCN:0x0b86.0db28b4c SEQ:10 OP:14.4 ENC:0 RBL:0 FLG:0x0000
kteop redo - redo operation on extent map
   ADD: dba:0x802380 len:128 at offset:6
  ADDRET: offset:6 ctime:0
   SETSTAT: exts:9 blks:911 lastmap:0x0 mapcnt:0
   UPDXNT: extent:6 add:TRUE

The “ADD” is adding extent id 6, of 128 blocks, at file 2 block 9088. That happens to be the undo segment which my update statement is using. Several more 14.4 change vectors appear throughout the trace file taking the size of the undo segment up to 35 extents, but in the (small number of ) tests I did it was always the first 14.4 that seemed to trigger the rollback and restart.

Preliminary Hypothesis

When the transaction does sufficient work that it forces its undo segment to extend some feature of the resulting recursive transaction causes the transaction to rollback and restart. [ed: confirmed in the posts listed by Mikhail Velikikh in comment #1 below]

This does fit with the observation that the number of rows updated before the rollback/restart occurs varies apparently at random (thanks to the variation in current free space in the transaction’s undo segments). It also fits the observation that an update that doesn’t rollback and restart in 12.1 does rollback and restart in 12.2 because it gets a bit unlucky using the “single row locking” optimisation for “nochange update” optimisation.


The step to “alter system dump logfile” is not allowed inside a PDB, so as a general pattern I now capture the current SCN before and after an operation that I’m investigating and then “alter system dump redo scn min XXX scn max YYY”.

Update Aug 2021

Following a tweet from Andy Sayer, I realised that I should have updated an important detail on this note. I had only ever noticed this restart anomaly on large updates driven by tablescans and I hadn’t looked to see if it appeared in other circumstances. Given the final explanation from Mikhail Velikikh in the posts listed in comment #1 below I should have highlighted the fact that it’s not about the tablescan, it’s about the state of the undo segment when the transaction starts.

To confirm this (for my own benefit) I’ve run a modified version of the tests on Oracle and produced the same effect with a large, index-driven, update on the table. All I needed to do to create the indexed version of the test was add a primary key contraint to the table, and change the update statement to update the non-key column v1 while driving the access path through the primary key index, viz:

create  unique index t1_pk on t1(id);
alter table t1 add constraint t1_pk primary key(id);


update  /*+
        */ t1
set     v1 = lpad(rownum,10,'0')
where   id > 0

The pattern and content of the redo log showed the same rollback and restart; the SQL trace reported ORA-01551; and the SQL Monitor report claimed 3 Execs and 520K rows processed (a lucky “improvement” over the tablescan test). You should expect the same behaviour from deletes and merge commands with update clauses.


February 28, 2018

255 Columns

Filed under: Infrastructure,Oracle,Troubleshooting,undo — Jonathan Lewis @ 12:27 pm GMT Feb 28,2018

It’s the gift that keeps on giving – no matter how many problems you find there are always more waiting to be found. It’s been some time since I last wrote about tables with more than 255 columns, and I said then that there was more to come. In the last article I described how adding a few columns to a table, or updating a trailing column in a way that made the table’s used column count exceed 255, could result in some strange row-splitting behaviour – in this article I’m going to look at a critical side effect of that behaviour.

We’ll start with a simple model and a question – I’ll create a table with a thousand rows of data, then I’ll add a few columns to that table and update the last column in every row and check the undo and redo generated by the update.  Eventually I’m going to need a table with more than 255 columns so the script to create the table is rather long and I’ve posted it at the end of the article in case you want to experiment – in the following text I’ve omitted a couple of hundred lines each of column declarations and values.

rem     Script:         wide_table_6.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2018

create table t1(
with generator as (
                rownum id
        from dual
        connect by
                level <= 1e3 -- > comment to avoid WordPress format issue
        generator       v2
        rownum <= 1e4 -- > comment to avoid WordPress format issue

                ownname     => user,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'

alter table t1 add(
        col250 varchar2(10),
        col251 varchar2(10),
        col252 varchar2(10),
        col253 varchar2(10),
        col254 varchar2(10),
        col255 varchar2(10),
        col256 varchar2(10),
        col257 varchar2(10),
        col258 varchar2(10),
        col259 varchar2(10)

update t1 set col259 = lpad('259',10,'0');

The table I’ve created has columns named from col000 to col249 populated with a text matching the column’s numeric id – a total of 250 columns, except that for the first part of the test I’ve commented out the creation and population of 10 of those columns, giving me a total of 240 columns. Then I’ve added 10 more columns and updated the last of those 10. So, for the first part of the test, I’ve grown my table from 240 used columns to 250 used columns. Here are a few critical stats for that update from the session statistics view (v$sesstat joined to v$statname):


Name                                                 Value
----                                                 -----
db block gets                                        1,023
consistent gets                                        555
db block changes                                     2,012
redo entries                                           907
redo size                                          262,692
undo change vector size                             76,052
table scan rows gotten                               1,002
table scan blocks gotten                               501
HSC Heap Segment Block Changes                       1,000

That’s pretty much the sort of thing we might expect. For a small update to a row it’s reasonable to see 250 – 300 bytes of redo of which about half is the redo for the undo. We can see that we’ve scanned 1,000 rows and made 2,000 block changes (one to the table block and one to an undo block for each row in the table). The table was 500 blocks of data (the avg_row_len is about 2640 bytes from 240 columns at 10 bytes + a length byte, which makes two rows per block with lots of spare space at the default 10 pctfree). You might been expecting the number of redo entries to be a close match to the number of rows but it’s a little short because the redo records for the first few updates would have been stored in private redo and turned into a single large redo entry.

So what do the stats look like when we start with 250 columns and grow to 260 columns, breaking through the dreaded 255 barrier ?

Name                                                 Value
----                                                 -----
db block gets                                        9,503
consistent gets                                      1,894
db block changes                                     9,384
redo size                                        8,110,584
redo entries                                         5,778
undo change vector size                          3,780,260
table scan rows gotten                               1,002
table scan blocks gotten                               501
HSC Heap Segment Block Changes                       3,000

Such a simple change – with such a large impact.

The average undo is now nearly 3.5KB per row (and the rows were only about 2,860 bytes each anyway), the number of redo entries is up to 6 times the original, we’re averaging 3 “HSC Heap Segment Block Changes” per row instead of 1 and in total we’ve managed to do an extra 7,000 db block changes overall.

To get an idea of what was going on behind the scenes I dumped the redo log file. For the first part of the test most of the redo entries consisted of a pair of redo change vectors with OP codes 5.1 (modify undo block) and 11.5 (update row piece). The 5.1 corresponded to the undo needed to reverse out the effect of its paired 11.5 and both change vectors were pretty small.

For the second part of the test I found a frequently repeated sequence of three consecutive redo records of paired redo vectors: (5.1, 11.4), (5.1, 11.2) and (5.1, 11.6). Again each 5.1 corresponds to the undo needed to reverse out the effect of its associated 11.x, and in this case the three “table” (level 11) OP codes are, in order: “lock row piece”, “insert row piece”, “overwrite row piece”. These 3 pairs occured 1,000 times each, accounting for roughly 3,000 of the redo entries reported.

On top of this there were 2,500 redo records holding redo change vectors with op code 13.22 (“State change on level 1 bitmap block”), and a few hundred , with op code 13.21 (“Format page table block”) with just a few others bringing the total up to the 5,800 redo entries reported. (You might note that the last OP code is a clue that we added quite a lot of new blocks to the table as we executed the update – in fact the number of used table blocks grew by about 50%.)

We also had 500 redo change vectors of type 5.2 (“get undo header”). This number was significantly more than in the first part of the test because we had a lot more undo block changes (with bigger undo change vectors) which means we used a lot more undo blocks, and each time we move to a new undo block we update our transaction slot in the undo header with the current undo block address. I mentioned the pairing of 5.1 and 11.6 above – in fact 50% of those records recorded just two changes (5.1, 11.6) the other 50% recorded three changes (5.2, 5.1, 11.6) – in effect every other row update resulted in Oracle demanding a new undo block.

I’ll explain in a moment why we have a very large number of “State change on level 1 bitmap block”; first let’s examine the key steps of how Oracle is processing a single row update – the sequence of 11.4, 11.2, 11.6:

  • 11.4: lock the head row piece – this seems to happen when the row is in (or going to turn into) multiple pieces; presumably because the piece that is going to change might not be the head piece. This is a tiny operation that generates undo and redo in the order of tens of bytes.
  • 11.2: our code extends the row to 260 columns, which means Oracle has to split it into two pieces of 5 columns and 255 columns respectively – so one of those row-pieces has to cause an insert to take place. Inserting a row may require a lot of redo, of course, but the undo for a (table-only) insert is, again, tiny.
  • 11.6: When Oracle has to split a wide row (>255 columns) it counts columns from the end, so the first row piece of our row is 5 columns and the second row piece (which is the one inserted by the 11.2 operation) is 255 columns. This means we have to overwrite the original row piece with a much shorter row piece. So we’re replacing 2,750 bytes (250 columns) with 55 bytes (5 columns), which means we have to write the contents of the “deleted” 250 columns to the undo tablespace – and that’s where most of the huge volume of undo comes from.

There are two extra points to note about the way Oracle handles the insert/overwrite steps. The length of our row exaggerates the effect, of course, but when we insert the ending 255 columns the block they go to is probably going to change from nearly empty to half full, or half full to full – which means its bitmap entry has to be updated; similarly when the initial 250 columns is overwritten with just 5 columns a huge amount of free space will appear in the block which, again, means that the block’s bitmap entry has to be updated. This gives us a good idea of why we see so many 13.22 (“L1 bitmap state change”)redo change vectors.

The second point is that the numbers still don’t add up. Allowing a couple of hundred bytes of undo per row for the lock row and insert row, then 2,750 plus a few more for the overwrite, I’ve accounted for about 3,000 bytes per row updated – which leaves me short by about 800 bytes per row.  If I dump undo blocks I can see that the undo change vector for the overwrite is actually 3,628 bytes long rather than being “slightly over” the 2,750 for the before image of the overwritten row. Strangely I can see a couple of hundred bytes of what looks like damage in the row image, and there’s also a couple of hundred bytes of what looks like garbage (but probably isn’t) after the row image, but I’ve got no idea why there’s so much excess data in the record.

One day I (or someone else) may get around to looking very closely at why that particular undo record is a lot bigger than an observer might expect – but at least we can see that the space is there, and even if some of that volume could be made to disappear the problem of the massive undo relating to the overwrite and Oracle’s odd strategy of counting columns from the end is still going to be present, and there are probably some occasions when you need to know about this.


Once the number of columns in a table goes over 255 then a simple update to a “trailing” null column (i.e. one past the current end of the row) is likely to generate a much larger volume of undo and redo than you might expect. In particular the size of the undo record is likely to be equivalent to the volume of the last 255 columns of the table – and then a large fraction more.

The reason why this particular anomaly came to my attention is because a client had a problem upgrading an application that required them to add a few columns to a table and then update a couple of them. The size of the undo tablespace was 300 GB, the update (the only thing running at the time) should have added about 30 bytes to the length of each row, the update should have affected 250 million rows. The process crashed after a few hours “ORA-30036: unable to extend segment by %s in undo tablespace ‘%s'”. Even allowing for a “reasonable” overhead it seemed rather surprising that Oracle needed more than 1,200 bytes of undo space per row – but then I found the table was defined with 350 columns.

Solutions for this user:  it’s an upgrade that’s allowed to take quite a long time, and the nature of the update is such that it would be possible to update in batches, committing after each batch.  It would also be nice to review how the table was used to see if it could be rebuilt with a different column order to move all the unused columns to the end of the row – with a little luck the result table might find almost all the rows fitting into a single row piece, even after the upgrade.


If you want to experiment, here’s the whole script to create the table, insert some rows, then add a few more columns and update one of them. It’s very long, and not in the least bit exciting, but it may save you a little typing time if you want to use it.

create table t1(
with generator as (
                rownum id
        from dual 
        connect by 
                level <= 1e3 -- > comment to avoid WordPress format issue
        generator       v2
        rownum <= 1e4 -- > comment to avoid WordPress format issue

                ownname     => user,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'

        avg_row_len, num_rows,  blocks,
        num_rows / trunc(8000/avg_row_len) estimated_blocks
        table_name = 'T1'

prompt  =================
prompt  Add a few columns
prompt  =================

alter table t1 add(
        col250 varchar2(10),
        col251 varchar2(10),
        col252 varchar2(10),
        col253 varchar2(10),
        col254 varchar2(10),
        col255 varchar2(10),
        col256 varchar2(10),
        col257 varchar2(10),
        col258 varchar2(10),
        col259 varchar2(10)

-- alter system switch logfile;

update t1 set col259 = lpad('259',10,'0');

-- execute dump_log

P.S. if you do investigate and solve the question of the excess space in the undo record, and the odd content in the row “before image” then do let me know. (Hint: part of the excess may be a “null columns” map – but that still leaves plenty to account for.)

April 21, 2017

Undo Understood

Filed under: Infrastructure,Oracle,Read Consistency,Troubleshooting,undo — Jonathan Lewis @ 10:45 am BST Apr 21,2017

It’s hard to understand all the ramifications of Oracle’s undo handling, and it’s not hard to find cases where the resulting effects are very confusing. A recent post on the OTN database forum resulted in one response insisting that the OP was obviously updating a table with frequent commits from one session while querying it from another thereby generating a large number of undo reads in the querying session.

That’s a possible cause of the symptoms that had been described – although not the only possible cause, especially since the symptoms hadn’t been described completely. It’s actually possible to see this type of activity when there are no updates to, and no outstanding commits on, the target table. Unfortunately it’s quite hard to demonstrate this with a quick, simple, script in recent versions of Oracle unless you do some insanely stupid things to make the problem appear – but I’m pretty good at doing “insanely stupid” with Oracle so here we go; first some data creation:

rem     Script:         undo_rec_apply_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2017

create table t2(v1 varchar2(100));
insert into t2 values(rpad('x',100));

create table t1
pctfree 99 pctused 1
with generator as (
                rownum id
        from dual
        connect by
                level <= 1e4
        cast(rownum as number(8,0))                     id,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
        generator       v1,
        generator       v2
        rownum <= 8e4 -- > comment to bypass WordPress formatting issue

alter table t1 add constraint t1_pk primary key(id)

                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'

The t2 table is there as a target for a large number of updates from a session other than the one that’s going to display the problem. The t1 table has been defined and populated in a way that puts one row into each of 80,000 blocks (though, with ASSM and my specific tablespace definition of uniform 1MB extents, the total space allocation is about 80,400 blocks). I’ve got a primary key declaration that allows me to pick single rows/blocks from the table if I want to.

At this point I’m going to do a lot of updates to the big (t1) table using a hugely inefficient strategy to emulate the type of thing that can happen on a very large table with lots of random updates and many indexes to maintain:

        for i in 1..800 loop
                update t1 set v1 = upper(v1) where id = 100 * i;
                execute immediate 'alter system switch logfile';
                execute immediate 'alter system flush buffer_cache';
        end loop;

set transaction read only;

I’m updating every 100th row/block in the table with single row commits, but before each commit I’m switching log files and flushing the buffer cache. Because of the called to dbms_lock.sleep() the code will take something like 8 seconds to complete.


Warning: This is NOT an experiment to try on a production system, or even a development system if there are lots of busy developers or testers around – and if you’re running your dev/test in archivelog mode (which, for some of your systems you should be’ doing) you’re going to end up with a lot of (small) archived redo logs. I have to do this switch to ensure that the updated blocks are unpinned so that they will be written to disc and flushed from the cache by the flush buffer cache. (This extreme approach would not have been necessary in earlier versions of Oracle but the clever developers at Oracle Corp. keep adding “damage limitation” touches to the code – which I then have to work around to cause damage with small test cases.) Because the block has been flushed from memory before the commit the session will increment the session/instance activity statistic “commit cleanout failures: block lost” on each commit. By the time this loop has run to completion there will be 800 blocks from the table on disc needing a “delayed block cleanout”.

Despite the extreme brute force I use in this loop, there is a further very important detail that has to be set before this test will work (at least in, which is what I’ve used in my test runs). I had to start the database with the hidden parameter _db_cache_pre_warm set to false. If I don’t have the database started with this feature disabled Oracle would notice that the buffer cache had a lot of empty space and would “pre-warm” the cache by loading a few thousand blocks from t1 as I updated one row – with the side effect that the update from the previous cycle of the loop would be cleaned out on the current cycle of the loop. If you do run this experiment, remember to reset the parameter and restart the instance when you’ve finished.

I’ve finished this chunk of code with a call to “set transaction read only” – this emulates the start of a long-running query: it captures a point in time (through the current SCN) and any queries that run in the session from now on have to be read-consistent with that point in time. After doing this I need to use a second session to do a bit of hard work – in my case the following:

execute snap_rollstats.start_snap

        for i in 1..10000 loop
                update t2 set v1 = upper(v1);
                update t2 set v1 = lower(v1);
        end loop;

execute snap_rollstats.end_snap

The calls to the snap_rollstats package simply read v$rollstat and give me a report of the changes in the undo segment statistics over the period of the loop. I’ve executed 10,000 transactions in the interval, which was sufficient on my system to use each undo segment header at least 1,000 times and (since there are 34 transaction table slots in each undo segment header) overwrite each transaction table slot about 30 times. You can infer from these comments that I have only 10 undo segments active at the time, your system may have many more (check the number of rows in v$rollstat) so you may want to scale up that 10,000 loop count accordingly.

At this point, then, the only thing I’ve done since the start of my “long running query” is to update different table from different session. What happens when I do a simple count() from t1 that requires a full tablescan ?

alter system flush buffer_cache;

execute snap_filestat.start_snap
execute snap_my_stats.start_snap

select count(v1) from t1;

execute snap_my_stats.end_snap
execute snap_filestat.end_snap

I’ve flushed the buffer cache to get rid of any buffered undo blocks – again an unreasonable thing to do in production but a valid way of emulating the aging out of undo blocks that would take place in a production system – and surrounded my count() with a couple of packaged call to report the session activity stats and file I/O stats due to my query. (If you’re sharing your database then the file I/O stats will be affected by the activity of other users, of course, but in my case I had a private database.)

Here are the file stats:

Datafile Stats
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs    Max      Writes      Blocks   Avg Csecs    Max
File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------    ---      ------      ------   ---------    ---
    1          17          17       1.000        .065          17        .065           0        .000      6           0           0        .000     15
    3         665         665       1.000        .020         665        .020           0        .000      6           0           0        .000     15
    5         631      80,002     126.786        .000           2        .045         629        .000      6           0           0        .000     17

As expected I’ve done a number of multiblock reads of my data tablespace for a total of roughly 80,000 blocks read. What you may not have expected is that I’ve done 665 single block reads of the undo tablespace.

What have I been doing with all those undo blocks ? Check the session activity stats (restricted here to a critical pair):

Session stats
Name                                                                     Value
----                                                                     -----
transaction tables consistent reads - undo records applied              10,014
transaction tables consistent read rollbacks                                10

We’ve been reading undo blocks so that we can create read-consistent copies of the 10 undo segment headers that were active in my instance. We haven’t (and you’ll have to trust me on this, I can’t show you the stats that aren’t there!) reported any “data blocks consistent reads – undo records applied”.

If you want to see a detailed explanation of what has happened you’ll need to read Oracle Core (UK source), chapter 3 (and possibly chapter 2 to warm yourself up for the topic). In outline the following type of thing happens:

  • Oracle gets to the first block updated in t1 and sees that there’s an ITL (interested transaction list) entry that hasn’t been marked as committed (we flushed the block from memory before the commit cleanout could take place so the relevant transaction is, apparently, still running and the row is still marked as locked).
  • Let’s say the ITL entry says the transaction was for undo segment 9, transaction table slot 11, sequence 999. Oracle reads the undo segment header block for undo segment 9 and checks transaction table slot 11, which is now at sequence 1032. Oracle can infer from this that the transaction that updated the table has committed – but can’t yet know whether it committed before or after the start of our “long running query”.
  • Somehow Oracle has to get slot 11 back to sequence 999 so that it can check the commit SCN recorded in the slot at that sequence number. This is where we see “undo records applied” to make the “transaction table read consistent”. It can do this because the undo segment header has a “transaction control” section in it that records some details of the most recent transaction started in that segment. When a transaction starts it updates this information, but saves the old version of the transaction control and the previous version of its transaction table slot in its first undo record, consequently Oracle can clone the undo segment header block, identify the most recent transaction, find its first undo record and apply the bit of the undo that unwinds the transaction table information. As it does so it has also wound the transaction control section backwards one step, so it can use that (older) version to go back another step … and so on, until it takes the cloned undo segment header so far back that it takes our transaction table slot back to sequence 999 – and the job is done, we can now check the actual commit SCN.  (Or, if we’re unlucky, we might receive an ORA-01555 before we get there).

So – no changes to the t1 table during the query, but lots of undo records read because other tables had been changed by other sessions.


In my example the tablescan used direct path reads – so the blocks that went through delayed block cleanout were in private memory, which means they weren’t in the buffer cache and didn’t get written out to disc. When I flushed the buffer cache (again to emulate aging out of undo blocks etc.) and repeated the tablescan Oracle had to go through all that work of creating read consistent transaction tables all over again.

Footnote 2:

Despite the grand claim implied by the title I hope everyone realises that this is just one little detail of how undo handling can interfere with their expectations of performance in extreme cases.


August 25, 2015


Filed under: Infrastructure,Oracle,redo,undo — Jonathan Lewis @ 8:39 am BST Aug 25,2015

The old question about truncate and undo (“does a truncate generate undo or not”) appeared on the OTN database forum over the week-end and then devolved into “what really happens on a truncate” and then carried on.

The quick answer to the traditional question is essentially this: the actual truncate activity typically generates very little undo (and redo) compared to a full delete of all the data because all it does is tidy up any space management blocks and update the data dictionary; the undo and redo generated is only about the metadata, not about the data itself.

Of course, a reasonable response to the quick answer is: “how do you prove that?” – so I suggested that all you had to do was “switch logfile, truncate a table, dump logfile”. Unfortunately I realised that I had never bothered to do this myself and, despite having far more useful things to do, I couldn’t resist wasting some of my evening doing it. Here’s the little script I wrote to help

create table t2 (v1 varchar2(32));
insert into t2 values (rpad('A',32));

create table t1
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid wordpress format issue
        rownum                  id, 
        rpad('x',100)           padding
        generator       v1,
        generator       v2
        rownum <= 1e5 -- > comment to avoid wordpress format issue

create index t1_i1 on t1(id);
alter system flush buffer_cache;
execute dbms_lock.sleep(3)

alter system switch logfile;

insert into t2 values(rpad('X',32));

truncate table t1;

insert into t2 values(rpad('Y',32));

execute dump_log

Procedure dump_log simply dumps the current log file. The call to switch logfile keeps the dumped log file as small as possible; and I’ve flushed the buffer cache with a three second sleep to minimise the number of misleading “Block Written Record” entries that might otherwise appear in the log file after the truncate. There were all sorts of interesting little details in the resulting activity when I tested this on – here’s one that’s easy to spot before you even look at the trace file:

SQL> select object_id, data_object_id, object_name from user_objects where object_name like 'T1%';

---------- -------------- --------------------
    108705         108706 T1_I1
    108704         108707 T1

Notice how the data_object_id of the index is smaller than that of the table after the truncate ? Oracle truncates (and renumbers) the index before truncating the table.

The truncate activity was pretty much as I had assumed it would be – with one significant variation. The total number of change vectors reported was 272 in 183 redo records (your numbers may vary slightly if you try to reproduce the example), and here’s a summary of the redo OP codes that showed up in those change vectors in order of frequency:

Change operations
  1 OP:10.25    Format root block
  1 OP:11.11    Insert multiple rows (table)
  1 OP:24.1     DDL
  1 OP:4.1      Block cleanout record
  2 OP:10.4     Delete leaf row
  2 OP:13.28    HWM on segment header block
  3 OP:10.2     Insert leaf row
  3 OP:17.28    standby metadata cache invalidation
  4 OP:11.19    Array update (index)
  4 OP:11.5     Update row (index)
 10 OP:13.24    Bitmap Block state change (Level 2)
 11 OP:23.1     Block written record
 12 OP:14.1     redo: clear extent control lock
 12 OP:22.5     File BitMap Block Redo
 14 OP:14.2     redo - lock extent (map)
 14 OP:14.4     redo - redo operation on extent map
 14 OP:5.4      Commit / Rollback
 15 OP:18.3     Reuse record (object or range)
 15 OP:22.16    File Property Map Block (FPM)
 22 OP:13.22    State on Level 1 bitmap block
 24 OP:22.2     File Space Header Redo
 29 OP:5.2      Get undo header
 58 OP:5.1      Update undo block

The line that surprised me was the 14 commit/rollback OP codes – a single truncate appears to have operated as 14 separate (recursive) transactions. I did start to walk through the trace file to work out the exact order of operation but it’s a really tedious and messy task so I just did a quick scan to get the picture. I may have made a couple of mistakes in the following but I think the steps were:

  • Start transaction
  • Lock the extent map for the INDEX segment — no undo needed
  • Lock each bitmap (space management) block  — no undo needed
  • Reset each bitmap block — undo needed to preserve space management information
  • Reset highwater marks where relevant on bitmap and segment header block — undo needed
  • Clear segment header block — undo needed
  • Write all the updated space management blocks to disc (local write waits)
    • Log file records “Block Written Record”.
  • For each space management block in turn
    • Update space management blocks with new data object_id — undo needed
    • Write the updated block to disc (local write wait)
    • Log file records one “Block Written Record” for each block
  • Repeat all the above for the TABLE segment.
  • Start a recursive transaction
    • Insert a row into mon_mod$ — undo needed
    • recursive commit
  • Set DDL marker in redo log (possibly holding the text of the DDL statement, but it’s not visible in the dump)
  • Set object reuse markers in the redo log
  • update tab$  — needs undo, it’s just DML
  • update ind$ — needs undo, it’s just DML
  • update seg$  — needs undo, it’s just DML (twice – once for table once for index)
  • update obj$ — needs undo, it’s just DML (twice – ditto)
  • COMMIT — at last, with a change vector for a “Standby metadata cache invalidation” marker

The remaining 12 transactions look like things that could be delayed to tidy up things like space management blocks for the files and tablespaces and releasing “block locks”.

This first, long, transaction, is the thing that has to happen as an atomic event to truncate the table – and you can imagine that if the database crashed (or you crashed the session) in the middle of a very slow truncate then there seems to be enough information being recorded in the undo to allow the database to roll forward an incomplete truncate, and then roll back to before the truncate.

It would be possible to test whether or not this would actually work – but I wouldn’t want to do it on a database that anyone else was using.

March 9, 2015

Flashback logging

Filed under: Infrastructure,Oracle,redo,undo — Jonathan Lewis @ 2:44 pm GMT Mar 9,2015

When database flashback first appeared many years ago I commented (somewhere, but don’t ask me where) that it seemed like a very nice idea for full-scale test databases if you wanted to test the impact of changes to batch code, but I couldn’t really see it being a good idea for live production systems because of the overheads.

Features and circumstances change, of course, and someone recently pointed out that if your production system is multi-terabyte and you’re running with a dataguard standby and some minor catastrophe forces you to switch to the standby then you don’t really want to be running without a standby for the time it would take for you to use restore and recover an old backup to create a new standby and there may be cases where you could flashback the original primary to before the catastrophe and turn it into the standby from that point onward. Sounds like a reasonable argument to me – but you might still need to think very carefully about how to minimise the impact of enabling database flashback, especially if your database is a data warehouse, DSS, or mixed system.

Imagine you have a batch processes that revolve around loading data into an empty table with a couple of indexes – it’s a production system so you’re running with archivelog mode enabled, and one day you’re told to switch on database flashback. How much impact will that have on your current loading strategies? Here’s a little bit of code to help you on your way – I create an empty table as a clone of the view all_objects, and create one index, then I insert 1.6M rows into it. I’ve generated 4 different sets of results: flashback on or off, then either maintaining the index during loading or marking it unusable then rebuilding it after the load. Here’s the minimum code:

rem     Script:         flashback_overheads.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2015
rem     Purpose:
rem     Last tested

create table t1 segment creation immediate tablespace test_8k
select * from all_objects
where   rownum < 1

create index t1_i1 on t1(object_name, object_id) tablespace test_8k_assm_auto;
-- alter index t1_i1 unusable;

insert /*+ append */ into t1
with object_data as (
        select --+ materialize
                rownum <= 50000
counter as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 32
        /*+ leading (ctr obj) use_nl(obj) */
        counter         ctr,
        object_data     obj

-- alter index t1_i1 rebuild;

Here’s a quick summary of the timing I got  before I talk about the effects (running

Flashback off:
Maintain index in real time: 138 seconds
Rebuild index at end: 66 seconds

Flashback on:
Maintain index in real time: 214 seconds
Rebuild index at end: 112 seconds

It is very important to note that these timings do not allow you to draw any generic conclusions about optimum strategies for your systems. The only interpretation you can put on them is that different circumstances may lead to very different timings, so it’s worth looking at what you could do with your own systems to find good strategies for different cases.

Most significant, probably, is the big difference between the two options where flashback is enabled – if you’ve got to use it, how do you do damage limitation. Here are some key figures, namely the file I/O stats and the some instance activity stats, I/O stats first:

"Real-time" maintenance
Tempfile Stats - 09-Mar 11:41:57
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs    Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------    ------      ------   ---------    -------------------
    1       1,088      22,454      20.638        .063         296        .000     1,011      22,455        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_temp_938s5v4n_.tmp

Datafile Stats - 09-Mar 11:41:58
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs         Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------         ------      ------   ---------    -------------------
    3      24,802      24,802       1.000        .315      24,802        .315           0        .000          2,386      20,379        .239    /u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf
    5         718      22,805      31.762        .001           5        .000         713        .002            725      22,814        .002    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k_bcdy0y3h_.dbf
    6       8,485       8,485       1.000        .317       8,485        .317           0        .000            785       6,938        .348    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__bfqsmt60_.dbf

Mark Unusable and Rebuild
Tempfile Stats - 09-Mar 11:53:04
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs    Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------    ------      ------   ---------    -------------------
    1       1,461      10,508       7.192        .100           1        .017       407      10,508        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_temp_938s5v4n_.tmp

Datafile Stats - 09-Mar 11:53:05
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs         Writes      Blocks   Avg Csecs    File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------         ------      ------   ---------    -------------------
    3          17          17       1.000       5.830          17       5.830           0        .000             28          49       1.636    /u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf
    5         894      45,602      51.009        .001           2        .002         892        .001            721      22,811        .026    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k_bcdy0y3h_.dbf
    6       2,586       9,356       3.618        .313         264       3.064       2,322        .001          2,443       9,214        .000    /u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__bfqsmt60_.dbf

There are all sorts of interesting differences in these results due to the different way in which Oracle handles the index. For the “real-time” maintenance the session accumulates the key values and rowids as it writes the table, then sorts them, then does an cache-based bulk update to the index. For the “rebuild” strategy Oracle simply scans the table after it has been loaded, sorts the key values and indexes, then writes the index to disc using direct path writes; you might expect the total work done to be roughly the same in both cases – but it’s not.

I’ve shown 4 files: the temporary tablespace, the undo tablespace, the tablespace holding the table and the tablespace holding the index; and the first obvious difference is the number of blocks written and read and the change in average read size on the temporary tablespace. Both sessions had to spill to disc for the sort, and both did a “one-pass” sort; the difference in the number of blocks written and read appears because the “real-time” session wrote the sorted data set back to the temporary tablespace one more time than it really needed to – it merged the sorted data in a single pass but wrote the data back to the temporary tablespace before reading it again and applying it to the index (for a couple of points on tracing sorts, see this posting). I don’t know why Oracle chose to use a much smaller read slot size in the second case, though.

The next most dramatic thing we see is that real-time maintenance introduced 24,800 single block reads with 20,000 blocks written to the undo tablespace (with a few thousand more that would eventually be written by dbwr – I should have included a “flush buffer_cache” in my tests), compared to virtually no activity in the “rebuild” case. The rebuild generates no undo; real-time maintenance (even starting with an empty index) generates undo because (in theory) someone might look at the index and need to see a read-consistent image of it. So it’s not surprising that we see a lot of writes to the undo tablespace – but where did the reads come from? I’ll answer question that later.

It’s probably not a surprise to see the difference in the number of blocks read from the table’s tablespace. When we rebuild the index we have to do a tablescan to acquire the data; but, again, we can ask why did we see 22,800 blocks read from the table’s tablespace when we were doing the insert with real-time maintenance. On a positive note those reads were multiblock reads, but what caused them? Again, I’ll postpone the answer.

Finally we see that the number of blocks read (reason again postponed) and written to the index’s tablespace are roughly similar. The writes differ because because the rebuild is doing direct path writes, while the real-time maintenance is done in the buffer cache, so there are some outstanding index blocks to be written. The reads are similar, though one test is exclusively single block reads and the other is doing (small) multiblock reads – which is just a little bit more efficient.  The difference in the number of reads is because the rebuild was at the default pctfree=10 while the index maintenance was a massive “insert in order” which would have packed the index leaf blocks at 100%.

To start the explanation – here are the most significant activity stats – some for the session, a couple for the instance:

"Real-time" maintenance
Name                                                                     Value
----                                                                     -----
physical reads for flashback new                                        33,263
redo entries                                                           118,290
redo size                                                          466,628,852
redo size for direct writes                                        187,616,044
undo change vector size                                            134,282,356
flashback log write bytes                                          441,032,704

Name                                                                     Value
----                                                                     -----
physical reads for flashback new                                           156
redo entries                                                            35,055
redo size                                                          263,801,792
redo size for direct writes                                        263,407,628
undo change vector size                                                122,156
flashback log write bytes                                          278,036,480

The big clue is the “physical reads for flashback new”. When you modify a block, if it hasn’t been dumped into the flashback log recently (as defined by the hidden _flashback_barrier_interval parameter) then the original version of the block has to be written to the flashback log before the change can be applied; moreover, if a block is being “newed” (Oracle-speak for being reformatted for a new use) it will also be written to flashback log. Given the way that the undo tablespace works it’s not surprising if virtually every block you modify in the undo tablespace has to be written to the flashback log before you use it. The 33,264 blocks read for “flashback new” consists of the 24,800 blocks read from the undo tablespace when we were maintaining the index in real-time plus a further 8,460 from “somewhere” – which, probably not coincidentally, matches the number of blocks read from the index tablespace as we create the index. The odd thing is that we don’t see the 22,800 reads on the table’s tablespace (which don’t occur when flashback is off) reported as “physical reads for flashback new”; this looks like a reporting error to me.

So the volume of undo requires us to generate a lot of flashback log as well as the usual increase in the amount of redo. As a little side note, we get confirmation from these stats that the index was rebuilt using direct path writes – there’s an extra 75MB of redo for direct writes.


If you are running with flashback enabled in a system that’s doing high volume data loading remember that the “physical reads for flashback new” could be a major expense. This is particularly expensive on index maintenance, which can result in a large number single block reads of the undo tablespace. The undo costs you three times – once for the basic cost of undo (and associated redo), once for the extra reads, and once for writing the flashback log. Although you have to do tablescans to rebuild indexes, the cost of an (efficient, possibly direct path) tablescan may be much less than the penalty of the work relating to flashback.


Since you can’t (officially) load data into a table with an unusable unique index or constraint, you may want to experiment with using non-unique indexes to support unique/PK constraints and disabling the constraints while loading.

March 14, 2014

12c Temporary

Filed under: 12c,Infrastructure,Oracle,undo — Jonathan Lewis @ 6:41 pm GMT Mar 14,2014

Just one of those little snippets to cover something new and remind of something old. A single session can now have three (or more) temporary tablespaces in use at the same time for different reasons.


You can set parameter temp_undo_enabled to true, at which point the undo relating to global temporary tables (GTTs) will be written into the database’s default temporary tablespace (select value$ from props$ where name = ‘DEFAULT_TEMP_TABLESPACE’) which means it won’t generate redo. As an interesting consequence this also means that you can use DML on temporary objects in a physical standby database.

In 12c the temporary undo segment is reported with segtype = “UNDEFINED” in v$sort_usage/v$tempseg_usage; by 19c this has changed to “TEMP UNDO”.

A detail to watch out for: it looks as if each session gets its own temporary undo segment and the segment will not be released or shrink, until the session ends. Since the default definition for the default temporary tablespace is uniform allocation with 1MB extents; you might want to think carefully about specifying the extent size and tempfile size for the database default temporary tablespace before you enable this feature.

UPDATE: in 12.2 you can create a “local” temporary tablespaces, and even specify a default local temporary tablespace. This appears to be a feture aimed at RAC systems (and flex clusters) that allows each instance to write to “local” files rather than having to share files and space management with other instances. Every instance has the same temporary tablespace, which is automatically a bigfile tablespace (i.e. one file) and the name of the associated tempfile has a numeric postfix identifying the instance which uses it. I have yet to check whether “temp undo” will be writte to the database’s temporary or local temporary tablespace.



In 11g you can specify a tablespace (though – even in 19c – not a tablespace group) in the ‘create global temporary table’ statement, which means you can keep activity about GTTs separated from the I/O resulting from sorts and hash joins etc. spilling to disc. In fact you could specify a different temporary tablespace for every GTT if you wanted to – realistically, I could imagine a case for having a couple of different temporary tablespaces to accommodate GTTs with very different usage characteristics. (Unfortunately you still can’t specify a tablespace in the dbms_lob.create_temporary() function).

If you don’t specify a tablespace for a GTT it will go into the default temporary tablespace of the user who is using it (not the database default, and not the default for the definer of the GTT). If you create indexes on a GTT, or include LOB columns, they will automatically go into the same tablespace as the table.


Finally, of course, there’s the default temporary tablespace for the user and this is where GTTs will go if they don’t have a tablespace specified, and where all the scratch data (sorts, hash tables, materialized “with” subqueries et. al.) will go.


The range of possibilities means, of course, that you could manage to do a single “insert as select” writing a GTT to one temporary tablespace, with its undo going to a second temporary tablespace, and the spill from a sort or hash join in the select going to a third temporary tablespace. The flexibility probably won’t make much difference to performance (for most people), but it’s possible that it will make it easier to monitor where the I/O is coming from if you’re ever in the position where a single temporary tablespace would be subject to a lot of I/O.


In the past I’ve advised DBAs to set up a small number of temporary tablespaces (or temporary tablespace groups) so that they can allocate different classes of users – typically grouped by business function – to different temporary tablespaces. The ability to allocate GTTs to temporary tablespaces allows a further degree of refinement to this strategy.

If you do use the view v$sort_usage to check what the temporary space allocation looks like it has a column session_num, which sounds like it ought to be the session sid; it isn’t, it’s the serial# (the v$sort_usage.session_addr will give you the link to the session via v$session.saddr).

Note also that the column v$sort_usage.sql_id identifies the statement currently being executed by the session, not (necessarily) the statement that caused the temporary space allocation. In 12c Oracle added the column sql_id_tempseg to report this rather important piece of information.

Old habits die hard – I tend to think of v$sort_usage first when I want to examine what’s going on in the temporary tablespace(s) but you should probably use (or be aware of) the “modern” object v$tempseg_usage which is (still) just a synonym for v$sort_usage.

Update (May 2018)

I’ve just discovered an undesirable side effect of enabling temporary undo. I’ve tried to do an “explain plan” on a distributed query in an instance of (from a PDB) where temp_undo_enabled was set to true, and got the following Oracle error messages:

explain plan for select  /*+ driving_site(t2) */
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-55525: cannot generate temporary undo in a distributed transaction
ORA-02063: preceding line from ORCL
ORA-02063: preceding 3 lines from ORCL@LOOPBACK

You’ll notice from the echoed first line of the call to explain plan that I’ve used the driving_site() hint in this example to make the remote database optimise the query, and that’s why the last two lines (ORA-02063) appear; if I remove the driving_site() hint then only the first two error messages appear.

Update (July 2020)

In a recent thread on the Datbase Admin forum on MOSC (needs a valid account) the point came up that 18.3 introduces data errors if you try to mix global temporary tables, distributed transactions and enable temp undo – but doesn’t necessarily report ORA-55525 when (apparently) it ought to (as we saw in the May 2018 update above).

Moreover the manuals for Oracle 19c still report that

  • Distributed transactions are not supported for temporary tables

This, despite the fact that they seem to have operated for many years with no problem – until you enable temp undo, that is.

Update (April 2022)

Prompted by a discussion about using global temporary tables and enabling temp undo, I spent a few minutes search for some notes that I thought I had written about the combination and the impact of materialized “with” subqueries. It’s another note I never got around to writing, but most of the material and ideas can be found in this thread on Oracle Developers’ Forum.

Critically, and subject to further testing, if you try to “alter session” to enable temp undo then it may have no effect if the session has already generated undo. The thread doesn’t examine all the possible ways in which this could happen, though; not does it examine the case where the parameter has been set at the system level or in the startup file. (Tests and blog note to follow – but not in the very short term.)


September 2, 2011


Filed under: Infrastructure,Oracle,undo — Jonathan Lewis @ 5:15 pm BST Sep 2,2011

A question that comes up from time to time on the OTN forums is the one about how much undo do you generate when you insert or update a row. It’s not a question that has an immediate answer – it depends on what you’re actually doing, the amount of change you introduce, and the number of indexes affected. However, there are a few guidelines that will point you in the right direction – and the key is to keep an eye on the used_urec and used_ublk colummns from v$transactions.

Consider the example of inserting data: when you insert a row, you typically generate one undo record (used_urec) for the row, but when you insert many rows using an array insert Oracle has an optimisation mechanism on undo record creation that allows it to create one used_urec to cover all the changes you have made simultaneously to an individual block – so used_urec could be much smaller than the number of rows processed.

However, if you have indexes in place and are doing normal index maintenance on import, then each table row would require each index to be updated, so you would go back to one used_urec per table row plus one used_urec per index maintained per table row.

So, when you look at the “big picture” there’s no obvious correlation between rows inserted and undo generated — until you look at the fine detail of exactly what you’re doing, and whether any optimisations apply. (The details of the optimisation strategies available vary with the chosen insert mechanisms and with version of Oracle)

Just for reference, here’s a link to a little note I wrote some months ago about monitoring undo.

May 8, 2011

Consistent Gets

Filed under: Infrastructure,Oracle,Read Consistency,Troubleshooting,undo — Jonathan Lewis @ 10:09 am BST May 8,2011

There’s an interesting question on the OTN database forum at present – why does an update of 300,000 rows take a billion buffer visits. (There are 25 indexes on the table – so you might point a finger at that initially, but only one of the indexes is going to be changed by the update so that should only account for around an extra 10 gets per row in a clean environment.)

April 19, 2011

More CR

Filed under: Infrastructure,Oracle,Performance,Read Consistency,Troubleshooting,undo — Jonathan Lewis @ 6:32 pm BST Apr 19,2011

Following on from yesterday’s post on consistent reads, I thought I’d make the point that the way you work can make an enormous difference to the amount of work you do. Here’s a silly little demo (in

April 18, 2011

Consistent Reads

Filed under: Infrastructure,Oracle,Performance,Read Consistency,Troubleshooting,undo — Jonathan Lewis @ 11:08 am BST Apr 18,2011

Here’s a quick demo to make a point about consistent reads (prompted by a question on the Oracle-L mailing list):

December 23, 2010

Private Redo

Filed under: redo,undo — Jonathan Lewis @ 10:17 am GMT Dec 23,2010

About this time last year I wrote a short answer on OTN about Private Redo Threads and In-Memory Undo. Thanks to a follow-up question a year later I’ve been prompted to supply a link to the thread because my original answer was actually pretty good. (And the answer to the follow-up is also worth reading).

Update: If you’re looking at this note and haven’t expanded the view to see the comments from readers, make sure that you do look at them since they include a couple of technical details I described in response to further follow-up questions on the OTN thread.

September 22, 2010

Session Undo

Filed under: Infrastructure,Oracle,Troubleshooting,undo — Jonathan Lewis @ 8:40 pm BST Sep 22,2010

One of the simple scripts I use from time to time monitors the amount of undo generated by a long running session by taking a session id (SID) as an input and joining v$session to v$transaction:

column start_scn format 999,999,999,999

        tr.start_scn, tr.log_io, tr.phy_io, tr.used_ublk, tr.used_urec, recursive
        v$session       se,
        V$transaction   tr
        se.sid = &1
and     tr.ses_addr = se.saddr

---------------- ---------- ---------- ---------- ---------- ---
 136,509,466,121   10730956     358074          1          1 NO
 136,515,115,543         12          0          1          3 YES

You’ll note that this is a session that has been doing quite a lot of work (log_io = db block gets + db block changes, and phy_io = physical reads). I was watching it because I wanted to check that it was only generating undo in a series of very short recursive transactions. (It’s running a coalesce on a very large, very smashed, index).

Interestingly the work done by the recursive transactions for the coalesce was accounted for against the parent transaction, even though (fortunately) the undo block and record usage was not. The used_urec and used_ublk columns can best be described as showing you “currently held” space, rather than “cumulative space used”.

Addendum: with reference to Gary’s comment below – here’s a lovely little script – originally written for 9i, but still working) from Steve Adams for estimating how long it will take for rolling back to complete. Note that it references sys.x_$ktuxe; Steve has a convention of creating “x_” views on the x$ objects when there is no v$ view supplied by Oracle, so x_$ktuxe is just a sys-owned view defined as ‘select * from x$ktuxe’.

February 9, 2010

Why Undo ?

Filed under: Infrastructure,redo,undo — Jonathan Lewis @ 5:32 pm GMT Feb 9,2010

A recent thread on the OTN database forum asks the question:

“… since redo also has past and current information why can’t redo logs be used to retrieve that information? Why is undo required when redo already has all that information.”

The thread has generated some interesting replies – but to a large extent they describe how the undo and redo work rather than explaining why the designers at Oracle Corp. chose to implement undo and redo the way they did. Since I’m sitting in an airport (Zurich – where the coffee bar in the business lounge has a very large bowl of Lindt chocolate squares available) waiting for a plane I thought I’d use the time to share my thoughts on the why.


October 21, 2009

Bitmap Updates

Filed under: bitmaps,Indexing,Infrastructure,Oracle,Performance,redo,undo — Jonathan Lewis @ 5:33 pm BST Oct 21,2009

It is fairly well-known that bitmap indexes are very dense structures that can behave badly if their underlying tables are subject to even fairly low levels of insert, update or delete activity. Problems include contention, space management and performance, and these problens have spawned a couple of well-known guidelines relating to bitmap indexes:

  • Avoid concurrent modification of data by multiple processes – otherwise you can end up with processes dead-locking
  • Drop/disable bitmap indexes before data loads and rebuild them afterwards.

Of course, with a little care and experimentation, you may find that you don’t need to apply the second guideline in all cases – especially for bulk inserts.

Next Page »

Website Powered by