Why Undo ?

February 9, 2010

Why Undo ?

Filed under: Infrastructure,redo,undo — Jonathan Lewis @ 5:32 pm GMT Feb 9,2010

A recent thread on the OTN database forum asks the question:

“… since redo also has past and current information why can’t redo logs be used to retrieve that information? Why is undo required when redo already has all that information.”

The thread has generated some interesting replies – but to a large extent they describe how the undo and redo work rather than explaining why the designers at Oracle Corp. chose to implement undo and redo the way they did. Since I’m sitting in an airport (Zurich – where the coffee bar in the business lounge has a very large bowl of Lindt chocolate squares available) waiting for a plane I thought I’d use the time to share my thoughts on the why.

Redo gives us two things: recoverability and efficiency. If we record a description of every change we make to a database then we can take an old copy of the database and re-apply the list of changes to bring it up to date – that’s recoverability. If we make that list the first point of protection for our data changes we need only keep writing a small volume of information to disc, rather than writing each data block to disc the moment we change it – that’s an aid to efficiency. (Of course, although focusing all the changes into a small space gives us I/O efficiency, anything that focuses a lot of activity into a small space also gives us a point of contention).

Undo (in some form) has to exist if we want read-consistency. No-one else should be allowed to see our data changes until we commit them and our changes shouldn’t even be visible to queries that started running before we committed them, so we have to keep older versions of the data somewhere. This is where things get a little counter-intuitive (but very clever) with Oracle.

Take a close look at the following redo record; it’s a record generated in the middle of a transaction and it’s the redo that Oracle generated as we changed the fourth column of a particular row in a table from the value 42 to the value 43.

REDO RECORD - Thread:1 RBA: 0x000056.00000028.0014 LEN: 0x0128 VLD: 0x01
SCN: 0x0000.0028efae SUBSCN: 3 02/09/2010 14:59:40
CHANGE #1 TYP:0 CLS:20 AFN:2 DBA:0x0080065f OBJ:4294967295 SCN:0x0000.0028efae SEQ: 24 OP:5.1
ktudb redo: siz: 112 spc: 5414 flg: 0x0022 seq: 0x0112 rec: 0x19
  xid: 0x0002.028.000009af
ktubu redo: slt: 40 rci: 24 opc: 11.1 objn: 49428 objd: 49428 tsn: 5
Undo type:  Regular undo       Undo type:  Last buffer split:  No
Tablespace Undo: No
             0x00000000
KDO undo record:
KTB Redo
op: 0x02  ver: 0x01
op: C uba: 0x0080065f.0112.18
Array Update of 1 rows:
tabn: 0 slot: 41(0x29) flag: 0x2c lock: 0 ckix: 11 <----- row 42 in block
ncol: 4 nnew: 1 size: 0
KDO Op code:  21 row dependencies Disabled
  xtype: XAxtype KDO_KDOM2 flags: 0x00000080 bdba: 0x0180000a hdba: 0x01800009
itli: 2 ispac: 0 maxfr: 4863
vect = 9
col 3: [ 2] c1 2b <----- Change fourth column back to 42

CHANGE #2 TYP:0 CLS: 1 AFN:6 DBA:0x0180000a OBJ:49428 SCN:0x0000.0028efae SEQ: 41 OP:11.19
KTB Redo
op: 0x02  ver: 0x01
op: C  uba: 0x0080065f.0112.19
Array Update of 1 rows:
tabn: 0 slot: 41(0x29) flag: 0x2c lock: 2 ckix: 11 <----- row 42 in block
ncol: 4 nnew: 1 size: 0
KDO Op code:  21 row dependencies Disabled
  xtype: XAxtype KDO_KDOM2 flags: 0x00000080  bdba: 0x0180000a  hdba: 0x01800009
itli: 2  ispac: 0  maxfr: 4863
vect = 9
col 3: [ 2] c1 2c <----- Change fourth column to 43

You’ll notice that the redo record consists of two “change vectors” (CHANGE #1 and CHANGE #2), and I’ve labelled the two lines showing that the change was from 42 to 43. In principle, the Oracle redo log really does contain all the information we need to generate old versions of data. So why does Oracle introduce the undo tablespace for read-consistency when the redo log seems to have all the required information ?

To answer this question we need to think multi-user. (It’s very easy to forget, when you’re trying to understand some feature, that Oracle is a multi-user system but if you ignore that fact you forget to think about what’s going on from the point of view of other sessions, and also forget to think about the impact of other sessions).

Thought experiment number 1: I am executing a long-running transaction and lots of other sessions are executing little queries and short transactions. The other sessions will have to see data as it was before I changed it. If they have to visit the redo to see this data the online redo log file will have to be arbitrarily large to ensure that it can hold all the necessary information – and that’s information required to log the forward and backward changes from all the sessions for all the transactions that have executed since I started my transaction. What’s supposed to happen if the redo log files fill ?

Thought experiment number 2: If my long running transaction fails and has to roll back then the backward changes I need will be intermingled with all the forward and backward changes that everyone else has made since my transaction started, so my session will probably end up doing a lot of random I/O to find the backward changes. Of course we could make this task more efficient (and address number 1 a little better) by making each transaction write its own log file – but then we increase the complexity of file handling (opening and closing files for every start and stop of a transaction), and of applying the redo log files in the right order for recovery.

Thought experiment number 3: Currently Oracle does not tidy up every changed block when a transaction commits. If I commit a transaction but don’t tidy up block X, how is the next session that visits that block going to know that my transaction committed and when it committed. If all we used was the redo log, the next session would have to go to the point in the redo log that had recorded the change to block X, and walk forward until it found the commit record (or got to the end of the redo log, or found a suitable SCN in the redo – and thus could determine that my transaction had not committed, or had not committed “in time”) But then every redo record for a given transaction would need a “forward pointer” to the next redo record – which means every time you created a redo record you’d have to go backwards to update your previous redo record; and that’s not going to be efficient. Plan B, then, is to have a mechanism for keeping an indefinitely long list of transactions and their commit SCNs somewhere in the database – and that “indefinitely” is not a nice idea.

And so it goes on – you may find some activity where using the redo log as the source for undo would be perfectly acceptable, but when you start to consider all the options of a multi-user system with mixed activity you find that some types of activity are likely to run into, or cause, problems.

Oracle’s solution is to keep the “usable” backward changes, the undo, somewhere completely different from the redo. Rather than create a third type of storage structure they’ve chosen to put it into the database using ordinary data blocks. Think about how this addresses the issues I’ve mentioned so far.

Because undo blocks are just database blocks they are protected by the redo log and can be recovered as the database is recovered without introducing an extra code path to recovery. The “backward” change vector we saw recorded in the redo log record was actually the forward change vector for the undo block – we’ve introduced an overhead in the redo log because we write undo twice (but that’s what we do with every data change anyway – write it once on the block and once in the redo log) The undo blocks also benefit from the standard LRU mechanisms – if an undo block is being used a lot it stays in memory, and that’s nice because recently generated undo is more likely to be undo we want to use for read-consistency.

Because we keep the undo streams inside the database we don’t have to worry about sessions constantly opening and closing files as they start and stop transactions – we just have to set up some sort of catalog that lists the current active transactions and points to the place in the undo tablespace where that transaction is currently putting its undo records. (This is what the transaction table in the undo segment header is about (see glossary)).

Because we use database blocks as the unit of storage and keep the undo for different transactions separate, when I write my undo my records are nicely clustered – I acquire and fill one block at a time – so if I have to roll my transaction back a single undo block read will get me lots of undo records, and I minimise the random I/O I have to do to read back the undo stream.

If I start a very long running transaction while other sessions are also generating undo, there is a degree of independence between my undo and their undo. In a compromise between “all the undo goes into one file” and “every transaction gets its own undo file”, Oracle has allowed for multiple undo segments – i.e. the equivalent of a relatively small number of shared files – which means that a single long-running transaction is restricted to a single undo segment and all the other transactions can keep re-using (over-writing) any remaining space in the undo tablespace much more effectively. (Of course we can still end up with “file full” errors – but the strategy makes it possible for the database to survive concurrent activity much longer on the same amount of disk space.)

It would be impossible to explain all the subtle details of the problems that are bypassed, or minimised, but putting undo into the database and using it the way Oracle does, but the summary is this: if you want to know why Oracle uses an undo tablespace rather than using the redo log for read-consistency, rolling back, and commit-point calculations, just think about all the scenarios you’d have to address and work out how you’d address them with just the redo log. This will give you some idea of the extra overhead and complexity you’d have to introduce – and may allow you to see how the undo tablespace makes things (relatively) so much simpler and more efficient.

Comments (30)

30 Comments »

Just wanted to thank you with all feelings of appreciation for these great information

Comment by Eslam — February 9, 2010 @ 7:30 pm GMT Feb 9,2010 | Reply
- Hello, I have a question regarding read-consistency. Suppose, my transaction changed a data block in buffer and it got written to data file before I issue commit because of some reason. Now, I do not commit this record for 3-4 days. In the meanwhile,many transactions are taking place which have pushed Oracle to use undo space and now we have come to the situation where undo data related to my transaction is also no longer in undo space (remember still transaction has not been committed). So, we have a situation where a uncommitted data is existing in database without having any undo data. Is it possbile?
  and if yes, then what will happen if some other transaction wants to read same rows, so from where that transaction is supposed to get data (undo space is cleared), or will it read uncommitted data ?
  
  Comment by Nikhil Mishra — October 7, 2010 @ 10:59 am BST Oct 7,2010 | Reply
  - Oracle will not over-write undo for an uncommitted transaction, it will keep extending the undo segment rather than doing this.
    
    Ultimately, in your theoretical case, you may get an error from some session saying something like: “unable to extend rollback segment, tablespace full” because it’s possible that your rollback segment (being used by other sessions) will slowly extend and steal space from other rollback segments until all the other rollback segments are full, and unable to extend – and your rollback segment could have lots of “redundant” space which is unusable because all the space is between the uncommitted transaction and the current “high water mark” of the rollback segment.
    
    Comment by Jonathan Lewis — October 7, 2010 @ 6:54 pm BST Oct 7,2010 | Reply
Thanks so much sir for this excellent explanation. I shall add this link to that thread as well.

Sir, can you please tell the command that you used to get that redo record?

Thanks and regards
Aman….

Comment by Aman.... — February 10, 2010 @ 2:10 am GMT Feb 10,2010 | Reply
- Aman,
  
  The basic command is:
```
alter system dump logfile '{name of file}';
```
  The file name can reference any online or archived log file from any current or earlier version of Oracle.
  
  There are several options that allow you to be selective – although the options and syntax change across versions of Oracle. For example, in 10g, you can dump all the redo records that contain change vectors for a given database block with:
```
alter sytem dump logfile '{name of file}'
dba min file_no block_no dba max file_no block_no;
```
  Earlier versions of Oracle needed a dot, surrounded by spaces, between the file number and block number.
  
  Comment by Jonathan Lewis — February 10, 2010 @ 12:29 pm GMT Feb 10,2010 | Reply
Remember the BI.ORA file in V5 ? That was before Oracle seperated Undo (Rollback) and Redo in V6.
When I had a look at DB2 7.2, it seemed to be similar to Oracle V5 in that respect.
Hemant

Comment by Hemant K Chitale — February 10, 2010 @ 2:38 am GMT Feb 10,2010 | Reply
- I believe Oracle had rudimentary MVCC capability starting with version 3 (BI.ORA). It was definitely present in V4 and V5.
  
  DB2 transaction log, however, is not similar to the Oracle V5 before image facility for the simple reason that DB2 does not implement MVCC and does not need any kind of version store, therefore.
  
  Comment by Val — February 10, 2010 @ 6:34 pm GMT Feb 10,2010 | Reply
I believe at V5 there was the After Image Journal (AIJ) for redo.

Comment by Richard — February 10, 2010 @ 10:01 am GMT Feb 10,2010 | Reply
- Richard,
  
  That’s my memory too. My memory of it is that the BI (before image) file recorded prior versions of blocks – which may be why some people think that whole data blocks are always copied into the undo – and the AIJ (after image journal) recorded change descriptions.
  
  Comment by Jonathan Lewis — February 10, 2010 @ 12:26 pm GMT Feb 10,2010 | Reply
- If I remember correctly, the AIJ was not mandatory. It was “like” ArchiveLog mode in V6 and later. Without it, there was a single BI file.
  
  Hemant
  
  Comment by Hemant K Chitale — February 10, 2010 @ 2:18 pm GMT Feb 10,2010 | Reply
  - Hemant,
    You may well be right – my memory of it isn’t very strong any more.
    
    Comment by Jonathan Lewis — February 10, 2010 @ 10:25 pm GMT Feb 10,2010 | Reply
Jonathan,
thank you for the brilliant explanation! In my view, we just cannot overestimate the huge benefit and the elegance of implementation of the playing together of UNDO and REDO. It is simply the core feature of Oracle Databases since version 6, making Oracle Databases that much successful especially in the OLTP area with large user populations. That is the main reason why nobody wants an SAP (for example) installation on Non-Oracle Database with more than 20 users working :-)
I still get enthusiastic about it when I tell my students of UNDO etc. in my DBA-classes after all those years!

Comment by Uwe Hesse — February 10, 2010 @ 2:16 pm GMT Feb 10,2010 | Reply
- Uwe,
  
  Version 6 – I made the same point in an impromptu interview at Open World 2009. The introduction of the undo/redo at that version really was the critical feature that made Oracle.
  
  Comment by Jonathan Lewis — February 10, 2010 @ 10:18 pm GMT Feb 10,2010 | Reply
Hi Sir,

Its always nice to read your blog.I had a question running up in my mind.The database is recovered in mount stage,and in mount stage the datafiles are not opened,so when the archives are applied(which has both the committed and uncommitted data),is it this information of redo change vector that helps in rollforward and rollback of the uncommitted data?

Regards,
Anand

Comment by Anand — February 10, 2010 @ 3:27 pm GMT Feb 10,2010 | Reply
- Anand,
  
  The redo change vectors handle the roll-forward – they bring the undo blocks and the table (and index) blocks forward in time to the moment the database crashed.
  
  Once the rollforward is complete, Oracle can look at the undo segment header (for the transaction tables) and look for evidence of uncommitted transactions that need to be rolled back.
  
  Comment by Jonathan Lewis — February 10, 2010 @ 10:20 pm GMT Feb 10,2010 | Reply
  - Sir,
    
    Once the roll back is initiated(in the open stage) , what’s the actual process? Is it applying the undo statements over the changed yet uncommitted dirty buffers? I believe this also should be happening in the buffer cache only. So is it something like that the Undo buffers are pulled in the buffer cache for the corresponding dirty buffers, applied over the dirty buffers and are pushed back to the datafiles?
    
    Regards
    Aman….
    
    Comment by Aman.... — February 11, 2010 @ 2:07 am GMT Feb 11,2010 | Reply
    - Aman,
      You need to be very fussy about your choice of words.
      
      Dirty buffers are in-memory versions of blocks that don’t match what’s on disk . (See https://jonathanlewis.wordpress.com/2009/06/16/clean-it-up/ ). They can be dirty whether or not they are committed. It’s also not appropriate (in this context) to talk about undo “statements” – I think you mean undo “records”. And changes only ever happen in the buffer cache (apart from the changes made by direct path writes).
      
      In the simplest cases the sequence of events after the roll-forward had completed would be:
      
      Read the first undo segment header block, and identify first uncommitted transaction.
      
      Go to last undo block for that transaction and read last undo record in block
      
      Read data block identified by the undo record and apply the undo record
      
      Log the change
      
      Follow the pointer to the previous undo record .. and repeat
      
      Any blocks not in the cache when needed would be read as required; the normally checkpoint, LRU and touch count mechanisms would be active to deal with writing and flushing of blocks.
      
      Comment by Jonathan Lewis — February 11, 2010 @ 8:40 am GMT Feb 11,2010
Jonathan,

Thanks for the wonderful blog.

You have explained this very nicely and also your answers on the comments on some of the questions that were raised from this blog entry.

I always learn a lot from your blogs.

Thanks
Aswath

Comment by Aswath Rao — February 11, 2010 @ 3:04 am GMT Feb 11,2010 | Reply
- Aswath,
  Thanks for the comment.
  
  Comment by Jonathan Lewis — February 11, 2010 @ 8:25 am GMT Feb 11,2010 | Reply
Hi Sir,
I think i was not clear in my previous question.If i restore a database from suppose a cold backup and want to recover it till the last archivelog that is have,i mount the database and use recover database using backup controlfile until cancel.Now this prompts for a archivelog sequence.I provide that sequence number.So when oracle applies that archivelog (archivelog will have both committed and uncommited data) what all does it do??

Regrads,
Anand

Comment by Anand — February 11, 2010 @ 8:58 am GMT Feb 11,2010 | Reply
- Anand,
  
  Oracle will process one complete redo record at a time in order. Each redo record will hold one or more change vectors. Each change vector will describe a change to a block in the datbase and give some information about the expected state of that block (e.g. SCN and change number). Oracle will try to apply each change vector in turn.
  
  This will continue until you issue the cancel – at which point Oracle can check the undo segment headers to identify the transactions that were uncommitted at that point in the stream of log information, and roll them back.
  
  (I’ve ignored the details of the two-pass strategy that appeared in 10g to reduce the amount of random I/O on the data files as the redo is applied.)
  
  Comment by Jonathan Lewis — February 11, 2010 @ 7:51 pm GMT Feb 11,2010 | Reply
Awesome…. Thanks a ton.

Comment by Khurana — February 11, 2010 @ 3:54 pm GMT Feb 11,2010 | Reply
Jonathan,

your explanation of concept of redo and undo is excellent. I learnt a lot from your blog and book on cost based optimization.

Could you please blog on oracle 11gR2 performance features ?

Satish

Comment by satish. — February 12, 2010 @ 7:41 pm GMT Feb 12,2010 | Reply
[…] 13, 2010 by tylerv Fantastic article by Jonathan Lewis on why Oracle couldn’t just use REDO to cover the purposes of both UNDO and […]

Pingback by REDO and UNDO – the ‘why’ behind them « DBA n00b — February 13, 2010 @ 6:01 pm GMT Feb 13,2010 | Reply
[…] See also rollback(2) above, and separate blog article “Why Undo?” […]

Pingback by Glossary « Oracle Scratchpad — February 20, 2010 @ 6:50 pm GMT Feb 20,2010 | Reply
Jonathan,

Nice synopsis of differences between redo and undo. You would be surprised to find that many DBAs do not understand the difference between these two items in Oracle.

Cheers,
Ben

Comment by Ben Prusinski — February 28, 2010 @ 5:56 am GMT Feb 28,2010 | Reply
[…] Jonathan Lewis-Why Undo? […]

Pingback by Blogroll Report 05/02/2009 – 12/02/2010 « Coskan’s Approach to Oracle — March 10, 2010 @ 2:28 am GMT Mar 10,2010 | Reply
[…] https://jonathanlewis.wordpress.com/2010/02/09/why-undo/ […]

Pingback by True or False – Undo « Charles Hooper's Oracle Notes — May 9, 2010 @ 2:10 pm BST May 9,2010 | Reply
[…] article de Jonathan Lewis publié sur son blog. L’article original en anglais se trouve ici. Merci à Mohamed Houri qui m’a aidé pour cette […]

Pingback by A quoi sert l’undo (rollback segment), par Jonathan Lewis | Oracle – Concepts et Exemples — November 19, 2013 @ 4:02 pm GMT Nov 19,2013 | Reply
[…] https://jonathanlewis.wordpress.com/2010/02/09/why-undo/amp/ […]

Pingback by Why Undo ? – Oracle Scratchpad | asidoroff — July 13, 2019 @ 10:39 am BST Jul 13,2019 | Reply