Oracle Scratchpad

October 7, 2009

Undone

Filed under: Infrastructure,Oracle,Troubleshooting,undo — Jonathan Lewis @ 7:17 pm BST Oct 7,2009

Here’s a little problem that was thrown at me a little while ago during a visit to a client of mine. It only took about five minutes to fix, but it seemed like a nice demonstration of trouble-shooting so I thought I’d write it up for the blog.

We start with the moment when one of the developers told me that they were seeing lots of session failing with the error “unable to extend segment in undo tablespace”.

The obvious thing to do was a quick check for any transactions using a lot of undo space – so here’s a little query to supply the session address and number of undo blocks used for active transactions:

SQL> select ses_addr, used_ublk from v$transaction;

SES_ADDR          USED_UBLK
---------------- ----------
0700000108372710          1
07000001093A90C0          1
070000010836D170          1
070000010F37C198          1
070000010A3C6C58          1
070000010F3897A8          1

So, none of the active transaction seems to be taking any undo space. I checked v$lock for locks of type ‘TX’, and v$session for rows with non-null taddr values just in case there were some hidden transactions – but the six I’ve shown were all there were.

So let’s take a quick look at the undo undo segments themselves to see if there’s anything of note there:

select  segment_name, blocks
from    dba_segments
where   tablespace_name = 'UNDO'
order by
        blocks
...
_SYSSMU42$                                     10,632
_SYSSMU35$                                     10,712
_SYSSMU33$                                     11,676
_SYSSMU37$                                     11,756
_SYSSMU38$                                     12,016
_SYSSMU39$                                    150,756
_SYSSMU12$                                    361,800
_SYSSMU4$                                   2,410,568

171 rows selected.

Most of the segments were tiny – but there was one which had taken most of the space in the tablespace. (We were on 16KB blocks, so that’s about 37 GB for an undo segment! Surely it should have been freed if no-one was using the space).

Clearly we need a little more detail about those tiny transactions – so here’s a query that picked up some of the session details along with the transaction start time:


select
sid, serial#, username, osuser,
to_char(logon_time,'dd-mon hh24:mi:ss') connected,
to_char(start_date,'dd-mon hh24:mi:ss') started
from
        v$transaction tr,
        v$session se
where
        se.saddr = tr.ses_addr
order by
        started
;

SID   SERIAL# USERNAME         OSUSER          CONNECTED       STARTED
----- --------- ---------------- --------------- --------------- ---------------
662       925 P_READONLY       wilesc          21-jul 15:40:43 23-jul 12:13:02
379       342 P_READONLY       greena          29-jul 10:17:49 29-jul 11:24:03
353      1482 P_READONLY       edwards         29-jul 11:32:18 29-jul 11:32:19
363      7972 P_READONLY       edwards         29-jul 17:53:27 29-jul 17:56:47
596      5553 DEV_SUPP         smithj          24-jul 09:00:36 30-jul 17:26:03

The most recent transaction started on 30th July, but there are three transactions that started “yesterday”, and one that started a week earlier. Clearly none of these transactions has done much – but they’ve been hanging round for a long time.

Simply by existing, they’ve been holding the “start point” of an undo segment, and as time has passed more and more space has gradually been migrated into these undo segments as Oracle chose them for other transactions. And any space that got into one of those undo segments cannot be transferred until the holding transaction completes.

Just to clarify this point – let’s imagine the history of _SYSSMU4$, the undo segment that is currently at 37GB. This is the undo segment that session 662 picked. When it started, the transaction could have been somewhere in the middle of the list of extents currently allocated to the undo segment.

As time passes, other transactions come and go, and gradually they allocate space to extend this undo segment. At the same time extents that had been allocated before session 662 started its transaction will pass the undo_retention_time and be released. But that one tiny transaction keeps its extent “locked” into the segment, and stops Oracle from dropping any extents added later. Inevitably the segment grows until it has consumed all the available space in the undo tablespace.

Having identified the cause of the problem, I tracked down the users found out what they were doing, got a coupe of them to disconnect and killed the rest of the sessions to clear the problem.

Footnote: despite the schema names, this was a development copy of the production database, so I didn’t have any qualms about killing sessions that were causing problems.

8 Comments »

  1. I have few questions:

    1) How did you come to know that it was session 662 which was holding a lock?
    2) When a segment is already allocated to a particular session and is not being used then why/how does Oracle allocate more and more segments into it and make it fatter and fatter until it gets obese.

    Comment by Oracler — October 8, 2009 @ 12:10 pm BST Oct 8,2009 | Reply

  2. Oracler,
    1) All the sessions listed were holding locks. The missing piece is: which session was using which undo segment?

    select  s.sid, s.serial#, r.segment_name
    from    v$session s
    join    v$transaction t on s.taddr = t.addr
    join    DBA_ROLLBACK_SEGS r on r.segment_id = t.xidusn;
    

    2) If I read Jonathan correctly, an EXTENT is allocated, not a SEGMENT. Extents can be reused or added freely at the end of the segment, but the starting point remains the same as long as the first extent is still being used.

    Comment by Stew Ashton — October 8, 2009 @ 2:06 pm BST Oct 8,2009 | Reply

    • Sorry, I should have mentioned that the code in 1) is based on an answer in asktom.oracle.com.

      Comment by Stew Ashton — October 8, 2009 @ 2:10 pm BST Oct 8,2009 | Reply

  3. Jonathan,
    to avoid recurrence of the issue did you suggest changing transactions_per_rollback_segment to 1 on the dev instance ?
    regards
    srivenu

    Comment by srivenu — October 8, 2009 @ 5:58 pm BST Oct 8,2009 | Reply

  4. Oracler,
    I think Stew Ashton has answered your questions.

    Stew,
    Just one little detail for clarification – the transaction acquires a block at a time, not an extent; but the consequence, of course, is that the extent is effectively locked.

    srivenu,

    Unless that parameter has changed its purpose recently, it has nothing to do with the problem. It used to relate to RAC (or OPS) systems and public (manual) rollback segments, and told each instance the minimum number of segments they had to acquire on startup.

    Comment by Jonathan Lewis — October 9, 2009 @ 3:22 pm BST Oct 9,2009 | Reply

  5. Hi jonathan,

    I thought

    1. Oracle will try to steal expired extents from the same segment
    2. Oracle will then try to steal expired extents from other undo segments
    3. Oracle will try to steal unexpired inactive extents from the same segment
    4. Oracle will try to steal unexpired inactive extents from the other undo segments.
    5. Oracle will try to extend the undo segment
    6. Oracle will try to extend the tablespace (if autoextend)
    7. Throw an error.

    So it this case shouldn’t oracle try to steal expired/unexpired extents from the _SYSSMU4$

    Or my understanding in incorrect ?

    thanks

    Comment by Amit Poddar — October 9, 2009 @ 11:57 pm BST Oct 9,2009 | Reply

  6. Amit,

    I think I’d make a couple of changes to that sequence – but the important assumption I am making is that you can’t steal extents out of order.

    Any short text description is going to be a little flawed, but imagine the current segment having six extents, and the small transaction started in segment three. So we have:

    1, 2, 3x, 4, 5, 6

    As time passes, and assuming the transactions in extents 1 and 2 have committed, new transactions can keep on going around the ring of extents – at some point we could imagine the segment looking like this:

    3x, 4, 5, 6, 1, 2

    The segment hasn’t extended, and extents 1 and 2 haven’t really moved, but we can imagine that the segment starts at extent 3.

    My working hypothesis is that even if every transaction that started in extent 4 committed a long time ago, if Oracle needs to add a new extent to the end of the segment it cannot “steal” an extent out of order – so it cannot rearrange the extents in the order:

    3x, 5, 6, 1, 2, 4

    So every extent from 3 onwards is untouchable until the transaction in extent 3 commits.

    Comment by Jonathan Lewis — October 10, 2009 @ 12:27 pm BST Oct 10,2009 | Reply

  7. [...] Jonathan Lewis- Undone [...]

    Pingback by Blogroll Report 02/10/2009-09/10/2009 « Coskan’s Approach to Oracle — October 21, 2009 @ 11:29 pm BST Oct 21,2009 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,268 other followers