Here’s a little problem that was thrown at me a little while ago during a visit to a client of mine. It only took about five minutes to fix, but it seemed like a nice demonstration of trouble-shooting so I thought I’d write it up for the blog.
We start with the moment when one of the developers told me that they were seeing lots of session failing with the error “unable to extend segment in undo tablespace”.
The obvious thing to do was a quick check for any transactions using a lot of undo space – so here’s a little query to supply the session address and number of undo blocks used for active transactions:
SQL> select ses_addr, used_ublk from v$transaction; SES_ADDR USED_UBLK ---------------- ---------- 0700000108372710 1 07000001093A90C0 1 070000010836D170 1 070000010F37C198 1 070000010A3C6C58 1 070000010F3897A8 1
So, none of the active transaction seems to be taking any undo space. I checked v$lock for locks of type ‘TX’, and v$session for rows with non-null taddr values just in case there were some hidden transactions – but the six I’ve shown were all there were.
So let’s take a quick look at the undo undo segments themselves to see if there’s anything of note there:
select segment_name, blocks from dba_segments where tablespace_name = 'UNDO' order by blocks ... _SYSSMU42$ 10,632 _SYSSMU35$ 10,712 _SYSSMU33$ 11,676 _SYSSMU37$ 11,756 _SYSSMU38$ 12,016 _SYSSMU39$ 150,756 _SYSSMU12$ 361,800 _SYSSMU4$ 2,410,568 171 rows selected.
Most of the segments were tiny – but there was one which had taken most of the space in the tablespace. (We were on 16KB blocks, so that’s about 37 GB for an undo segment! Surely it should have been freed if no-one was using the space).
Clearly we need a little more detail about those tiny transactions – so here’s a query that picked up some of the session details along with the transaction start time:
select sid, serial#, username, osuser, to_char(logon_time,'dd-mon hh24:mi:ss') connected, to_char(start_date,'dd-mon hh24:mi:ss') started from v$transaction tr, v$session se where se.saddr = tr.ses_addr order by started ; SID SERIAL# USERNAME OSUSER CONNECTED STARTED ----- --------- ---------------- --------------- --------------- --------------- 662 925 P_READONLY wilesc 21-jul 15:40:43 23-jul 12:13:02 379 342 P_READONLY greena 29-jul 10:17:49 29-jul 11:24:03 353 1482 P_READONLY edwards 29-jul 11:32:18 29-jul 11:32:19 363 7972 P_READONLY edwards 29-jul 17:53:27 29-jul 17:56:47 596 5553 DEV_SUPP smithj 24-jul 09:00:36 30-jul 17:26:03
The most recent transaction started on 30th July, but there are three transactions that started “yesterday”, and one that started a week earlier. Clearly none of these transactions has done much – but they’ve been hanging round for a long time.
Simply by existing, they’ve been holding the “start point” of an undo segment, and as time has passed more and more space has gradually been migrated into these undo segments as Oracle chose them for other transactions. And any space that got into one of those undo segments cannot be transferred until the holding transaction completes.
Just to clarify this point – let’s imagine the history of _SYSSMU4$, the undo segment that is currently at 37GB. This is the undo segment that session 662 picked. When it started, the transaction could have been somewhere in the middle of the list of extents currently allocated to the undo segment.
As time passes, other transactions come and go, and gradually they allocate space to extend this undo segment. At the same time extents that had been allocated before session 662 started its transaction will pass the undo_retention_time and be released. But that one tiny transaction keeps its extent “locked” into the segment, and stops Oracle from dropping any extents added later. Inevitably the segment grows until it has consumed all the available space in the undo tablespace.
Having identified the cause of the problem, I tracked down the users found out what they were doing, got a coupe of them to disconnect and killed the rest of the sessions to clear the problem.
Footnote: despite the schema names, this was a development copy of the production database, so I didn’t have any qualms about killing sessions that were causing problems.