Here’s a little oddity that I came across at a client site recently.
The client called me in because they were having problems with Oracle error “ORA-01555: snapshot too old” appearing in a particular task after a few thousand seconds (typically 5,000 to 6,000) even though they had set the undo_retention to 14,400 seconds and had a huge undo tablespace running with autoextend enabled on the data files.
While investigating their problem, it occurred to me that the fact that they were using a 32KB block size might be contributing to the issue. It was – although the contribution was relatively insignificant in this particular case – and it’s worth knowing that there are cases where this specific block size for the undo tablespace could be responsible for generating about 30% more undo block writes than you would otherwise expect.
The problem is similar to the problem of row migration and ASSM – there are some hard-coded limits in what you can put into a single Oracle block, and those limits may not be suitable for the larger Oracle block sizes.
In this case it’s the catalogue in the block header of the undo records that appear in the undo block that is limited to 255 entries. (Note – I’m not talking about the undo segment header block here, I’m talking about the header that appears in every other undo block.)
Here’s a partial block dump of an undo block to show you what I mean; in 10g you automatically get the raw block dump and the symbolic block dump at the same time:
The items to examine are the cnt: and irb:, which report the total number of records in the block, and (I think, but can’t be certain) the number of records which could be available for rolling back. (I don’t think the latter assumption is entirely correct – unless I’ve also discovered a bug in the way the irb: is handled – but the irb: does differ from the cnt: in cases where a rollback has taken place).
705A200 0000A202 0080002A 03B3540D 00050000 [....*....T......] 705A210 00000000 00140003 0000D1CA 151A1A6D [............m...] 705A220 1FE80000 1E981F40 1D8C1E30 1C3C1CE4 [....@...0.....<.]
xid: 0x0003.014.0000d1ca seq: 0x1a6d cnt: 0x1a irb: 0x15 icl: 0x0 flg: 0x0000 Rec Offset Rec Offset Rec Offset Rec Offset Rec Offset --------------------------------------------------------------------------- 0x01 0x1f40 0x02 0x1e98 0x03 0x1e30 0x04 0x1d8c 0x05 0x1ce4 0x06 0x1c3c 0x07 0x1a7c 0x08 0x1a14 0x09 0x19c4 0x0a 0x1968 0x0b 0x1918 0x0c 0x18b0 0x0d 0x1830 0x0e 0x17c4 0x0f 0x1748 0x10 0x16b4 0x11 0x1634 0x12 0x15bc 0x13 0x153c 0x14 0x14c4 0x15 0x1158 0x16 0x0e2c 0x17 0x0b00 0x18 0x07d4 0x19 0x04a8 0x1a 0x017c
Notice that the cnt: is 0x1a, and the irb: is 0x15 in this extract. Then check the second line of the raw dump where the last string of digits is “151A1A6D”. This shows you that the two counts are limited to a single byte – which tells you that there is a limit of 255 undo records in an undo block.
Since an undo record could easily be in the region of 80 bytes or less (see record 0x08 above, for example), an update that makes a large number of small changes to the data could end up packing 255 records of 80 bytes into an undo block and then have to switch to a new block after using 255 * 80 = 21,930 bytes out of 32KB – a waste of about 30% of the available space.
This was exactly what the client was doing, with a very repetitive “select for update” that was locking a large number of scattered rows, then clearing them with a commit after a few updates.
So, when you start to think about choosing larger Oracle block sizes, remember that a 32KB undo block size can be a souce of wasted I/O.
This waste wasn’t the primary cause of the client’s problem – which was basically the volume of undo generated. I fixed their underlying problem by giving them a code fix that locked a few rows on each pass instead of several hundred thousand rows. I also discovered that the query that crashed with the ORA-01555 had a tendency to change execution path randomly (and catastrophically) because of a histogram collection problem, and suggested a short script to create a fixed representative histogram that would stop that problem re-appearing.