Here’s a little oddity that I came across at a client site recently.
The client called me in because they were having problems with Oracle error “ORA-01555: snapshot too old” appearing in a particular task after a few thousand seconds (typically 5,000 to 6,000) even though they had set the undo_retention to 14,400 seconds and had a huge undo tablespace running with autoextend enabled on the data files.
While investigating their problem, it occurred to me that the fact that they were using a 32KB block size might be contributing to the issue. It was – although the contribution was relatively insignificant in this particular case – and it’s worth knowing that there are cases where this specific block size for the undo tablespace could be responsible for generating about 30% more undo block writes than you would otherwise expect.
The problem is similar to the problem of row migration and ASSM – there are some hard-coded limits in what you can put into a single Oracle block, and those limits may not be suitable for the larger Oracle block sizes.
In this case it’s the catalogue in the block header of the undo records that appear in the undo block that is limited to 255 entries. (Note – I’m not talking about the undo segment header block here, I’m talking about the header that appears in every other undo block.)
Here’s a partial block dump of an undo block to show you what I mean; in 10g you automatically get the raw block dump and the symbolic block dump at the same time:
The items to examine are the cnt: and irb:, which report the total number of records in the block, and (I think, but can’t be certain) the number of records which could be available for rolling back. (I don’t think the latter assumption is entirely correct – unless I’ve also discovered a bug in the way the irb: is handled – but the irb: does differ from the cnt: in cases where a rollback has taken place).
Raw
705A200 0000A202 0080002A 03B3540D 00050000 [....*....T......] 705A210 00000000 00140003 0000D1CA 151A1A6D [............m...] 705A220 1FE80000 1E981F40 1D8C1E30 1C3C1CE4 [....@...0.....<.]
Symbolic
xid: 0x0003.014.0000d1ca seq: 0x1a6d cnt: 0x1a irb: 0x15 icl: 0x0 flg: 0x0000 Rec Offset Rec Offset Rec Offset Rec Offset Rec Offset --------------------------------------------------------------------------- 0x01 0x1f40 0x02 0x1e98 0x03 0x1e30 0x04 0x1d8c 0x05 0x1ce4 0x06 0x1c3c 0x07 0x1a7c 0x08 0x1a14 0x09 0x19c4 0x0a 0x1968 0x0b 0x1918 0x0c 0x18b0 0x0d 0x1830 0x0e 0x17c4 0x0f 0x1748 0x10 0x16b4 0x11 0x1634 0x12 0x15bc 0x13 0x153c 0x14 0x14c4 0x15 0x1158 0x16 0x0e2c 0x17 0x0b00 0x18 0x07d4 0x19 0x04a8 0x1a 0x017c
Notice that the cnt: is 0x1a, and the irb: is 0x15 in this extract. Then check the second line of the raw dump where the last string of digits is “151A1A6D”. This shows you that the two counts are limited to a single byte – which tells you that there is a limit of 255 undo records in an undo block.
Since an undo record could easily be in the region of 80 bytes or less (see record 0x08 above, for example), an update that makes a large number of small changes to the data could end up packing 255 records of 80 bytes into an undo block and then have to switch to a new block after using 255 * 80 = 21,930 bytes out of 32KB – a waste of about 30% of the available space.
This was exactly what the client was doing, with a very repetitive “select for update” that was locking a large number of scattered rows, then clearing them with a commit after a few updates.
So, when you start to think about choosing larger Oracle block sizes, remember that a 32KB undo block size can be a souce of wasted I/O.
Footnote:
This waste wasn’t the primary cause of the client’s problem – which was basically the volume of undo generated. I fixed their underlying problem by giving them a code fix that locked a few rows on each pass instead of several hundred thousand rows. I also discovered that the query that crashed with the ORA-01555 had a tendency to change execution path randomly (and catastrophically) because of a histogram collection problem, and suggested a short script to create a fixed representative histogram that would stop that problem re-appearing.
Jonathan.
Very interesting investigation.
Comment by Dion Cho — March 23, 2009 @ 6:29 am GMT Mar 23,2009 |
Hi Jonathan
FWIW, my understanding of “irb” is the following: “Index of the first record to be processed in case of a rollback.” In other words, the last in the undo chain.
Cheers,
Chris
Comment by Christian Antognini — March 23, 2009 @ 8:59 am GMT Mar 23,2009 |
Hi Jonathan,
If you have time, can you please explain how did you come to the conclusion on this statement
“Notice that the cnt: is 0×1a, and the irb: is 0×15 in this extract. Then check the second line of the raw dump where the last string of digits is “151A1A6D”. This shows you that the two counts are limited to a single byte – which tells you that there is a limit of 255 undo records in an undo block.”
I am lost during the math :(
Comment by coskan — March 23, 2009 @ 10:22 am GMT Mar 23,2009 |
@coskan
I believe the explanation is that the string contains 1A and 15, showing that the data is stored in just one byte. One byte can only have 255 different values. So as this is essentially in index to records, there has to be a limit of 255 records. I don’t understand how one knows that those two values are mapped into that place in memory. Maybe it’s possible to know or maybe it’s one thing Jonathan has found out through looking at a lot of blockdumps.
Comment by Mathias Magnusson — March 23, 2009 @ 12:50 pm GMT Mar 23,2009 |
Christian,
Thanks for the information – that explanation fits the anomaly I thought I was seeing.
Mathias,
your explanation is spot on.
Coskan,
If you look at the first line of the symbolic dump, you can see:
And in the second line of the raw dump you get:
Comparing figures carefully, you can pick out (with byte and word swapping) from the raw dump the xid, followed by the irb, cnt and seq in that order – and there’s only one byte each of the irb and cnt, hence 255 max.
As Mathias says, you just do a lot of block dumps – and then you search the raw dump for values you can see in the symbolic dump.
Comment by Jonathan Lewis — March 23, 2009 @ 5:11 pm GMT Mar 23,2009 |
Jonathan, I think you’re saying if db_block_size is 32k, an undo header block can have at most 255 entries. That’s also the max number of transaction slots a non-system rollback segment on this database can have. Correct? If I get it correct, that max number is 99, 48, 22, for db_block_size of 16k, 8k, 4k, respectively. Isn’t the wasted space in each block actually higher in percentage for smaller blocks? For example, for 8k, it’s (8*1024-48*80)/(8*1024)=53%.
According to
http://www.ubtools.com/jira/browse/QA-27
“irb points the first undo record in undo block to begin rollback.”
Comment by Yong Huang — March 24, 2009 @ 1:53 am GMT Mar 24,2009 |
I misread. You got the limit 255 because that’s how much information one byte can contain. In fact, an undo segment whose block size is 32k probably can’t hold that many transaction slots (202?), so that’s unrelated to your theory.
Comment by Yong Huang — March 24, 2009 @ 2:23 am GMT Mar 24,2009 |
Yong, be careful, Jonathan is not talking about the “undo header” block but the header of the “undo block”. The limit he mentioned is, therefore, the number of undo records in an “undo block”. The number of transactions that a undo segment (and, therefore, its “undo header” block) can support is, AFAIK, not limited to a specific value. But, instead, it varies depending on the block size. In fact, an “undo header” block is always completely used.
HTH
Chris
Comment by Christian Antognini — March 24, 2009 @ 1:21 pm GMT Mar 24,2009 |
Yong Huang,
I hope Christian’s comments have clarified the issue. I have added a note to the posting to point out that I am NOT talking about undo segment header blocks.
I think your value for the number of slots in the transaction table of a 32KB block is about right – I’d have said 205, but not very confidently without checking. But this is not relevant to the number of records you can get in the undo block. In principle, if you get an active transaction in every slot in the transaction table, then each transaction would get be using its own undo block, and each transaction would be limited to a maximum of 255 records in that block.
Comment by Jonathan Lewis — March 24, 2009 @ 8:00 pm GMT Mar 24,2009 |
Christian,
Thinking about the use of irb: would you agree that the only time that irb: and cnt: would be different would be when the most recent transaction to use an undo block had been rolled back; in which case the irb: would point to the last record of the previous (necessarily committed) transaction that used the block.
The special case (which I had viewed as an anomaly under my erroneous assumption) is when a transaction is the only transaction to have used a block. After such a trasnaction has rolled back irb: seems to be set to 1, rather than the notional 0 that you might expect.
Comment by Jonathan Lewis — March 24, 2009 @ 8:04 pm GMT Mar 24,2009 |
255(1byte) seems to be prevailing limit in Oracle.
ITL, bucket# of histogram and here in undo block record.
Comment by Dion Cho — March 25, 2009 @ 7:22 am GMT Mar 25,2009 |
Dion Cho,
I think frequency of the 255 limit is probably a hang-over from the days when blocks were small, memory was expensive, and a single byte saved was a good thing. There was even a brief period in 7.2 where the developers of the Parallel Query code decided that no-one could possibly write an SQL statement that would generate more than 255 bind variables when decomposed.
Comment by Jonathan Lewis — April 15, 2009 @ 9:47 am BST Apr 15,2009 |
this is very interesting story but very useful story
So again the env is so dynamic anywhere.
Comment by Kwanyoung — March 30, 2009 @ 9:27 pm BST Mar 30,2009 |
[…] Filed under: Uncategorized — Jonathan Lewis @ 9:41 pm UTC Mar 30,2009 In my last post on larger block sizes, I pointed out a potential issue relating to space wastage and increased I/O with undo tablespaces […]
Pingback by Undo extremes « Oracle Scratchpad — March 30, 2009 @ 9:41 pm BST Mar 30,2009 |
Hi Jonathan
I almost forgot to followup…
> would you agree that the only time that irb: and cnt: would be different would
> be when the most recent transaction to use an undo block had been rolled back
Basically yes. The only other case I can think of (which is very similar to the one you described…) is when a transaction is only partially rolled back to a savepoint and, after that, commited.
Best,
Chris
Comment by Christian Antognini — May 7, 2009 @ 4:50 pm BST May 7,2009 |
[…] https://jonathanlewis.wordpress.com/2009/03/22/block-size-again/ “ORA-01555: snapshot too old” caused by large block size […]
Pingback by Faulty Quotes 5 – Block Sizes « Charles Hooper's Oracle Notes — January 31, 2010 @ 6:04 am GMT Jan 31,2010 |
[…] your undo tablespace with a 32KB block size you might test to see if the change would produce any surprise side-effects); or you might wonder if anything funny could happen to the redo generation if you created all […]
Pingback by Assumptions | Oracle Scratchpad — July 9, 2019 @ 11:47 am BST Jul 9,2019 |