Oracle Scratchpad

May 21, 2009

Row Directory

Filed under: Block Size,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 8:52 am BST May 21,2009

Yesterday I asked the question: how many row entries can you create in an 8KB block with pctfree 0? It brought out some interesting observations – including one I hadn’t thought of, relating to a small difference between ASSM and freelist management.

I suspect, however, that most people didn’t quite realise the significance of the wording: “row entries” – I wasn’t asking about “rows”. So here’s a little script you can run after you create the table, followed by the result of dumping the first block in the table.
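
If you want to reproduce the test, the table from yesterday’s quiz would have been something like the following – the column name n1 is an assumption, taken from the bitmap index example later in this note:

create table t1 (n1 number) pctfree 0;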

rem
rem     Script:         row_directory_01.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2009
rem

begin
	for i in 1..2048 loop
		insert into t1 values(null);
		delete from t1;
	end loop;
end;
/
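
The dump itself needs a file and block number – the bdba 0x0140020a reported below decodes to (relative) file 5, block 522, though your numbers will differ. A minimal sketch, run from a suitably privileged account:

alter system checkpoint;                   -- get the current block image onto disc
alter system dump datafile 5 block 522;    -- file and block decoded from the bdba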

Block header dump:  0x0140020a
 Object id on Block? Y
 seg/obj: 0x1749e  csc: 0x00.3df6da7  itc: 2  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0006.021.0000c0bd  0x00801d04.1966.5d  ----  2017  fsc 0x0007.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000

data_block_dump,data header at 0x7f1a25c
===============
tsiz: 0x1fa0
hsiz: 0xfd4
pbl: 0x07f1a25c
bdba: 0x0140020a
     76543210
flag=--------
ntab=1
nrow=2017
frre=-1
fsbo=0xfd4
fseo=0xfdb
avsp=0x3
tosp=0xfcc
0xe:pti[0]	nrow=2017	offs=0
0x12:pri[0]	offs=0x1f9e
0x14:pri[1]	offs=0x1f9c
 ...
0xfd0:pri[2015]	offs=0xfe0
0xfd2:pri[2016]	offs=0xfdb
block_row_dump:
tab 0, row 0, @0x1f9e
tl: 2 fb: --HDFL-- lb: 0x1
tab 0, row 1, @0x1f9c
tl: 2 fb: --HDFL-- lb: 0x1
tab 0, row 2, @0x1f9a
tl: 2 fb: --HDFL-- lb: 0x1
 ...
tab 0, row 2015, @0xfe0
tl: 2 fb: --HDFL-- lb: 0x1
tab 0, row 2016, @0xfdb
tl: 2 fb: --HDFL-- lb: 0x1
end_of_block_dump

I’ve trimmed out most of the block dump, but as you can see there are no rows in the block – yet there are 2,017 row entries in the row directory.

Each time I insert a row I need a row entry in the directory; but when I delete a row I can’t re-use that entry until my delete is committed – so I have to keep adding new entries for every row in my loop.

When I delete a row it is first marked as deleted (the “flag” byte gets the D bit set – you can see it in the fb: --HDFL-- entries of the dump above) and eventually, when the block needs to be tidied and re-packed to make the available space usable, the deleted rows are trimmed back to a stub of just two bytes (the flag byte and the lock byte).

In passing, there is a new statistic in 10g which I believe records this packing process so that you can see how often it happens: “heap block compress”. In a presentation at Collaborate 09, Carl Dudley pointed out that 11g does its “real-time” advanced compression only at this moment – so perhaps that’s why the statistic has recently appeared.
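
If you want to watch it happening, the statistic appears in the usual dynamic performance views – for example, a query like this run before and after the insert/delete loop will show your own session’s figure:

select  sn.name, ms.value
from    v$statname      sn,
        v$mystat        ms
where   sn.statistic#   = ms.statistic#
and     sn.name         = 'heap block compress'
;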

Consequences (1):

You may say, of course, that this is a pointless demonstration that bears no resemblance to real life. But you would be forgetting Murphy’s law, which (in one of its polite forms) says: “anything that can go wrong will go wrong”.

The title of “lunchtime quiz” came about because I was giving a seminar in Zagreb, Croatia, and one of the attendees mentioned that the data warehouse tool they used did exactly this type of thing – and crashed as a consequence. So, in the lunch break, I ran a quick test and posted the note – this was a genuine production problem. Metalink notes 465226.1 and 603480.1, and bug 4260477, are relevant.

The problem in the metalink notes is this: you can get 2,017 row entries in an 8KB block and 4,065 entries in a 16KB block – so you have space for roughly 8,160 entries in a 32KB block … but there’s a hard limit of 4,095 on the field (it’s only 12 bits) in the ITL where the number of rows locked by a single transaction is recorded. (Testing with a 32KB block on 11.2.0.2 the limit seemed to be 4,094 rather than the 4,095 given in the bug notes.)
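
For anyone who wants to check the arithmetic (the 4 bytes per insert/delete cycle – a 2-byte row directory slot plus a 2-byte deleted stub – is spelled out in the comments below):

tsiz (8KB block, freelists)    = 0x1fa0 = 8,096 bytes of data space
cost per insert/delete cycle   = 2 (directory entry) + 2 (stub) = 4 bytes
8,096 / 4                      = 2,024    -- close to the observed 2,017 once the fixed data header is allowed for
Lck field in the ITL           : 12 bits, so 2^12 - 1 = 4,095 locked rows maximum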

I don’t have an older copy of Oracle that will create 32KB blocks in my sandbox so I can’t test exactly what happens when you pass the limit, but there are two possibilities I can think of: (a) your session crashes, or (b) your session corrupts the block and some other process crashes later on trying to read it.

Oracle’s initial response to the problem was to raise Oracle error ORA-08007 to stop you from overshooting the limit (which, of course, crashed the user’s data warehouse build process); the latest fix is to limit the number of row directory entries (potentially “wasting” space in the block) to protect the ITL limit.

The workaround on older systems is to move critical tables into a tablespace using a smaller block size – it’s only the 32KB block size that hits the problem. But if you’ve created your database with a 32KB block size then the block size for the temporary tablespace also has to be 32KB – and the data warehouse builder was doing this work in global temporary tables. Workaround (1): don’t use global temporary tables (!); workaround (2): create a database with a 16KB block size and use transportable tablespaces to move the data across to it.
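
For a permanent table the “smaller block size” workaround would look something like this (the names, path, and sizes are placeholders; you need a buffer cache for the non-default block size before you can create the tablespace, and any indexes will need rebuilding after the move):

alter system set db_16k_cache_size = 64M;

create tablespace ts_16k
        datafile '/u01/oradata/ts_16k_01.dbf' size 512M
        blocksize 16K
;

alter table critical_table move tablespace ts_16k;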

Consequences (2):

There is another side effect of this large row directory – but one that surely won’t be seen (Murphy willing).

Nobody, surely, is going to write code that does this type of insert/delete cycle on a table with bitmap indexes – the consequences could be dire in terms of undo and redo (even in 10g, where the index explosion problem has been addressed).

But what if they load the table with code like this, then try to build some bitmap indexes? This is what happened on my 8KB block example when I didn’t delete the last row in the block:

create bitmap index t1_b1 on t1(n1)
                             *
ERROR at line 1:
ORA-28604: table too fragmented to build bitmap index (20971786,2016,744)

When building bitmap indexes, the code assumes that the number of rows in a block cannot exceed roughly blocksize/11. (The 11 comes from assuming all rows are migrated, in which case each row requires: 2 bytes for the row directory entry, one byte each for the column count, flag byte, and lock byte, and 6 bytes for the forwarding rowid – see “Practical Oracle 8i”.) The error message is telling me that I have 2,017 rows in a block when the Hakan Factor (maximum expected) is 744.
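
A quick check of the arithmetic for the 8KB block:

trunc(8192 / 11) = trunc(744.7) = 744      -- the third figure in the error message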

Fortunately there is a workaround for this (though I haven’t tested it for side effects) – you can set the maximum to a value higher than the default using the standard mechanism for optimising bitmap index storage:

SQL> alter table t1 minimize records_per_block;

Table altered.

SQL> create bitmap index t1_b1 on t1(n1);

Index created.

There are a couple more interesting points I could make about blocks, rows, deletions, and the row directory (compression is an interesting topic) – but it’s a lovely morning in Zagreb, so I’m off to see a few of the sights before I fly home.

25 Comments »

  1. I (comment 17) humbly realized it before this post… after Joel Garry’s hint

    Comment by Roberto — May 21, 2009 @ 9:15 am BST May 21,2009 | Reply

  2. So the recap:

    – size of each pointer in the row directory: 2 bytes
    – minimum size of the “row”: 2 bytes (“flag” byte + “lock” byte)

    Hence 4 bytes total for each row entry.

    By the way, I find these kinds of demonstrations very useful; they are excellent for learning (nothing is better than the simplest possible example for understanding one aspect of a topic, in this case the “aspect” being the structure of the row overhead). I do not care too much whether they occur frequently in real life; they are valuable anyway.

    Comment by Alberto Dell'Era — May 21, 2009 @ 12:24 pm BST May 21,2009 | Reply

    • Alberto,
      The total space required for:

      a deleted row is 4 bytes – as you describe
      a current row is 5 bytes – one extra byte for the column count – which will be zero for a completely null row
      a migrated row is 11 bytes – as above (with column count = 0) but including the rowid of where the row has gone

      Comment by Jonathan Lewis — May 21, 2009 @ 1:31 pm BST May 21,2009 | Reply

    • There is a minimum size:

      minimum size of the “row” = max(UB1 * 3 + UB4 + SB2, ROW_LEN) = max(9, 2) = 9

      It seems to work fine

      Comment by Roberto — May 21, 2009 @ 1:50 pm BST May 21,2009 | Reply

  3. I’ve got 2,015 row entries instead of 2,017. You said you can get 2,017 row entries in an 8KB block. Is that the upper bound value?

    Block header dump:  0x0080000c
     Object id on Block? Y
     seg/obj: 0xdf01  csc: 0x00.39d9b7  itc: 2  flg: E  typ: 1 - DATA
         brn: 0  bdba: 0x800009 ver: 0x01 opc: 0
         inc: 0  exflg: 0
     
     Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
    0x01   0x0010.010.000000d2  0x02000a04.0049.12  ----  2015  fsc 0x0007.00000000
    0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
     
    data_block_dump,data header at 0x7604e64
    ===============
    tsiz: 0x1f98
    hsiz: 0xfd0
    pbl: 0x07604e64
    bdba: 0x0080000c
         76543210
    flag=--------
    ntab=1
    nrow=2015
    frre=-1
    fsbo=0xfd0
    fseo=0xfd9
    avsp=0x3
    

    Comment by Bundit — May 22, 2009 @ 10:41 am BST May 22,2009 | Reply

    • Bundit,

      I’ll take a guess that your tablespace is using ASSM (automatic segment space management). There were a couple of comments in the “Lunchtime quiz” about similar differences in results due to a few extra bytes being taken up in the block header.

      Comment by Jonathan Lewis — May 22, 2009 @ 7:59 pm BST May 22,2009 | Reply

  4. […] byte, the flag byte, the length byte(s), or the two byte pointer to each row in the block’s row directory – but it looks as if this information is included as a step in the final extent […]

    Pingback by Index Size « Oracle Scratchpad — May 22, 2009 @ 7:53 pm BST May 22,2009 | Reply

  5. Hi,

    >a current row is 5 bytes – one extra byte for the column count – …
    What about the case when the column count is > 256? A table can have up to 1,000 columns.

    Comment by Dmitry — May 24, 2009 @ 10:56 pm BST May 24,2009 | Reply

    • Dmitry,
      When a row exceeds 255 columns, it is stored as several row-pieces in the same block. Each row piece is up to 255 columns long, and there is an extra “nrowid” entry in each row linking from one row piece to the next. But in just the same way that Oracle doesn’t store trailing nulls, if you’ve only filled columns in the first row piece Oracle doesn’t attempt to store the later row pieces.
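
      If you want to see this happening, a quick sketch (the table and column names are arbitrary): generate a table with 300 columns, insert one row that fills only an early column and one that fills a late column, then dump the block – the second row appears as two row pieces chained together, the first as a single piece.

      declare
              m_sql varchar2(32767) := 'create table t300 (c001 number';
      begin
              for i in 2..300 loop
                      m_sql := m_sql || ', c' || to_char(i,'fm000') || ' number';
              end loop;
              m_sql := m_sql || ')';
              execute immediate m_sql;
      end;
      /

      insert into t300 (c001)       values (1);        -- one row piece
      insert into t300 (c001, c300) values (2, 300);   -- 300 columns, so two row pieces
      commit;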

      Comment by Jonathan Lewis — May 26, 2009 @ 11:10 am BST May 26,2009 | Reply

  6. I remember the “ORA-28604: table too fragmented to build bitmap index” error being raised in a load process many years ago.

    As I recall, though I can’t remember for sure, it was caused by use of “alter table t1 minimize records_per_block” followed by a partition exchange from a table with a higher Hakan factor than the partitioned table.

    Comment by David Aldridge — June 2, 2009 @ 10:30 am BST Jun 2,2009 | Reply

  7. […] the quiz didn’t attract much of my attention, but the solution – perhaps because of the fascination of the occult hidden behind the exploration of the internal mechanisms […]

    Pingback by Struttura interna di un blocco dati Oracle « Oracle and other — June 10, 2009 @ 10:03 am BST Jun 10,2009 | Reply

  8. Does anyone know, in Oracle 10g, when a table is compressed and a block holds duplicate values for one column, what the size of the reference pointer is?
    Ex: if I have the column value “test” (4 bytes) repeated for the whole block, what is the size of the pointer?

    My concern is that if the pointer size is 4 bytes or 8 bytes then compression may not be helping me.

    Thanks
    Pramode

    Comment by Pramode — July 17, 2009 @ 7:37 pm BST Jul 17,2009 | Reply

    • You need to do some simple testing. Create a table with a single column of varchar2(12), say, insert 1,000 rows with the same value and dump a block; then compress the table, dump a block, and compare the two dumps.
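
      In SQL the test might look like this (the table name is a placeholder; the 12-character value matches the string of ‘x’s – hex 78 – in the dump below):

      create table t_comp (v1 varchar2(12)) pctfree 0;

      insert into t_comp
      select 'xxxxxxxxxxxx' from all_objects where rownum <= 1000;
      commit;

      -- dump a block now, then compress the table and dump again to compare
      alter table t_comp move compress;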

      Here’s an extract from a dump of such a compressed table:

      tab 1, row 702, @0x11be
      tl: 5 fb: --H-FL-- lb: 0x0  cc: 1
      col  0: [12]  78 78 78 78 78 78 78 78 78 78 78 78
      bindmp: 2c 00 01 01 00
      tab 1, row 703, @0x11b9
      tl: 5 fb: --H-FL-- lb: 0x0  cc: 1
      col  0: [12]  78 78 78 78 78 78 78 78 78 78 78 78
      bindmp: 2c 00 01 01 00
      tab 1, row 704, @0x11b4
      tl: 5 fb: --H-FL-- lb: 0x0  cc: 1
      col  0: [12]  78 78 78 78 78 78 78 78 78 78 78 78
      bindmp: 2c 00 01 01 00
      

      Note that the row lengths are 5 bytes (tl: 5) – or subtract two adjacent addresses to check. Since there are three bytes of row overhead (flags, lock byte, column count) that means only two bytes have been used for the column value – so possibly one byte to flag that the next byte is a token, and one byte for the “token pointer”.

      You still have to do a few more tests, of course, but at first sight it looks like a repetitive four-byte string can give you some space benefit on compression.

      Further tests: if a “token pointer” is one byte does that mean a block can hold only 255 different token values? Maybe that one byte can turn into two bytes.

      What happens if two adjacent columns have very repetitive combinations – you get one token to represent two columns; does that change anything else?

      Comment by Jonathan Lewis — July 20, 2009 @ 12:46 pm BST Jul 20,2009 | Reply

  9. […] Some symbolic block dumps (table and undo particularly) will list the row directory, others (e.g. index) will not. There are variations on a theme – blocks from heap tables and clusters also contain a “table directory”, but this will contain only one entry in the case of a simple heap table; blocks from a compressed index include a “prefix directory”; blocks from a compressed table include a “token directory”. In all cases the concept is the same: a list of two-byte entries pointing to a piece of useful data. (See also: this blog entry) […]

    Pingback by Glossary « Oracle Scratchpad — September 12, 2009 @ 7:10 pm BST Sep 12,2009 | Reply

  10. […] The answer is here, but do read the comments on this post before you look at it. Comments […]

    Pingback by Lunchtime Quiz « Oracle Scratchpad — September 26, 2009 @ 3:26 am BST Sep 26,2009 | Reply

  11. […] 4095 as the “highest row in block” value to calculate the high rowid. (There’s an interesting consequence of this hard limit that could cause surprises in data warehouses and other system (e.g. Peoplesoft) […]

    Pingback by Pseudo-parallel « Oracle Scratchpad — January 3, 2010 @ 11:52 am GMT Jan 3,2010 | Reply

  12. […] until after the commit; however tables and indexes are handled differently. Tables only need to keep a “stub” to represent the row that has been marked for deletion, but indexes have to keep the entire index […]

    Pingback by Index too big « Oracle Scratchpad — March 26, 2010 @ 10:42 pm GMT Mar 26,2010 | Reply

  13. […] in case a process rolled back the delete. (This tied back to a note I had written showing that the row directory in a table block could become much larger than you might think […]

    Pingback by heap block compress « Oracle Scratchpad — March 30, 2010 @ 7:25 pm BST Mar 30,2010 | Reply

  14. […] other “magic number” of 4,096 is there because that’s the maximum number of row directory entries you can have in a single […]

    Pingback by Row count « Oracle Scratchpad — April 17, 2010 @ 11:13 am BST Apr 17,2010 | Reply

  15. […] first extract is from the block dump of the first block after the update and shows the row directory and the start of each row in the “row heap” (that’s a term I’ve just […]

    Pingback by Row count 2 « Oracle Scratchpad — May 5, 2010 @ 7:43 pm BST May 5,2010 | Reply

  16. […] is to tell me which entry in the block’s “row directory” I occupy, and since the row directory can hold 4,096 entries that takes two bytes. So – from my perspective – the cost of my […]

    Pingback by Rowid « Oracle Scratchpad — May 9, 2010 @ 7:48 pm BST May 9,2010 | Reply

  17. […] order – these lines in an index leaf block dump show Oracle walking through the block’s “row directory”; the number in square brackets following the row number is the offset into the block where the […]

    Pingback by Index Splits – 2 | Oracle Scratchpad — October 30, 2018 @ 1:30 pm GMT Oct 30,2018 | Reply

  18. […] a handful of bytes of free space) because even though each row will require only 5 bytes (2 bytes row directory entry, 3 bytes row overhead, no bytes for data) Oracle’s arithmetic will allow for the 11 […]

    Pingback by Row Migration | Oracle Scratchpad — December 3, 2018 @ 4:27 pm GMT Dec 3,2018 | Reply

  19. […] SCN; that’s consistent with 10 rows migrating into the block in a single transaction. In the row directory you can see the block holds 10 rows, and in the body of the block you can see the header for each […]

    Pingback by Migrated rows | Oracle Scratchpad — November 28, 2022 @ 10:59 am GMT Nov 28,2022 | Reply

  20. […] they arrive at a leaf block and they only appear to be in order in the dump because the leaf block “row directory” is maintained in real time to report them in order. Nevertheless, the rowids for a given key in a […]

    Pingback by Global Rowids | Oracle Scratchpad — February 9, 2023 @ 4:39 pm GMT Feb 9,2023 | Reply

