I received an email recently asking me if I knew how Oracle found specific rows and columns in a compression unit. This is a topic that I’ve spoken about a couple of times, and I’ve published * several notes on the blog* about it, including

*from one of my presentations, and I was expecting to find some notes somewhere about Oracle catalogues all the bits and pieces.*

**an image of a critical slide***lists some of the detail but I was slightly surprised to discover that it made the comment: “*

**Part 4 of this series***and somewhere in the CU there has to be a set of pointers to the first byte for each compressed column”*– and this was the bit that my interrogator wanted to know. This posting is the answer – and I’ll start with a dump of the first block of a CU:

block_row_dump: tab 0, row 0, @0x30 tl: 8016 fb: --H-F--N lb: 0x0 cc: 1 nrid: 0x01c06484.0 col 0: [8004] Compression level: 03 (Archive Low) Length of CU row: 8004 kdzhrh: ------PC CBLK: 11 Start Slot: 00 NUMP: 11 PNUM: 00 POFF: 7884 PRID: 0x01c06484.0 PNUM: 01 POFF: 15900 PRID: 0x01c06485.0 PNUM: 02 POFF: 23916 PRID: 0x01c06486.0 PNUM: 03 POFF: 31932 PRID: 0x01c06487.0 PNUM: 04 POFF: 39948 PRID: 0x01c06488.0 PNUM: 05 POFF: 47964 PRID: 0x01c06489.0 PNUM: 06 POFF: 55980 PRID: 0x01c0648a.0 PNUM: 07 POFF: 63996 PRID: 0x01c0648b.0 PNUM: 08 POFF: 72012 PRID: 0x01c0648c.0 PNUM: 09 POFF: 80028 PRID: 0x01c0648d.0 PNUM: 10 POFF: 88044 PRID: 0x01c0648e.0 CU header: CU version: 0 CU magic number: 0x4b445a30 CU checksum: 0x5c173142 CU total length: 93755 CU flags: NC-U-CRD-OP ncols: 7 nrows: 4361 algo: 0 CU decomp length: 93159 len/value length: 309310 row pieces per row: 1 num deleted rows: 0 START_CU: 00 00 1f 44 1f 0b 00 00 00 0b 00 00 1e cc 01 c0 64 84 00 00 00 00 3e 1c 01 c0 64 85 00 00 00 00 5d 6c 01 c0 64 86 00 00 00 00 7c bc 01 c0 64 87 00 00 00 00 9c 0c 01 c0 64 88 00 00 00 00 bb 5c 01 c0 64 89 00 00 00 00 da ac 01 c0 64 8a 00 00 00 00 f9 fc 01 c0 64 8b 00 00 00 01 19 4c 01 c0 64 8c 00 00 00 01 38 9c 01 c0 64 8d 00 00 00 01 57 ec 01 c0 64 8e 00 00 00 4b 44 5a 30 42 31 17 5c 00 01 6e 3b eb 06 00 07 11 09 00 04 b8 3e 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... ... 20 repetitions of the previous line of zeros deleted ... This is mostly the bitmap identifying deleted rows. ... nrows=4361, so this will be about 4261, so this will be ... about 4361/8 = 545 bytes of zeros. ... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00 02 54 00 00 30 bc 00 00 4a 82 00 00 64 37 00 00 a6 b8 00 00 e9 39 00 01 2b ba e6 c2 eb 8b 00 00 2e 68 01 40 01 02 00 00 53 f0 78 9c a5 dc 07 57 23 5b 96 25 e0 ee 24

If you look at the symbolic dump there are lots of numbers that you could try to locate in the binary dump labelled * START_CU:*, for example the

*4b3445a30 is very obvious at the end of line 37 above (with the bytes in the right order), and the*

**CU magic number***of 5c173142 is visible at the start of line 38 (with the bytes in reverse order – no, don’t ask why me there’s a difference).*

**CU checksum**Bearing in mind how Oracle frequently uses “self-describing” strategies for data we might search for the value that gives the length of the CU as this could well be the start of the compressed data. But there are two CU lengths – the * CU total length* (93755) and the

*(93159). To my mind the latter is the “amount of CU requiring decompression” but let’s check for both: 93,755 (dec) = 0x16e3b, and 93,159 (dec) = 0x16be7 .*

**CU decomp length**Take another look at line 38 – the second 4 bytes are: 00 01 6e 3b, the * CU total length*. Unfortunately we can’t see the

*anywhere, so let’s change strategy slightly – the different between the two lengths is 93755 – 93159 = 596 = 0x254, and by a convenient coincidence if you look at line 46, five bytes from the end, you find 02 54. Is this really a coincidence ? Let’s count 596 bytes from the*

**CU decomp length***and see where we get to – the answer is : the last 4 bytes of line 46 (e6 c2 eb 8b) This doesn’t seem to help at first sight, but take a look at where we are and what precedes that 4 bytes. There are 7 groups of 4 bytes that look like increasing values – and since I know more about the data than you do (it’s a table with 7 columns, and the last 4 columns are identical across all rows), I’ve extracted those 28 bytes and reformatted them:*

**CU total length**Offset Delta ------ ----- 00 00 02 54 596 00 00 30 bc 12476 11880 00 00 4a 82 19074 6598 00 00 64 37 25655 6581 00 00 a6 b8 42680 17025 00 00 e9 39 59705 17025 00 01 2b ba 76730 17025 93755 <----

The *“Offset”* column is what you get by converting from Hex to decimal, and the *“Delta”* column is the difference between one value and the next. Notice how the last three deltas are the same, and how, if you add the same delta to the last Offset you get a number which is the * CU total length*. The “02 54” I found as the difference between the

*and the*

**CU total length***is actually the starting point for the list of pointers to each of the compressed columns, and the repeated 17,025 is the length of my duplicated final four columns. The e6 c2 eb 8b is the first four bytes of the first (compressed) column.*

**CU decomp length**Having worked out that it’s possible to find the list of pointers quite easily from the dump, the next step (left as an exercise to the interested reader) is to work out the algorithm that Oracle uses to locate the list – since the CU decomp length is clearly derived and not actually stored. As a little clue, you might start by taking another look at line 38: bytes 11/12 are 00 07 – the number of columns in the table, and bytes 13/14 are 11 09 – the number of rows in the CU (and that’s the most significant number at this point because it tells you how long the bitmap for deleted rows will be).