Oracle Scratchpad

May 12, 2014

Compression Units – 6

Filed under: Exadata,HCC,Oracle — Jonathan Lewis @ 1:34 pm GMT May 12,2014

I received an email recently asking me if I knew how Oracle found specific rows and columns in a compression unit. This is a topic that I’ve spoken about a couple of times, and I’ve published several notes on the blog about it, including an image of a critical slide from one of my presentations, and I was expecting to find some notes somewhere about Oracle catalogues all the bits and pieces. Part 4 of this series lists some of the detail but I was slightly surprised to discover that it made the comment: “and somewhere in the CU there has to be a set of pointers to the first byte for each compressed column” – and this was the bit that my interrogator wanted to know. This posting is the answer – and I’ll start with a dump of the first block of a CU:


block_row_dump:
tab 0, row 0, @0x30
tl: 8016 fb: --H-F--N lb: 0x0  cc: 1
nrid:  0x01c06484.0
col  0: [8004]
Compression level: 03 (Archive Low)
 Length of CU row: 8004
kdzhrh: ------PC CBLK: 11 Start Slot: 00
 NUMP: 11
 PNUM: 00 POFF: 7884 PRID: 0x01c06484.0
 PNUM: 01 POFF: 15900 PRID: 0x01c06485.0
 PNUM: 02 POFF: 23916 PRID: 0x01c06486.0
 PNUM: 03 POFF: 31932 PRID: 0x01c06487.0
 PNUM: 04 POFF: 39948 PRID: 0x01c06488.0
 PNUM: 05 POFF: 47964 PRID: 0x01c06489.0
 PNUM: 06 POFF: 55980 PRID: 0x01c0648a.0
 PNUM: 07 POFF: 63996 PRID: 0x01c0648b.0
 PNUM: 08 POFF: 72012 PRID: 0x01c0648c.0
 PNUM: 09 POFF: 80028 PRID: 0x01c0648d.0
 PNUM: 10 POFF: 88044 PRID: 0x01c0648e.0
CU header:
CU version: 0   CU magic number: 0x4b445a30
CU checksum: 0x5c173142
CU total length: 93755
CU flags: NC-U-CRD-OP
ncols: 7
nrows: 4361
algo: 0
CU decomp length: 93159   len/value length: 309310
row pieces per row: 1
num deleted rows: 0
START_CU:
 00 00 1f 44 1f 0b 00 00 00 0b 00 00 1e cc 01 c0 64 84 00 00 00 00 3e 1c 01
 c0 64 85 00 00 00 00 5d 6c 01 c0 64 86 00 00 00 00 7c bc 01 c0 64 87 00 00
 00 00 9c 0c 01 c0 64 88 00 00 00 00 bb 5c 01 c0 64 89 00 00 00 00 da ac 01
 c0 64 8a 00 00 00 00 f9 fc 01 c0 64 8b 00 00 00 01 19 4c 01 c0 64 8c 00 00
 00 01 38 9c 01 c0 64 8d 00 00 00 01 57 ec 01 c0 64 8e 00 00 00 4b 44 5a 30
 42 31 17 5c 00 01 6e 3b eb 06 00 07 11 09 00 04 b8 3e 01 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
...  20 repetitions of the previous line of zeros deleted
...  This is mostly the bitmap identifying deleted rows.
...  nrows=4361, so this will be about 4261, so this will be
...  about 4361/8 = 545 bytes of zeros.
...
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00 02 54 00 00 30
 bc 00 00 4a 82 00 00 64 37 00 00 a6 b8 00 00 e9 39 00 01 2b ba e6 c2 eb 8b
 00 00 2e 68 01 40 01 02 00 00 53 f0 78 9c a5 dc 07 57 23 5b 96 25 e0 ee 24

If you look at the symbolic dump there are lots of numbers that you could try to locate in the binary dump labelled START_CU:, for example the CU magic number 4b3445a30 is very obvious at the end of line 37 above (with the bytes in the right order), and the CU checksum of 5c173142 is visible at the start of line 38 (with the bytes in reverse order – no, don’t ask why me there’s a difference).

Bearing in mind how Oracle frequently uses “self-describing” strategies for data we might search for the value that gives the length of the CU as this could well be the start of the compressed data. But there are two CU lengths – the CU total length (93755) and the CU decomp length (93159). To my mind the latter is the “amount of CU requiring decompression” but let’s check for both: 93,755 (dec) = 0x16e3b, and 93,159 (dec) = 0x16be7 .

Take another look at line 38 – the second 4 bytes are: 00 01 6e 3b, the CU total length. Unfortunately we can’t see the CU decomp length anywhere, so let’s change strategy slightly – the different between the two lengths is 93755 – 93159 =  596 = 0x254, and by a convenient coincidence if you look at line 46, five bytes from the end, you find 02 54. Is this really a coincidence ? Let’s count 596 bytes from the CU total length and see where we get to – the answer is : the last 4 bytes of line 46 (e6 c2 eb 8b) This doesn’t seem to help at first sight, but take a look at where we are and what precedes that 4 bytes. There are 7 groups of 4 bytes that look like increasing values – and since I know more about the data than you do (it’s a table with 7 columns, and the last 4 columns are identical across all rows), I’ve extracted those 28 bytes and reformatted them:


                    	Offset		Delta
                    	------		-----
        00 00 02 54 	  596
        00 00 30 bc 	12476		11880
        00 00 4a 82 	19074		 6598
        00 00 64 37 	25655		 6581
        00 00 a6 b8 	42680		17025
        00 00 e9 39 	59705		17025
        00 01 2b ba 	76730		17025
                    	93755	<----


The “Offset” column is what you get by converting from Hex to decimal, and the “Delta” column is the difference between one value and the next. Notice how the last three deltas are the same, and how, if you add the same delta to the last Offset you get a number which is the CU total length. The “02 54″ I found as the difference between the CU total length and the CU decomp length is actually the starting point for the list of pointers to each of the compressed columns, and the repeated 17,025 is the length of my duplicated final four columns. The e6 c2 eb 8b is the first four bytes of the first (compressed) column.

Having worked out that it’s possible to find the list of pointers quite easily from the dump, the next step (left as an exercise to the interested reader) is to work out the algorithm that Oracle uses to locate the list – since the CU decomp length is clearly derived and not actually stored. As a little clue, you might start by taking another look at line 38: bytes 11/12 are 00 07 – the number of columns in the table, and bytes 13/14 are 11 09 – the number of rows in the CU (and that’s the most significant number at this point because it tells you how long the bitmap for deleted rows will be).

1 Comment »

  1. […] An update to the above post (thanks for that by the way) […]

    Pingback by Mysterious new Oracle compression type « Martins Blog — May 12, 2014 @ 11:20 pm GMT May 12,2014 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,267 other followers