Following up a suggestion from Kerry Osborne that I show how I arrived at the observation I made in an earlier posting about the size of a compression unit, here’s a short note to show you what I did. It really isn’t rocket science (that’s just a quick nod to NASA and Curiosity – the latest Mars rover).
Step 1: you can access rows by rowid in Oracle, so what do rowids look like on Exadata for a table using HCC ? I created a table with the option “compress for archive high” and then ran the following query:
break on report
compute sum of count(*) on report
compute count of block_no on report

select
        dbms_rowid.rowid_relative_fno(rowid)    file_no,
        dbms_rowid.rowid_block_number(rowid)    block_no,
        count(*)
from
        t1
group by
        dbms_rowid.rowid_relative_fno(rowid),
        dbms_rowid.rowid_block_number(rowid)
order by
        dbms_rowid.rowid_relative_fno(rowid),
        dbms_rowid.rowid_block_number(rowid)
;
This code simply counts the number of rows which appear to share the same file number and block number and reports the results in block address order. With a few simple sets of data, I found that the number of rows per “block” appeared to be around 32,000 – so I created an extremely compressible test case, and got the following results:
create table t1
nologging
compress for archive high
storage (initial 64M next 64M)
as
with generator as (
        select  --+ materialize
                rownum id
        from    dual
        connect by
                level <= 1e4
)
select
        lpad('x',10,'x')        v1
from
        generator       v1,
        generator       v2
where
        rownum <= 4 * 1048576
;

   FILE_NO   BLOCK_NO   COUNT(*)
---------- ---------- ----------
         5    3983906      32759
         5    3983907      32759
         5    3983908      32759
...
         5    3984033      32759
         5    3984034       1152
           ---------- ----------
count             129
sum                      4194304
Note: the storage clause is an attempt to avoid any special boundary conditions in a tablespace declared as locally managed with system managed extent allocation – in a new tablespace it (usually) skips the first few 64KB, 1MB and 8MB extent allocations.
As you can see, the best I could do was to cram 32,759 rows into a single “block” ! Using a slightly less extreme data set (see, for example, the code from my previous CU posting) I ran the same query to show that rather than reporting rowids in consecutive blocks I would see thousands of rows in a single block, followed by a gap of up to something like 32 blocks then another block with thousands of rows, and so on. The obvious inference was that the reported block addresses were the starting blocks of a compression unit – so I dumped one. Here’s the type of thing I found:
data_block_dump,data header at 0x2b98bb454a7c
===============
tsiz: 0x1f80
hsiz: 0x1c
pbl: 0x2b98bb454a7c
     76543210
flag=-0------
ntab=1
nrow=1
frre=-1
fsbo=0x1c
fseo=0x30
avsp=0x14
tosp=0x14
        r0_9ir2=0x0
        mec_kdbh9ir2=0x0
                      76543210
        shcf_kdbh9ir2=----------
                  76543210
        flag_9ir2=--R-----
        Archive compression: Y
                fcls_9ir2[0]={ }
0x16:pti[0]     nrow=1  offs=0
0x1a:pri[0]     offs=0x30
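For reference, a symbolic dump like this comes from a command along the following lines – a minimal sketch using the file and block numbers reported by the rowid query above (your numbers will, of course, differ); the output appears in the session’s trace file:

alter system dump datafile 5 block 3983906;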
It’s a normal table block (although the r0_9ir2 entry at line 15 shows that it’s subject to some form of internal column re-ordering to maximise the compressibility – that’s a detail that arrived in 9.2) holding just one row (nrow = 1 at line 9).
block_row_dump:
tab 0, row 0, @0x30
tl: 8016 fb: --H-F--N lb: 0x0  cc: 1
nrid:  0x0156ee23.0
col  0: [8004]
Compression level: 04 (Archive High)
 Length of CU row: 8004
kdzhrh: ------PC CBLK: 33 Start Slot: 00
 NUMP: 33
 PNUM: 00 POFF: 7664 PRID: 0x0156ee23.0
 PNUM: 01 POFF: 15680 PRID: 0x0156ee24.0
 PNUM: 02 POFF: 23696 PRID: 0x0156ee25.0
 PNUM: 03 POFF: 31712 PRID: 0x0156ee26.0
 PNUM: 04 POFF: 39728 PRID: 0x0156ee27.0
 PNUM: 05 POFF: 47744 PRID: 0x0156ee28.0
 PNUM: 06 POFF: 55760 PRID: 0x0156ee29.0
 PNUM: 07 POFF: 63776 PRID: 0x0156ee2a.0
 PNUM: 08 POFF: 71792 PRID: 0x0156ee2b.0
 PNUM: 09 POFF: 79808 PRID: 0x0156ee2c.0
 PNUM: 10 POFF: 87824 PRID: 0x0156ee2d.0
 PNUM: 11 POFF: 95840 PRID: 0x0156ee2e.0
 PNUM: 12 POFF: 103856 PRID: 0x0156ee2f.0
 PNUM: 13 POFF: 111872 PRID: 0x0156ee30.0
 PNUM: 14 POFF: 119888 PRID: 0x0156ee31.0
 PNUM: 15 POFF: 127904 PRID: 0x0156ee32.0
 PNUM: 16 POFF: 135920 PRID: 0x0156ee33.0
 PNUM: 17 POFF: 143936 PRID: 0x0156ee34.0
 PNUM: 18 POFF: 151952 PRID: 0x0156ee35.0
 PNUM: 19 POFF: 159968 PRID: 0x0156ee36.0
 PNUM: 20 POFF: 167984 PRID: 0x0156ee37.0
 PNUM: 21 POFF: 176000 PRID: 0x0156ee38.0
 PNUM: 22 POFF: 184016 PRID: 0x0156ee39.0
 PNUM: 23 POFF: 192032 PRID: 0x0156ee3a.0
 PNUM: 24 POFF: 200048 PRID: 0x0156ee3b.0
 PNUM: 25 POFF: 208064 PRID: 0x0156ee3c.0
 PNUM: 26 POFF: 216080 PRID: 0x0156ee3d.0
 PNUM: 27 POFF: 224096 PRID: 0x0156ee3e.0
 PNUM: 28 POFF: 232112 PRID: 0x0156ee3f.0
 PNUM: 29 POFF: 240128 PRID: 0x0156ee40.0
 PNUM: 30 POFF: 248144 PRID: 0x0156ee41.0
 PNUM: 31 POFF: 256160 PRID: 0x0156ee42.0
 PNUM: 32 POFF: 264176 PRID: 0x0156ee43.0
CU header:
CU version: 0   CU magic number: 0x4b445a30
CU checksum: 0x210a6a05
CU total length: 264972
CU flags: NC-U-CRD-OP
ncols: 8
nrows: 17854
algo: 0
CU decomp length: 262686 len/value length: 2467640
row pieces per row: 1
num deleted rows: 0
But the next part of the block dump shows that the one row is a very funny row consisting of just one column (cc: 1 in line 3 is the column count). If you try to access this “row” by rowid in the normal way you won’t be able to retrieve it; instead Oracle will unpack the compression unit and give you the first row from the unit.
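As a quick sanity check – a sketch that isn’t part of the original test – you can pick one of the reported rowids and query it back; you get an ordinary row from the table, not the one-column internal row that the dump shows:

select
        rowid, v1
from    t1
where   rowid = (select min(rowid) from t1)
;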
Line 3 tells us that the total length (tl) of the row is 8,016 bytes, but line 4 tells us that the row has chained, and the next row-piece (nrid) is the zeroth row in block 0x0156ee23. Of the 8,016 bytes in the row-piece, the single column comprises 8,004 bytes.
We can see in line 6 that the compression level is “Archive High”.
In line 8 we see that there are 33 continuation blocks (CBLK), and this is followed by a list of those blocks (technically, a rowid pointing into each block) in order. Notice that the rowid given by PNUM: 00 is consistent with the nrid that is reported in the normal “chained row” part of the row description. The POFF values (pointer offset ?) show the correlation between a row-piece and its starting byte position in the CU – consecutive POFF values differ by 8,016 bytes, matching the tl of the row-piece.
Line 46 shows us the total length of the compression unit; in this case it’s 264,972 bytes – approximately 259KB. So far most of the dumps I’ve done have shown CUs to be a little less than 256KB; I have seen a few larger ones, but never much more than 256KB.
Line 48 tells us that the underlying rows have 8 columns, and somewhere in the CU there has to be a set of pointers to the first byte of each compressed column. Line 49 tells us that this compression unit holds 17,854 rows. Line 51 tells us that the data content of the CU (i.e. eliminating overheads) is 262,686 bytes, compressed down from the 2,467,640 bytes (roughly 2.4MB) reported as the len/value length – a compression factor of roughly 9.4.
Line 50 is quite interesting – algo: 0. Is this short for “algorithm”, and does it suggest that Oracle can decide dynamically which compression algorithm it’s going to use, based on the data it finds as it starts to build the compression unit ?
The final entry I’ve shown is the list of deleted rows – and there are none in this example. If there are any deleted rows in the CU (and updated rows are deleted from the CU and copied to a new location) they appear as a simple list of numbers which, in this example, would range from 1 to 17,854 (the number of rows in the CU).
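If you want to see that list being populated, a hypothetical follow-up (not something I ran for this post) would be to update a row and dump the CU header again – the expectation from the description above is that “num deleted rows” goes up and the updated row reappears elsewhere, in ordinary row format, with a new rowid:

update t1 set v1 = lpad('y',10,'y') where rownum = 1;
commit;

-- re-run the rowid query and re-dump the first block of the affected CU;
-- the block number reported for the updated row should now be different.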
Just clarifying a bit of what was demonstrated:
When a segment uses HCC, dbms_rowid.rowid_block_number(rowid) represents the CU (the logical rowid), not the database block that one is generally familiar with. This also means that any block that has non-head CU pieces in it will have no logical rowids associated with it, which explains the 32 block “gap”.
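One way of seeing that gap directly (a sketch built on the query at the top of the post, ignoring the file number for simplicity) is to report the distance between consecutive “logical” block numbers:

select
        block_no,
        row_count,
        block_no - lag(block_no) over (order by block_no)       block_gap
from    (
        select
                dbms_rowid.rowid_block_number(rowid)    block_no,
                count(*)                                row_count
        from    t1
        group by
                dbms_rowid.rowid_block_number(rowid)
        )
order by
        block_no
;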
IIRC for HCC “query” compression, CUs are typically 32K, while “archive” compression CUs are typically no larger than 256K as mentioned.
When an HCC segment is loaded, compression analysis chooses the compression algo per column (of which there are several), which is why there is an enqueue wait for “enq: ZH – compression analysis” in 11gR2.
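A minimal check for that wait event (assuming the standard dynamic performance views – this isn’t from the original comment):

select  event, total_waits, time_waited
from    v$system_event
where   event like 'enq: ZH%'
;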
Another good reference is:
http://canali.web.cern.ch/canali/docs/Compressing_VLDS_Oracle_UKOUG09_LC_CERN.ppt
Comment by Greg Rahn — August 8, 2012 @ 7:26 am BST Aug 8,2012 |
Greg
Thanks for the comments and links.
I should have pointed out that compression for query seemed to use much smaller compression units. (I have a memory that I saw query high produce CUs of 11 blocks on one occasion – but possibly it was actually archive low and my memory is at fault.)
To my mind, the different limits on the sizes of query CUs and archive CUs are probably the most significant difference in strategy – the CPU cost of decompressing a column from a CU that’s allowed to extend to 256KB is likely to be much greater than the CPU cost of decompressing a column from a 32KB CU. That (I assume) is why my example from the previous post (https://jonathanlewis.wordpress.com/2012/07/27/compression-units-3/ ) showed the CPU usage going up by a factor of 40 even though the size of the table dropped by only 25% (49,000 blocks down to 36,600 blocks) when I switched from query high to archive high compression.
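For anyone who wants to compare the two levels on their own data before rebuilding anything, here’s a minimal sketch using the 11.2 dbms_compression advisor – the scratch tablespace name is a placeholder, and the parameter list shown is the 11.2 one, so check it against your own version:

declare
        l_blkcnt_cmp    pls_integer;
        l_blkcnt_uncmp  pls_integer;
        l_row_cmp       pls_integer;
        l_row_uncmp     pls_integer;
        l_cmp_ratio     number;
        l_comptype_str  varchar2(64);

        procedure estimate(p_comptype in number) is
        begin
                dbms_compression.get_compression_ratio(
                        scratchtbsname  => 'SCRATCH_TS',        -- placeholder scratch tablespace
                        ownname         => user,
                        tabname         => 'T1',
                        partname        => null,
                        comptype        => p_comptype,
                        blkcnt_cmp      => l_blkcnt_cmp,
                        blkcnt_uncmp    => l_blkcnt_uncmp,
                        row_cmp         => l_row_cmp,
                        row_uncmp       => l_row_uncmp,
                        cmp_ratio       => l_cmp_ratio,
                        comptype_str    => l_comptype_str
                );
                dbms_output.put_line(l_comptype_str || ': estimated compression ratio ' || round(l_cmp_ratio,2));
        end;
begin
        estimate(dbms_compression.comp_for_query_high);
        estimate(dbms_compression.comp_for_archive_high);
end;
/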
Comment by Jonathan Lewis — August 8, 2012 @ 9:00 am BST Aug 8,2012 |
I agree that the CPU cost (and elapsed time) of compression and decompression are important considerations along with the compression ratio because there are trade-offs being made; however, I would place the emphasis not on how large the CU is, but on what the compression algorithm and objective are, because that is really the driving factor for the CU size. The top level compression algos include LZO, gzip (medium), gzip (high) and bzip2 [1]. The emphasis with the “archive” level compression is really reduced size, but the CU size and algo both contribute to CPU costs. Even if the CU size were constant across the four HCC levels, as the compression level increases, more CPU is required, e.g. LZO requires less CPU than bzip2 on an identical data set.
[1] http://bit.ly/My60wt
Comment by Greg Rahn — August 8, 2012 @ 3:44 pm BST Aug 8,2012 |
Greg,
I think we’re probably making the same point while talking in different ways about different aspects of the same feature.
I appreciate that it would take less CPU (though more space) to handle all your data if it were stored using LZO rather than bzip2 even if the CU were a uniform 256KB in both cases; but the aspect I was thinking of was that it takes less CPU to unpack a single row from a 32KB LZO CU than it would to unpack the same row from a 256KB LZO CU.
If your strategic choice is “compress for query” then you are presumably expecting to query the data, and probably not expecting every query to be a tablescan – so the CPU spent selecting single rows at random is likely to be a sufficiently important factor that a 32KB limit on CUs becomes a much better strategy than a 256KB limit.
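To put a rough number on that, a crude test (a sketch, not something from this discussion) is to fetch a batch of rows back by rowid one at a time, running it once against a “query high” copy of the data and once against an “archive high” copy, and comparing the CPU (or elapsed) time:

set timing on

declare
        l_v1    t1.v1%type;
begin
        for r in (
                select  rowid rid
                from    t1 sample (1)
                where   rownum <= 1000
        ) loop
                select v1 into l_v1 from t1 where rowid = r.rid;
        end loop;
end;
/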
Comment by Jonathan Lewis — August 9, 2012 @ 12:44 pm BST Aug 9,2012
Thanks for the details guys.
I went back and had a look at a few tables I had access to at the moment and it seems like the three lower-level compression settings (Query Low through Archive Low) all have around 4 blocks per CU (32K), while Archive High has a significantly higher number, generally around 32 (256K). I haven’t really seen anything in between, although it’s an admittedly haphazard look at a very small data set.
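A slightly more systematic way of getting that picture (a sketch along the lines of the rowid query in the post, assuming a single-file segment) is to group the “logical” block numbers by the gap to the previous one – the gap approximates the number of blocks per CU:

select  block_gap, count(*) cu_count
from    (
        select
                block_no - lag(block_no) over (order by block_no)       block_gap
        from    (
                select  distinct dbms_rowid.rowid_block_number(rowid)   block_no
                from    t1
                )
        )
group by
        block_gap
order by
        block_gap
;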
Greg, can you expand any on the column level (I’ll call it pre-processing) compression?
Comment by pieboy13 — August 9, 2012 @ 4:00 pm BST Aug 9,2012 |
pieboy13,
I’ve just generated a data set at “archive low” – here’s a sample from the rowid check code (note – around 11 blocks, and 7,800 rows per CU):
And here’s a chunk from the block dump of block 3602383:
(This happens to be a case where one CU ends and another begins in the same block – so this CU is row 1 in the block rather than row 0.)
I mentioned in my previous reply to Greg that I thought I’d seen 11 blocks for a “query” CU – but looking at this result I think I probably was remembering an experiment with “archive low”.
Comment by Jonathan Lewis — August 9, 2012 @ 4:56 pm BST Aug 9,2012 |
@Jonathan – HCC was built with scans in mind as it is more costly to get the CU, uncompress it, then grab a single row. Not every query needs to be a FTS, but if the majority are not, then HCC probably isn’t the right compression choice.
@pieboy13 – Since the column level compression techniques are not publicly documented by Oracle, I cannot comment specifically. What I will say is that there are a number of techniques used by column-major databases to encode/compress data and there is a good chance that HCC uses those as well.
Comment by Greg Rahn — August 9, 2012 @ 9:20 pm BST Aug 9,2012 |
[…] and I was expecting to find some notes somewhere about Oracle catalogues all the bits and pieces. Part 4 of this series lists some of the detail but I was slightly surprised to discover that it made the comment: […]
Pingback by Compression Units – 6 | Oracle Scratchpad — May 12, 2014 @ 1:34 pm BST May 12,2014 |
[…] Compression Units part 4 […]
Pingback by Compression | Oracle Scratchpad — August 5, 2015 @ 1:06 pm BST Aug 5,2015 |
[…] Compression Units part 4 – Aug 2012 […]
Pingback by Exadata Catalogue | Oracle Scratchpad — January 27, 2022 @ 6:27 pm GMT Jan 27,2022 |