Oracle Scratchpad

August 7, 2012

Compression Units – 4

Filed under: Exadata,HCC,Oracle — Jonathan Lewis @ 5:24 pm BST Aug 7,2012

Following up a suggestion from Kerry Osborne that I show how I arrived at the observation I made in an earlier posting about the size of a compression unit, here’s a short note to show you what I did. It really isn’t rocket science (that’s just a quick nod to NASA and Curiosity – the latest Mars rover).

Step 1: you can access rows by rowid in Oracle, so what happens when you analyze the rowids of a table using HCC on Exadata? I created a table with the option “compress for archive high” and then ran the following query:

break on report
compute sum of count(*) on report
compute count of block_no on report

select
	dbms_rowid.rowid_relative_fno(rowid) file_no,
	dbms_rowid.rowid_block_number(rowid) block_no,
	count(*)
from
	t1
group by
	dbms_rowid.rowid_relative_fno(rowid),
	dbms_rowid.rowid_block_number(rowid)
order by
	dbms_rowid.rowid_relative_fno(rowid),
	dbms_rowid.rowid_block_number(rowid)
;

This code simply counts the number of rows which appear to share the same file number and block number, and reports them in block address order. With a few simple sets of data, I found that the number of rows per “block” appeared to be around 32,000 – so I created an extremely compressible test case, to get the following results:

create table t1
nologging
compress for archive high
storage (initial 64M next 64M)
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        lpad('x',10,'x')        v1
from
        generator       v1,
        generator       v2
where
        rownum <= 4 * 1048576
;

   FILE_NO   BLOCK_NO   COUNT(*)
---------- ---------- ----------
         5    3983906      32759
         5    3983907      32759
         5    3983908      32759
...
         5    3984033      32759
         5    3984034       1152
           ---------- ----------
count             129
sum                      4194304
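As a quick sanity check on that output, the arithmetic accounts for every row loaded: 128 “blocks” reporting 32,759 rows each, plus the final block of 1,152 rows, adds up to the 4 * 1048576 rows created by the script above.

select 128 * 32759 + 1152 as rows_accounted_for from dual;    -- 4194304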

Note: the storage clause is an attempt to avoid any “starting conditions” in a tablespace declared as locally managed with system managed extent allocation – it (usually) skips the first few 64KB, 1MB and 8MB extent allocations.
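For context, here's a minimal sketch of the kind of tablespace that note refers to – locally managed with system managed (autoallocate) extent sizing. The tablespace name, datafile path and size are purely illustrative choices of mine, not details from the original test:

create tablespace test_8k
	datafile '/u01/app/oracle/oradata/TEST/test_8k_01.dbf' size 4g
	extent management local
	autoallocate			-- system managed extent sizes: 64KB, then 1MB, then 8MB ...
	segment space management auto
;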

As you can see, the best I could do was to cram 32,759 rows into a single “block”! Using a slightly less extreme data set (see, for example, the code from my previous CU posting) I ran the same query and found that, rather than reporting rowids in every consecutive block, it showed thousands of rows in a single block, followed by a gap of up to something like 32 blocks, then another block with thousands of rows. The obvious inference was that the reported block addresses were the starting blocks of a compression unit – so I dumped one.
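For reference, a dump like the one below can be produced with a command along the following lines – the file and block numbers here are the ones from the listing above (yours will differ), and the output appears in a trace file in the session's trace directory:

alter system dump datafile 5 block 3983906;

Here's the type of thing I found in the trace file: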

data_block_dump,data header at 0x2b98bb454a7c
===============
tsiz: 0x1f80
hsiz: 0x1c
pbl: 0x2b98bb454a7c
     76543210
flag=-0------
ntab=1
nrow=1
frre=-1
fsbo=0x1c
fseo=0x30
avsp=0x14
tosp=0x14
        r0_9ir2=0x0
        mec_kdbh9ir2=0x0
                      76543210
        shcf_kdbh9ir2=----------
                  76543210
        flag_9ir2=--R-----      Archive compression: Y
                fcls_9ir2[0]={ }
0x16:pti[0]     nrow=1  offs=0
0x1a:pri[0]     offs=0x30

It’s a normal table block (although the r0_9ir2 entry shows that it’s subject to some form of column re-arranging to optimise compression – that’s a detail that arrived in 9.2) holding just one row (nrow=1).

block_row_dump:
tab 0, row 0, @0x30
tl: 8016 fb: --H-F--N lb: 0x0  cc: 1
nrid:  0x0156ee23.0
col  0: [8004]
Compression level: 04 (Archive High)
 Length of CU row: 8004
kdzhrh: ------PC CBLK: 33 Start Slot: 00
 NUMP: 33
 PNUM: 00 POFF: 7664 PRID: 0x0156ee23.0
 PNUM: 01 POFF: 15680 PRID: 0x0156ee24.0
 PNUM: 02 POFF: 23696 PRID: 0x0156ee25.0
 PNUM: 03 POFF: 31712 PRID: 0x0156ee26.0
 PNUM: 04 POFF: 39728 PRID: 0x0156ee27.0
 PNUM: 05 POFF: 47744 PRID: 0x0156ee28.0
 PNUM: 06 POFF: 55760 PRID: 0x0156ee29.0
 PNUM: 07 POFF: 63776 PRID: 0x0156ee2a.0
 PNUM: 08 POFF: 71792 PRID: 0x0156ee2b.0
 PNUM: 09 POFF: 79808 PRID: 0x0156ee2c.0
 PNUM: 10 POFF: 87824 PRID: 0x0156ee2d.0
 PNUM: 11 POFF: 95840 PRID: 0x0156ee2e.0
 PNUM: 12 POFF: 103856 PRID: 0x0156ee2f.0
 PNUM: 13 POFF: 111872 PRID: 0x0156ee30.0
 PNUM: 14 POFF: 119888 PRID: 0x0156ee31.0
 PNUM: 15 POFF: 127904 PRID: 0x0156ee32.0
 PNUM: 16 POFF: 135920 PRID: 0x0156ee33.0
 PNUM: 17 POFF: 143936 PRID: 0x0156ee34.0
 PNUM: 18 POFF: 151952 PRID: 0x0156ee35.0
 PNUM: 19 POFF: 159968 PRID: 0x0156ee36.0
 PNUM: 20 POFF: 167984 PRID: 0x0156ee37.0
 PNUM: 21 POFF: 176000 PRID: 0x0156ee38.0
 PNUM: 22 POFF: 184016 PRID: 0x0156ee39.0
 PNUM: 23 POFF: 192032 PRID: 0x0156ee3a.0
 PNUM: 24 POFF: 200048 PRID: 0x0156ee3b.0
 PNUM: 25 POFF: 208064 PRID: 0x0156ee3c.0
 PNUM: 26 POFF: 216080 PRID: 0x0156ee3d.0
 PNUM: 27 POFF: 224096 PRID: 0x0156ee3e.0
 PNUM: 28 POFF: 232112 PRID: 0x0156ee3f.0
 PNUM: 29 POFF: 240128 PRID: 0x0156ee40.0
 PNUM: 30 POFF: 248144 PRID: 0x0156ee41.0
 PNUM: 31 POFF: 256160 PRID: 0x0156ee42.0
 PNUM: 32 POFF: 264176 PRID: 0x0156ee43.0
CU header:
CU version: 0   CU magic number: 0x4b445a30
CU checksum: 0x210a6a05
CU total length: 264972
CU flags: NC-U-CRD-OP
ncols: 8
nrows: 17854
algo: 0
CU decomp length: 262686   len/value length: 2467640
row pieces per row: 1
num deleted rows: 0

But the next part of the block dump shows that the one row is a very funny row consisting of just one column (cc: 1 is the column count). If you try to access this “row” by rowid in the normal way you won’t be able to retrieve it as it stands; instead Oracle will unpack the compression and give you the first row from the unit.
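Here's a hedged sketch of the sort of test that demonstrates the point. The file and block numbers are the ones from the first listing above, and the lookup of the data object number through user_objects is my own addition – treat it as an illustration rather than a script from the original test:

select	v1
from	t1
where	rowid = (
		select	dbms_rowid.rowid_create(
				1,			-- extended rowid format
				data_object_id,		-- data object number of t1
				5,			-- relative file number (from the listing above)
				3983906,		-- block number (from the listing above)
				0			-- row number within the block
			)
		from	user_objects
		where	object_name = 'T1'
		and	object_type = 'TABLE'
	)
;

Rather than the odd one-column row seen in the dump, this returns an ordinary row – the first row of the compression unit.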

The row-piece header tells us that the total row length (tl) is 8,016 bytes, but the row has chained, and the next row-piece (nrid) is the zeroth row in block 0x0156ee23. Of the 8,016 bytes in the row-piece, the single column comprises 8,004 bytes.

We can see from the “Compression level” entry that the level is 04 – “Archive High”.

The kdzhrh entry shows that there are 33 continuation blocks (CBLK: 33), and this is followed by a list of those blocks (technically the rowid of the row-piece in each block) in order. Notice that the rowid given for PNUM: 00 is consistent with the nrid that is reported in the normal “chained row” part of the row description. The POFF values (pointer offset ?) show the correlation between a row-piece and its starting byte position in the CU – and notice that consecutive POFF values differ by 8,016 bytes, which matches the row-piece length (tl) reported above.

The “CU total length” entry shows us the total length of the compression unit; in this case it’s 264,972 bytes – approximately 259KB. So far most of the dumps I’ve done have shown CUs to be a little less than 256KB; I have seen a few larger ones – but never much more than 256KB.

The ncols entry tells us that the underlying rows have 8 columns, and somewhere in the CU there has to be a set of pointers to the first byte of each compressed column. The nrows entry tells us that this compression unit holds 17,854 rows. The “CU decomp length” tells us that the data content of the CU (i.e. eliminating overheads) is 262,686 bytes, which has been compressed down from roughly 2.4MB (the “len/value length” of 2,467,640 bytes) – a compression factor of roughly 9.4.
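Just to spell out the arithmetic behind that factor, using the two lengths from the CU header above:

select round(2467640 / 262686, 2) as compression_factor from dual;    -- 9.39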

The algo: 0 entry is quite interesting. Is “algo” short for “algorithm”, and does that suggest that Oracle can decide dynamically which compression algorithm it’s going to use based on the data it finds as it starts to build the compression unit?

The final entry I’ve shown relates to deleted rows – and there are none in this example. If there are any deleted rows in the CU (and updated rows are deleted from the CU and copied to a new location) they appear in a simple list of numbers which, in this example, would range from 1 to 17,854 (the number of rows in the CU).
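A hedged sketch of how that list gets populated – the update below is purely illustrative, and it assumes the first few rows scanned come from the CU whose head block we dumped earlier:

update t1 set v1 = lpad('y',10,'y') where rownum <= 5;
commit;

alter system dump datafile 5 block 3983906;

Since an updated row is deleted from the CU and copied to a new location (as described above), a fresh dump of the CU’s first block should then report a non-zero “num deleted rows” followed by the list of affected row numbers.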

7 Comments »

  1. Just clarifying a bit of what was demonstrated:
    When a segment uses HCC, dbms_rowid.rowid_block_number(rowid) represents the CU (the logical rowid), not the database block that one is generally familiar with. This also means that any block that has non-head CU pieces in it will have no logical rowids associated with it, which explains the 32 block “gap”.

    IIRC for HCC “query” compression, CUs are typically 32K, while “archive” compression CUs are typically no larger than 256K as mentioned.

    When an HCC segment is loaded, compression analysis chooses the compression algo per column (of which there are several), which is why there is an enqueue wait for “enq: ZH – compression analysis” in 11gR2.

    Another good reference is:
    http://canali.web.cern.ch/canali/docs/Compressing_VLDS_Oracle_UKOUG09_LC_CERN.ppt

    Comment by Greg Rahn — August 8, 2012 @ 7:26 am BST Aug 8,2012 | Reply

    • Greg

      Thanks for the comments and links.

      I should have pointed out that compression for query seemed to use much smaller compression units. (I have a memory that I saw query high produce CUs of 11 blocks on one occasion – but possibly it was actually archive low and my memory is at fault.)

      To my mind, the different limits on the sizes of “query” CUs and “archive” CUs are probably the most significant difference in strategy – the CPU cost of decompressing a column from a CU that’s allowed to extend to 256KB is likely to be much greater than the CPU cost of decompressing a column from a 32KB CU. That (I assume) is why my example from the previous post (https://jonathanlewis.wordpress.com/2012/07/27/compression-units-3/ ) showed the CPU usage going up by a factor of 40 even though the size of the table dropped by only 25% (49,000 blocks down to 36,600 blocks) when I switched from query high to archive high compression.

      Comment by Jonathan Lewis — August 8, 2012 @ 9:00 am BST Aug 8,2012 | Reply

        I agree that the CPU cost (and elapsed time) of compression and decompression are important considerations along with the compression ratio, because there are trade-offs being made. However, I would place the emphasis not on how large the CU is, but on what the compression algorithm and objective are, because that is really the driving factor for the CU size. The top level compression algos include LZO, gzip (medium), gzip (high) and bzip2 [1]. The emphasis with the “archive” level compression is really reduced size, but the CU size and algo both contribute to CPU costs. Even if the CU size were constant across the four HCC levels, more CPU is required as the compression level increases; e.g. LZO requires less CPU than bzip2 on an identical data set.

        [1] http://bit.ly/My60wt

        Comment by Greg Rahn — August 8, 2012 @ 3:44 pm BST Aug 8,2012 | Reply

        • Greg,

          I think we’re probably making the same point while talking in different ways about different aspects of the same feature.

          I appreciate that it would take less CPU (though more space) to handle all your data if it were stored using LZO rather than bzip2 even if the CU were a uniform 256KB in both cases; but the aspect I was thinking of was that it takes less CPU to unpack a single row from a 32KB LZO CU than it would to unpack the same row from a 256KB LZO CU.

          If your strategic choice is “compress for query” then you are presumably expecting to query the data, and probably not expecting every query to be a tablescan – so the CPU spent selecting single rows at random is likely to be a sufficiently important factor that a 32KB limit on CUs becomes a much better strategy than a 256KB limit.

          Comment by Jonathan Lewis — August 9, 2012 @ 12:44 pm BST Aug 9,2012

  2. Thanks for the details guys.

    I went back and had a look at a few tables I had access to at the moment, and it seems like the three lower level compression settings (Query Low through Archive Low) all have around 4 blocks per CU (32K), while Archive High has a significantly higher number, generally around 32 (256K). I haven’t really seen anything in between, although it’s an admittedly haphazard look at a very small data set.

    Greg, can you expand any on the column level (I’ll call it pre-processing) compression?

    Comment by pieboy13 — August 9, 2012 @ 4:00 pm BST Aug 9,2012 | Reply

    • pieboy13,
      I’ve just generated a data set at “archive low” – here’s a sample from the rowid check code (note – around 11 blocks, and 7,800 rows per CU):

             FNO        BNO      L_FNO      L_BNO     F_DIFF     B_DIFF ROWS_PER_CU
      ---------- ---------- ---------- ---------- ---------- ---------- -----------
               5    3602372          5    3602383          0         11        7877
               5    3602383          5    3602394          0         11        7880
               5    3602394          5    3602405          0         11        7768
               5    3602405          5    3602416          0         11        7722
               5    3602416          5    3602428          0         12        7723
      
      

      And here’s a chunk from the block dump of block 3602383:

      tab 0, row 1, @0x32
      tl: 5693 fb: --H-F--N lb: 0x0  cc: 1
      nrid:  0x0176f7d0.0
      col  0: [5681]
      Compression level: 03 (Archive Low)
       Length of CU row: 5681
      kdzhrh: ------PC CBLK: 11 Start Slot: 00
       NUMP: 11
       PNUM: 00 POFF: 5561 PRID: 0x0176f7d0.0
       PNUM: 01 POFF: 13577 PRID: 0x0176f7d1.0
       PNUM: 02 POFF: 21593 PRID: 0x0176f7d2.0
       PNUM: 03 POFF: 29609 PRID: 0x0176f7d3.0
       PNUM: 04 POFF: 37625 PRID: 0x0176f7d4.0
       PNUM: 05 POFF: 45641 PRID: 0x0176f7d5.0
       PNUM: 06 POFF: 53657 PRID: 0x0176f7d6.0
       PNUM: 07 POFF: 61673 PRID: 0x0176f7d7.0
       PNUM: 08 POFF: 69689 PRID: 0x0176f7d8.0
       PNUM: 09 POFF: 77705 PRID: 0x0176f7d9.0
       PNUM: 10 POFF: 85721 PRID: 0x0176f7da.0
      CU header:
      CU version: 0   CU magic number: 0x4b445a30
      CU checksum: 0xb561315
      CU total length: 90376
      CU flags: NC-U-CRD-OP
      ncols: 8
      nrows: 7880
      algo: 0
      CU decomp length: 89337   len/value length: 778484
      

      (This happens to be a case where one CU ends and another begins in the same block – so this CU is row 1 in the block rather than row 0.)
      I mentioned in my previous reply to Greg that I thought I’d seen 11 blocks for a “query” CU – but looking at this result I think I probably was remembering an experiment with “archive low”.

      Comment by Jonathan Lewis — August 9, 2012 @ 4:56 pm BST Aug 9,2012 | Reply

  3. @Jonathan – HCC was built with scans in mind as it is more costly to get the CU, uncompress it, then grab a single row. Not every query needs to be a FTS, but if the majority are not, then HCC probably isn’t the right compression choice.

    @pieboy13 – Since the column level compression techniques are not publicly documented by Oracle, I cannot comment specifically. What I will say is that there are a number of techniques used by column-major databases to encode/compress data, and there is a good chance that HCC uses those as well.

    Comment by Greg Rahn — August 9, 2012 @ 9:20 pm BST Aug 9,2012 | Reply

