<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Compression Units &#8211; 4</title>
	<atom:link href="http://jonathanlewis.wordpress.com/2012/08/07/compression-units-4/feed/" rel="self" type="application/rss+xml" />
	<link>http://jonathanlewis.wordpress.com/2012/08/07/compression-units-4/</link>
	<description>Just another Oracle weblog</description>
	<lastBuildDate>Fri, 24 May 2013 13:27:07 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Greg Rahn</title>
		<link>http://jonathanlewis.wordpress.com/2012/08/07/compression-units-4/#comment-48865</link>
		<dc:creator><![CDATA[Greg Rahn]]></dc:creator>
		<pubDate>Thu, 09 Aug 2012 21:20:55 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=9300#comment-48865</guid>
		<description><![CDATA[@Jonathan - HCC was built with scans in mind as it is more costly to get the CU, uncompress it, then grab a single row.  Not every query needs to be a FTS, but if the majority are not, then HCC probably isn&#039;t the right compression choice.

@pieboy13 - Since the column level compression techniques are not publicly documented by Oracle, I cannot comment specifically.  What I will say is that there are a number of techniques used by column-major databases to encode/compress data, and there is a good chance that HCC uses those as well.]]></description>
		<content:encoded><![CDATA[<p>@Jonathan &#8211; HCC was built with scans in mind as it is more costly to get the CU, uncompress it, then grab a single row.  Not every query needs to be a FTS, but if the majority are not, then HCC probably isn&#8217;t the right compression choice.</p>
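<p>A minimal Python sketch of why a single-row fetch from a compressed unit is relatively expensive. It assumes a CU can be treated as one compressed blob. zlib stands in for HCC's undocumented format, and the row layout and the helpers <code>build_cu</code> and <code>fetch_row</code> are hypothetical. To read one row you first decompress the whole unit, so the work scales with the CU size rather than the row size.</p>
```python
import json
import zlib


def build_cu(rows):
    """Serialize rows and compress them as a single unit (a toy 'CU', not HCC)."""
    payload = json.dumps(rows).encode("utf-8")
    return zlib.compress(payload), len(payload)


def fetch_row(cu_blob, index):
    """Fetch one row: the whole blob has to be decompressed first."""
    payload = zlib.decompress(cu_blob)          # work proportional to the whole CU
    rows = json.loads(payload.decode("utf-8"))
    return rows[index], len(payload)            # the row, plus bytes decompressed


rows = [[i, "status-%d" % (i % 3), i * 10] for i in range(1000)]
cu, raw_len = build_cu(rows)
row, work = fetch_row(cu, 42)
print(row, "needed", work, "bytes decompressed")
```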
<p>@pieboy13 &#8211; Since the column level compression techniques are not publicly documented by Oracle, I cannot comment specifically.  What I will say is that there are a number of techniques used by column-major databases to encode/compress data, and there is a good chance that HCC uses those as well.</p>
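<p>For readers unfamiliar with column-major encodings, here is a hedged Python illustration of two techniques commonly described for column stores: dictionary encoding and run-length encoding. It does not describe HCC, whose column-level techniques are undocumented. The function names are hypothetical.</p>
```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def run_length_encode(codes):
    """Collapse runs of repeated codes into (code, run_length) pairs."""
    return [(code, len(list(run))) for code, run in groupby(codes)]


column = ["NY", "NY", "NY", "CA", "CA", "TX", "TX", "TX", "TX"]
dictionary, codes = dictionary_encode(column)
runs = run_length_encode(codes)
print(dictionary)  # ['CA', 'NY', 'TX']
print(runs)        # [(1, 3), (0, 2), (2, 4)]
```
<p>Data that is sorted or clustered on a column produces longer runs. This is one reason column-major layouts tend to compress well.</p>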
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2012/08/07/compression-units-4/#comment-48856</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Thu, 09 Aug 2012 16:56:26 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=9300#comment-48856</guid>
		<description><![CDATA[pieboy13,
I&#039;ve just generated a data set at &quot;archive low&quot; - here&#039;s a sample from the rowid check code (note - around 11 blocks, and 7,800 rows per CU):
[sourcecode gutter=&quot;false&quot;]
       FNO        BNO      L_FNO      L_BNO     F_DIFF     B_DIFF ROWS_PER_CU
---------- ---------- ---------- ---------- ---------- ---------- -----------
         5    3602372          5    3602383          0         11        7877
         5    3602383          5    3602394          0         11        7880
         5    3602394          5    3602405          0         11        7768
         5    3602405          5    3602416          0         11        7722
         5    3602416          5    3602428          0         12        7723

[/sourcecode]

And here&#039;s a chunk from the block dump of block 3602383:

[sourcecode gutter=&quot;false&quot;]
tab 0, row 1, @0x32
tl: 5693 fb: --H-F--N lb: 0x0  cc: 1
nrid:  0x0176f7d0.0
col  0: [5681]
Compression level: 03 (Archive Low)
 Length of CU row: 5681
kdzhrh: ------PC CBLK: 11 Start Slot: 00
 NUMP: 11
 PNUM: 00 POFF: 5561 PRID: 0x0176f7d0.0
 PNUM: 01 POFF: 13577 PRID: 0x0176f7d1.0
 PNUM: 02 POFF: 21593 PRID: 0x0176f7d2.0
 PNUM: 03 POFF: 29609 PRID: 0x0176f7d3.0
 PNUM: 04 POFF: 37625 PRID: 0x0176f7d4.0
 PNUM: 05 POFF: 45641 PRID: 0x0176f7d5.0
 PNUM: 06 POFF: 53657 PRID: 0x0176f7d6.0
 PNUM: 07 POFF: 61673 PRID: 0x0176f7d7.0
 PNUM: 08 POFF: 69689 PRID: 0x0176f7d8.0
 PNUM: 09 POFF: 77705 PRID: 0x0176f7d9.0
 PNUM: 10 POFF: 85721 PRID: 0x0176f7da.0
CU header:
CU version: 0   CU magic number: 0x4b445a30
CU checksum: 0xb561315
CU total length: 90376
CU flags: NC-U-CRD-OP
ncols: 8
nrows: 7880
algo: 0
CU decomp length: 89337   len/value length: 778484
[/sourcecode]

(This happens to be a case where one CU ends and another begins in the same block - so this CU is row 1 in the block rather than row 0.)
I mentioned in my previous reply to Greg that I thought I&#039;d seen 11 blocks for a &quot;query&quot; CU - but looking at this result I think I was probably remembering an experiment with &quot;archive low&quot;.]]></description>
		<content:encoded><![CDATA[<p>pieboy13,<br />
I&#8217;ve just generated a data set at &#8220;archive low&#8221; &#8211; here&#8217;s a sample from the rowid check code (note &#8211; around 11 blocks, and 7,800 rows per CU):</p>
<pre class="brush: plain; gutter: false; title: ; notranslate">
       FNO        BNO      L_FNO      L_BNO     F_DIFF     B_DIFF ROWS_PER_CU
---------- ---------- ---------- ---------- ---------- ---------- -----------
         5    3602372          5    3602383          0         11        7877
         5    3602383          5    3602394          0         11        7880
         5    3602394          5    3602405          0         11        7768
         5    3602405          5    3602416          0         11        7722
         5    3602416          5    3602428          0         12        7723

</pre>
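<p>The B_DIFF and ROWS_PER_CU columns can be reproduced from a list of CU start blocks with a LEAD-style calculation. The Python sketch below assumes the input is the block numbers and row counts shown above, plus the next CU's start block (3602428, taken from the last L_BNO). The helper <code>cu_gaps</code> is hypothetical and is not the original rowid check code.</p>
```python
def cu_gaps(starts):
    """Pair each CU start block with the next one (like SQL LEAD)
    and return (block, next_block, block_diff) tuples."""
    return [(b, nb, nb - b) for b, nb in zip(starts, starts[1:])]


# CU start blocks in file 5 from the output above, plus the next CU's start.
starts = [3602372, 3602383, 3602394, 3602405, 3602416, 3602428]
rows_per_cu = [7877, 7880, 7768, 7722, 7723]

for (bno, l_bno, b_diff), rows in zip(cu_gaps(starts), rows_per_cu):
    print(bno, l_bno, b_diff, rows)

avg_blocks = sum(d for _, _, d in cu_gaps(starts)) / len(rows_per_cu)
avg_rows = sum(rows_per_cu) / len(rows_per_cu)
print("average blocks per CU: %.1f, average rows per CU: %.0f" % (avg_blocks, avg_rows))
```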
<p>And here&#8217;s a chunk from the block dump of block 3602383:</p>
<pre class="brush: plain; gutter: false; title: ; notranslate">
tab 0, row 1, @0x32
tl: 5693 fb: --H-F--N lb: 0x0  cc: 1
nrid:  0x0176f7d0.0
col  0: [5681]
Compression level: 03 (Archive Low)
 Length of CU row: 5681
kdzhrh: ------PC CBLK: 11 Start Slot: 00
 NUMP: 11
 PNUM: 00 POFF: 5561 PRID: 0x0176f7d0.0
 PNUM: 01 POFF: 13577 PRID: 0x0176f7d1.0
 PNUM: 02 POFF: 21593 PRID: 0x0176f7d2.0
 PNUM: 03 POFF: 29609 PRID: 0x0176f7d3.0
 PNUM: 04 POFF: 37625 PRID: 0x0176f7d4.0
 PNUM: 05 POFF: 45641 PRID: 0x0176f7d5.0
 PNUM: 06 POFF: 53657 PRID: 0x0176f7d6.0
 PNUM: 07 POFF: 61673 PRID: 0x0176f7d7.0
 PNUM: 08 POFF: 69689 PRID: 0x0176f7d8.0
 PNUM: 09 POFF: 77705 PRID: 0x0176f7d9.0
 PNUM: 10 POFF: 85721 PRID: 0x0176f7da.0
CU header:
CU version: 0   CU magic number: 0x4b445a30
CU checksum: 0xb561315
CU total length: 90376
CU flags: NC-U-CRD-OP
ncols: 8
nrows: 7880
algo: 0
CU decomp length: 89337   len/value length: 778484
</pre>
<p>(This happens to be a case where one CU ends and another begins in the same block &#8211; so this CU is row 1 in the block rather than row 0.)<br />
I mentioned in my previous reply to Greg that I thought I&#8217;d seen 11 blocks for a &#8220;query&#8221; CU &#8211; but looking at this result I think I was probably remembering an experiment with &#8220;archive low&#8221;.</p>
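<p>The PRID values in the dump appear to be data block addresses followed by a row number. The Python sketch below decodes them. It assumes the usual 32-bit relative DBA layout for smallfile tablespaces, with the top 10 bits holding the file number and the low 22 bits holding the block number. The helper names are hypothetical. Under that assumption, the 11 pieces occupy the consecutive blocks 3602384 to 3602394 that follow the head block 3602383. That is consistent with B_DIFF = 11 and with the next CU starting in block 3602394.</p>
```python
def decode_dba(dba):
    """Split a relative data block address into (file#, block#),
    assuming 10 bits of file number and 22 bits of block number."""
    return dba >> 22, dba & 0x3FFFFF


def decode_prid(prid):
    """Decode a dump value such as '0x0176f7d0.0' into (file#, block#, row#)."""
    dba_hex, row = prid.split(".")
    file_no, block_no = decode_dba(int(dba_hex, 16))
    return file_no, block_no, int(row)


# PRIDs for PNUM 00..10 in the dump above: 0x0176f7d0.0 through 0x0176f7da.0
prids = ["0x0176f7d%s.0" % d for d in "0123456789a"]
pieces = [decode_prid(p) for p in prids]
print(pieces[0], pieces[-1])   # (5, 3602384, 0) (5, 3602394, 0)
```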
]]></content:encoded>
	</item>
	<item>
		<title>By: pieboy13</title>
		<link>http://jonathanlewis.wordpress.com/2012/08/07/compression-units-4/#comment-48853</link>
		<dc:creator><![CDATA[pieboy13]]></dc:creator>
		<pubDate>Thu, 09 Aug 2012 16:00:22 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=9300#comment-48853</guid>
		<description><![CDATA[Thanks for the details, guys.

I went back and had a look at a few tables I currently have access to, and it seems that the three lower compression settings (Query Low through Archive Low) all have around 4 blocks per CU (32K), while Archive High has a significantly higher number, generally around 32 (256K). I haven&#039;t really seen anything in between, although it&#039;s an admittedly haphazard look at a very small data set.

Greg, can you expand on the column-level (I&#039;ll call it pre-processing) compression?]]></description>
		<content:encoded><![CDATA[<p>Thanks for the details, guys.</p>
<p>I went back and had a look at a few tables I currently have access to, and it seems that the three lower compression settings (Query Low through Archive Low) all have around 4 blocks per CU (32K), while Archive High has a significantly higher number, generally around 32 (256K). I haven&#8217;t really seen anything in between, although it&#8217;s an admittedly haphazard look at a very small data set.</p>
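<p>The KB figures come from multiplying blocks per CU by the block size. The tiny Python check below assumes an 8KB block size, which the thread does not state. It also shows that the 11-block "archive low" CUs in Jonathan's test come to roughly 88KB. That is close to the 90,376-byte "CU total length" in his block dump.</p>
```python
BLOCK_SIZE = 8192  # assumption: 8KB database blocks (not stated in the thread)


def cu_kb(blocks_per_cu, block_size=BLOCK_SIZE):
    """Approximate CU footprint in KB for a given number of blocks."""
    return blocks_per_cu * block_size // 1024


for label, blocks in [("query low .. archive low", 4), ("archive high", 32),
                      ("archive low (Jonathan's test)", 11)]:
    print(label, blocks, "blocks ->", cu_kb(blocks), "KB")

print("a 90376-byte CU spans about %.1f blocks" % (90376 / BLOCK_SIZE))
```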
<p>Greg, can you expand on the column-level (I&#8217;ll call it pre-processing) compression?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2012/08/07/compression-units-4/#comment-48845</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Thu, 09 Aug 2012 12:44:55 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=9300#comment-48845</guid>
		<description><![CDATA[Greg,

I think we&#039;re probably making the same point while talking in different ways about different aspects of the same feature.

I appreciate that it would take less CPU (though more space) to handle all your data if it were stored using LZO rather than bzip2, even if the CU were a uniform 256KB in both cases; but the aspect I was thinking of was that it takes less CPU to unpack a single row from a 32KB LZO CU than it would to unpack the same row from a 256KB LZO CU.

If your strategic choice is &quot;compress for query&quot; then you are presumably expecting to query the data, and probably not expecting every query to be a tablescan - so the CPU spent selecting single rows at random is likely to be a sufficiently important factor that a 32KB limit on CUs becomes a much better strategy than a 256KB limit.]]></description>
		<content:encoded><![CDATA[<p>Greg,</p>
<p>I think we&#8217;re probably making the same point while talking in different ways about different aspects of the same feature.</p>
<p>I appreciate that it would take less CPU (though more space) to handle all your data if it were stored using LZO rather than bzip2, even if the CU were a uniform 256KB in both cases; but the aspect I was thinking of was that it takes less CPU to unpack a single row from a 32KB LZO CU than it would to unpack the same row from a 256KB LZO CU.</p>
<p>If your strategic choice is &#8220;compress for query&#8221; then you are presumably expecting to query the data, and probably not expecting every query to be a tablescan &#8211; so the CPU spent selecting single rows at random is likely to be a sufficiently important factor that a 32KB limit on CUs becomes a much better strategy than a 256KB limit.</p>
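<p>The Python sketch below shows the scaling argument for single-row access. It makes a simplifying assumption: fetching one row means decompressing its whole CU. zlib at its fastest level stands in for LZO, because LZO is not in the Python standard library. The helper names are hypothetical. The sketch counts bytes decompressed rather than timing anything, so it shows only the scaling, not real HCC costs.</p>
```python
import zlib


def make_cus(data, cu_size):
    """Split raw data into fixed-size chunks and compress each chunk separately."""
    return [zlib.compress(data[i:i + cu_size], 1)
            for i in range(0, len(data), cu_size)]


def bytes_to_fetch_one_row(cus, cu_index=0):
    """Bytes decompressed to read one row of CU cu_index: the whole CU."""
    return len(zlib.decompress(cus[cu_index]))


# Synthetic 40-byte rows; 40,000 of them give about 1.6MB of raw data.
data = b"".join(b"row %08d,some repeated payload text;" % i for i in range(40000))
small = make_cus(data, 32 * 1024)    # 32KB CUs ("query" style, per the thread)
large = make_cus(data, 256 * 1024)   # 256KB CUs ("archive" style, per the thread)
print("32KB CUs: ", bytes_to_fetch_one_row(small), "bytes per single-row fetch")
print("256KB CUs:", bytes_to_fetch_one_row(large), "bytes per single-row fetch")
```
<p>Under this model, each random single-row read against a 256KB unit does eight times as much decompression work as one against a 32KB unit. A full scan decompresses everything in either case.</p>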
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Rahn</title>
		<link>http://jonathanlewis.wordpress.com/2012/08/07/compression-units-4/#comment-48808</link>
		<dc:creator><![CDATA[Greg Rahn]]></dc:creator>
		<pubDate>Wed, 08 Aug 2012 15:44:39 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=9300#comment-48808</guid>
		<description><![CDATA[I agree that the CPU cost (and elapsed time) of compression and decompression are important considerations along with the compression ratio, because there are trade-offs being made. However, I would place the emphasis not on how large the CU is, but on what the compression algorithm and objective are, because that is really the driving factor for the CU size.  The top level compression algos include LZO, gzip (medium), gzip (high) and bzip2 [1].  The emphasis with the &quot;archive&quot; level compression is really reduced size, but the CU size and algo both contribute to CPU costs.  Even if the CU size were constant across the four HCC levels, more CPU would be required as the compression level increases; for example, LZO requires less CPU than bzip2 on an identical data set.

[1] http://bit.ly/My60wt]]></description>
		<content:encoded><![CDATA[<p>I agree that the CPU cost (and elapsed time) of compression and decompression are important considerations along with the compression ratio, because there are trade-offs being made. However, I would place the emphasis not on how large the CU is, but on what the compression algorithm and objective are, because that is really the driving factor for the CU size.  The top level compression algos include LZO, gzip (medium), gzip (high) and bzip2 [1].  The emphasis with the &#8220;archive&#8221; level compression is really reduced size, but the CU size and algo both contribute to CPU costs.  Even if the CU size were constant across the four HCC levels, more CPU would be required as the compression level increases; for example, LZO requires less CPU than bzip2 on an identical data set.</p>
<p>[1] <a href="http://bit.ly/My60wt" rel="nofollow">http://bit.ly/My60wt</a></p>
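<p>You can explore the trade-off between the algorithms with a small Python experiment. It is only an analogy. The standard library has no LZO, so zlib level 1 stands in for it, zlib levels 6 and 9 stand in for gzip medium and high, and bz2 plays bzip2. The data set is synthetic, and timings will vary by machine.</p>
```python
import bz2
import time
import zlib

# Stand-ins only: there is no LZO in the standard library, so zlib level 1 plays that role.
CODECS = {
    "zlib-1 (LZO stand-in)": (lambda d: zlib.compress(d, 1), zlib.decompress),
    "zlib-6 (gzip medium)": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "zlib-9 (gzip high)": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2-9 (bzip2)": (lambda d: bz2.compress(d, 9), bz2.decompress),
}


def compare(data):
    """Return {codec: (compressed_bytes, decompress_seconds)} for one data set."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        blob = compress(data)
        start = time.perf_counter()
        restored = decompress(blob)               # only decompression is timed
        elapsed = time.perf_counter() - start
        if restored != data:
            raise ValueError("round trip failed for " + name)
        results[name] = (len(blob), elapsed)
    return results


# Synthetic, fairly repetitive rows; real HCC data will behave differently.
data = b"".join(b"%06d,ORDER,%03d,OPEN;" % (i, i % 97) for i in range(50000))
for name, (size, secs) in compare(data).items():
    print("%-22s %8d bytes  %.4f s to decompress" % (name, size, secs))
```
<p>On typical data, bzip2 produces smaller output than the fast zlib level but takes noticeably longer to decompress. That mirrors the point that higher compression levels cost more CPU.</p>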
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2012/08/07/compression-units-4/#comment-48783</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Wed, 08 Aug 2012 09:00:14 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=9300#comment-48783</guid>
		<description><![CDATA[Greg

Thanks for the comments and links.

I should have pointed out that compression for query seemed to use much smaller compression units. (I have a memory that I saw query high produce CUs of 11 blocks on one occasion - but possibly it was actually archive low and my memory is at fault.)

To my mind, the different limits on the sizes of query HCCs and archive HCCs are probably the most significant difference in strategy - the CPU cost of decompressing a column from an HCC that&#039;s allowed to extend to 256KB is likely to be much greater than the CPU cost of decompressing a column from a 32KB HCC. That (I assume) is why my example from the previous post (http://jonathanlewis.wordpress.com/2012/07/27/compression-units-3/ ) showed the CPU usage going up by a factor of 40 even though the size of the table dropped by only 25% (49,000 blocks down to 36,600 blocks) when I switched from query high to archive high compression.]]></description>
		<content:encoded><![CDATA[<p>Greg</p>
<p>Thanks for the comments and links.</p>
<p>I should have pointed out that compression for query seemed to use much smaller compression units. (I have a memory that I saw query high produce CUs of 11 blocks on one occasion &#8211; but possibly it was actually archive low and my memory is at fault.)</p>
<p>To my mind, the different limits on the sizes of query HCCs and archive HCCs are probably the most significant difference in strategy &#8211; the CPU cost of decompressing a column from an HCC that&#8217;s allowed to extend to 256KB is likely to be much greater than the CPU cost of decompressing a column from a 32KB HCC. That (I assume) is why my example from the previous post (<a href="http://jonathanlewis.wordpress.com/2012/07/27/compression-units-3/" rel="nofollow">http://jonathanlewis.wordpress.com/2012/07/27/compression-units-3/</a> ) showed the CPU usage going up by a factor of 40 even though the size of the table dropped by only 25% (49,000 blocks down to 36,600 blocks) when I switched from query high to archive high compression.</p>
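<p>The percentage is easy to check from the block counts quoted above. The Python calculation below also assumes the factor of 40 refers to total CPU for scanning the whole table. On that assumption, the implied CPU per block scanned rose by even more, because there were fewer blocks to scan.</p>
```python
query_high_blocks = 49000
archive_high_blocks = 36600
cpu_factor = 40   # assumption: total CPU for the whole scan rose by this factor

reduction = 1 - archive_high_blocks / query_high_blocks
per_block_factor = cpu_factor * query_high_blocks / archive_high_blocks
print("size reduction: %.1f%%" % (reduction * 100))       # size reduction: 25.3%
print("CPU per block: about %.1fx" % per_block_factor)     # CPU per block: about 53.6x
```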
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Rahn</title>
		<link>http://jonathanlewis.wordpress.com/2012/08/07/compression-units-4/#comment-48781</link>
		<dc:creator><![CDATA[Greg Rahn]]></dc:creator>
		<pubDate>Wed, 08 Aug 2012 07:26:05 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=9300#comment-48781</guid>
		<description><![CDATA[Just clarifying a bit of what was demonstrated:
When a segment uses HCC, dbms_rowid.rowid_block_number(rowid) represents the CU (the logical rowid), not the database block that one is generally familiar with.  This also means that any block that has non-head CU pieces in it will have no logical rowids associated with it, which explains the 32 block &quot;gap&quot;.

IIRC for HCC &quot;query&quot; compression, CUs are typically 32K, while &quot;archive&quot; compression CUs are typically no larger than 256K as mentioned.

When an HCC segment is loaded, compression analysis chooses the compression algo per column (of which there are several), which is why there is an enqueue wait for &quot;enq: ZH - compression analysis&quot; in 11gR2.

Another good reference is: 
http://canali.web.cern.ch/canali/docs/Compressing_VLDS_Oracle_UKOUG09_LC_CERN.ppt]]></description>
		<content:encoded><![CDATA[<p>Just clarifying a bit of what was demonstrated:<br />
When a segment uses HCC, dbms_rowid.rowid_block_number(rowid) represents the CU (the logical rowid), not the database block that one is generally familiar with.  This also means that any block that has non-head CU pieces in it will have no logical rowids associated with it, which explains the 32 block &#8220;gap&#8221;.</p>
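<p>For reference, the block number returned by dbms_rowid.rowid_block_number is encoded in the extended rowid itself. The Python sketch below decodes it. It assumes the standard 18-character extended rowid layout: 6 characters of data object number, 3 of relative file number, 6 of block number and 3 of row number, in Oracle's base-64 alphabet (A-Z, a-z, 0-9, +, /). The helper names and the sample rowid are hypothetical, and this is not Oracle's own code. For an HCC segment, the block number decoded this way identifies the CU rather than a physical block, as described above.</p>
```python
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def b64_to_int(chars):
    """Decode a run of base-64 rowid characters into an integer."""
    value = 0
    for ch in chars:
        value = value * 64 + ALPHABET.index(ch)
    return value


def int_to_b64(value, width):
    """Encode an integer as a fixed-width run of base-64 rowid characters."""
    chars = []
    for _ in range(width):
        value, digit = divmod(value, 64)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def decode_rowid(rowid):
    """Split an extended rowid into (object#, file#, block#, row#)."""
    return (b64_to_int(rowid[0:6]), b64_to_int(rowid[6:9]),
            b64_to_int(rowid[9:15]), b64_to_int(rowid[15:18]))


# Hypothetical rowid built for illustration: object 77777, file 5, block 3602383, row 1.
rowid = (int_to_b64(77777, 6) + int_to_b64(5, 3)
         + int_to_b64(3602383, 6) + int_to_b64(1, 3))
print(rowid, decode_rowid(rowid))
```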
<p>IIRC for HCC &#8220;query&#8221; compression, CUs are typically 32K, while &#8220;archive&#8221; compression CUs are typically no larger than 256K as mentioned.</p>
<p>When an HCC segment is loaded, compression analysis chooses the compression algo per column (of which there are several), which is why there is an enqueue wait for &#8220;enq: ZH - compression analysis&#8221; in 11gR2.</p>
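<p>The toy Python sketch below illustrates choosing an algorithm per column. It compresses each column with a few candidate codecs and keeps the smallest result. This is an analogy built on standard library codecs. Oracle's actual compression analysis is not publicly documented, and the names used here are hypothetical.</p>
```python
import bz2
import lzma
import zlib

# Hypothetical candidate set; Oracle's real per-column choices are undocumented.
CANDIDATES = {
    "zlib": lambda b: zlib.compress(b, 6),
    "bz2": lambda b: bz2.compress(b, 9),
    "lzma": lambda b: lzma.compress(b),
}


def choose_per_column(rows):
    """For each column, pick the candidate codec giving the smallest output."""
    choices = {}
    for col_index in range(len(rows[0])):
        column = "\n".join(str(r[col_index]) for r in rows).encode("utf-8")
        sizes = {name: len(fn(column)) for name, fn in CANDIDATES.items()}
        choices[col_index] = min(sizes, key=sizes.get)
    return choices


rows = [(i, "STATUS_%d" % (i % 4), i * 7 % 1000) for i in range(20000)]
print(choose_per_column(rows))
```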
<p>Another good reference is:<br />
<a href="http://canali.web.cern.ch/canali/docs/Compressing_VLDS_Oracle_UKOUG09_LC_CERN.ppt" rel="nofollow">http://canali.web.cern.ch/canali/docs/Compressing_VLDS_Oracle_UKOUG09_LC_CERN.ppt</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>
