Hybrid Columnar Compression is one of the big features of Exadata that can make fairly dramatic differences to the amount of space it takes to store your data. But how do you find out what’s going on under the covers if you haven’t got an Exadata machine in your garage?
Here’s a simple starting point that occurred to me a couple of days ago, after the product manager (or some such) pointed out that there was no need to make an Exadata emulator available to anyone: all you needed was the compression advisor, which you could trust because it actually compressed a sample of your data to see how well it would compress.
So here’s a simple example of how you can use the advisor:
rem
rem     Script:         compression_test.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sept 2011
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from    dual
        connect by rownum <= 1e4
)
select
        rownum                                          id,
        lpad(trunc(dbms_random.value(1,100)),10,'0')    v1,
        lpad(trunc(dbms_random.value(1,100)),10,'0')    v2,
        rpad(dbms_random.value,10)                      padding1,
        rpad(dbms_random.value,10)                      padding2,
        rpad(dbms_random.value,10)                      padding3,
        rpad(dbms_random.value,10)                      padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

declare
        /*
                COMP_NOCOMPRESS
                COMP_FOR_OLTP
                COMP_FOR_QUERY_HIGH
                COMP_FOR_QUERY_LOW
                COMP_FOR_ARCHIVE_HIGH
                COMP_FOR_ARCHIVE_LOW
        */
        m_scratchtbsname        varchar2(32)    := 'USERS';
        m_ownname               varchar2(32)    := 'TEST_USER';
        m_tabname               varchar2(32)    := 'T1';
        m_partname              varchar2(32)    := null;
        m_comptype              number(10,0)    := dbms_compression.comp_for_archive_high;

        m_blkcnt_cmp            pls_integer;
        m_blkcnt_uncmp          pls_integer;
        m_row_cmp               pls_integer;
        m_row_uncmp             pls_integer;
        m_cmp_ratio             number;
        m_comptype_str          varchar2(4000);
begin
        dbms_compression.get_compression_ratio(
                scratchtbsname  => m_scratchtbsname,
                ownname         => m_ownname,
                tabname         => m_tabname,
                partname        => m_partname,
                comptype        => m_comptype,
                blkcnt_cmp      => m_blkcnt_cmp,
                blkcnt_uncmp    => m_blkcnt_uncmp,
                row_cmp         => m_row_cmp,
                row_uncmp       => m_row_uncmp,
                cmp_ratio       => m_cmp_ratio,
                comptype_str    => m_comptype_str
        );

        dbms_output.put_line('Blocks compressed:            ' || m_blkcnt_cmp);
        dbms_output.put_line('Blocks uncompressed:          ' || m_blkcnt_uncmp);
        dbms_output.put_line('Rows per block compressed:    ' || m_row_cmp);
        dbms_output.put_line('Rows per block uncompressed:  ' || m_row_uncmp);
        dbms_output.put_line('Compression Ratio:            ' || m_cmp_ratio);
        dbms_output.put_line('Comment:                      ' || m_comptype_str);
end;
/
The sample code takes a table – which has to hold at least a million rows – creates an uncompressed copy of a sample, then creates a compressed copy of the sample, and reports a few stats about the results. In fact, as it runs it creates and drops several tables – including, in the closing steps, a table called dbms_tabcomp_temp_cmp. (If you’re wondering how I know this, think sql_trace).
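If you want to watch it happen, here’s a minimal sketch of the sort of tracing I mean – the trace file location and the exact internal table names may vary with version and configuration:

alter session set tracefile_identifier = 'comp_advisor';
alter session set sql_trace = true;

-- run the dbms_compression.get_compression_ratio() block from above here

alter session set sql_trace = false;

-- then search the resulting trace file (under the diagnostic_dest trace
-- directory) for the recursive "create table ... dbms_tabcomp_temp..."
-- and "drop table ..." statements issued by the package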
Having seen how the package worked I created a DDL trigger that raised an error on any attempt to drop that particular table, then re-ran the packaged call. Here’s the code for the trigger I created:
rem
rem     Script:         drop_comp_trigger.sql
rem     Dated:          Sept 2011
rem     Author:         Jonathan Lewis
rem

create or replace trigger before_drop
before drop on test_user.schema
begin
        if (
                    (sys.dbms_standard.dictionary_obj_owner = 'TEST_USER')
                and (sys.dbms_standard.dictionary_obj_type  = 'TABLE')
                and (sys.dbms_standard.dictionary_obj_name  = 'DBMS_TABCOMP_TEMP_CMP')
        ) then
                raise_application_error(-20001,'Trying to drop compressed table');
                return;
        end if;
end;
/
So the package did its thing and failed, leaving a table with archive high compression in place. You can’t do much with it, of course; any attempt to select from it or index it results in the Oracle error: “ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage”. I think the only thing you can do with it is “alter table move nocompress” – which recreates it as an uncompressed table. The option to rebuild it uncompressed would allow you to run a Data Guard type of scenario from an Exadata box to a non-Exadata box and still be able to access the data (eventually) if you had to fail over.
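Here’s a rough sketch of the sequence – assuming the captured table is still sitting in the test_user schema under the name the package gave it:

select count(*) from dbms_tabcomp_temp_cmp;
-- ORA-64307: hybrid columnar compression is only supported in tablespaces
--            residing on Exadata storage

alter table dbms_tabcomp_temp_cmp move nocompress;
-- the table is recreated as an ordinary, uncompressed table

select count(*) from dbms_tabcomp_temp_cmp;
-- now works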
Footnote: to use the packaged call the tablespace has to be ASSM, with system allocated extents, and the table name, owner, and tablespace name need to be in upper case (or quoted if the names have got into the data dictionary in mixed case).
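As a sketch of what that means in practice (the file name and size are just placeholders):

create tablespace scratch_ts
        datafile '/u01/oradata/TEST/scratch_ts01.dbf' size 500m
        extent management local autoallocate    -- system allocated extents
        segment space management auto           -- ASSM
;

-- and pass it to the advisor in upper case:  scratchtbsname => 'SCRATCH_TS'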
Footnote 2: In an Oracle presentation yesterday one of the slides reported “compression ratios from some of our customers”, with a worst case of 15x for the “archive high” compression; the first data set, modelling sales data, that I threw at the advisor gave me a compression ratio of just 4.1. Compression doesn’t always do everything you might hope without a little help.
Footnote 3: There is one more parameter to the procedure: subset_numrows which gives the number of rows to be used in the compression sample. This defaults to 1,000,000 (but ends up using 1,000,001), has to be at least one million, and can be set to -1 to compress the entire data set.
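For example – a sketch of the call with the parameter supplied explicitly (using the same table and schema names as the script above):

declare
        m_blkcnt_cmp    pls_integer;
        m_blkcnt_uncmp  pls_integer;
        m_row_cmp       pls_integer;
        m_row_uncmp     pls_integer;
        m_cmp_ratio     number;
        m_comptype_str  varchar2(4000);
begin
        dbms_compression.get_compression_ratio(
                scratchtbsname  => 'USERS',
                ownname         => 'TEST_USER',
                tabname         => 'T1',
                partname        => null,
                comptype        => dbms_compression.comp_for_archive_high,
                blkcnt_cmp      => m_blkcnt_cmp,
                blkcnt_uncmp    => m_blkcnt_uncmp,
                row_cmp         => m_row_cmp,
                row_uncmp       => m_row_uncmp,
                cmp_ratio       => m_cmp_ratio,
                comptype_str    => m_comptype_str,
                subset_numrows  => -1   -- -1: use the whole data set; default is 1,000,000
        );
        dbms_output.put_line('Compression Ratio: ' || m_cmp_ratio);
end;
/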
Great post,
Compression seems like packing a suitcase. I experienced it while preparing mine for OOW. Even though the only thing compressed in a suitcase is the air, the compression ratio (CR) of a t-shirt is different from the CR of a jacket, which will be worn for the presentation.
The compression ratios and methods are also different at the beginning of the trip and at the end of it (much as the type and number of queries change as the data gets older and older).
In our environment some tables are loaded with query high compression for swift loading and less CPU usage (just like my suitcase at the beginning of a trip), and one month later we move them to archive high when the system is idle on CPU (similar to my suitcase when returning home – my wife won’t be happy with the scene).
Sorting has positive effects on compression :
http://ferhatsengonul.wordpress.com/2010/08/09/getting-the-most-from-hybrid-columnar-compression/
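(A minimal sketch of the kind of test, reusing the t1 table from the post above – the sort columns are just illustrative:

create table t1_sorted as
select * from t1 order by v1, v2;

-- then run dbms_compression.get_compression_ratio() against both 'T1' and
-- 'T1_SORTED' and compare the reported cmp_ratio values.)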
Exadata uses the same Oracle code on the compute nodes as a non-Exadata system, and the compression is done on the compute nodes.
When it comes to decompression, it’s done on the storage cells only if smart scan is used. For cases like index access, or when smart scan is disabled by the user, the decompression has to be done on the compute nodes, so the compute nodes also carry the code for it. Therefore all Oracle installations have the ability to use HCC, but it’s not allowed on non-Exadata systems.
So here comes the question: where does the Oracle code work out whether the data is coming from storage cells or from traditional storage?
And is there a way to fake it?
Comment by fsengonul — October 4, 2011 @ 2:19 pm BST Oct 4,2011 |
fsengonul,
I had copied down the URL during your presentation on Exadata yesterday and was planning to publish it here, but you’ve saved me a little time and typing.
Your point about decompression is interesting – just to be sure, I think you’re saying that if you access an HCC table without doing a smart scan (e.g. you access by an index to get one row) then the storage cell has to send the database server the entire compression unit so that the database server can decompress it. That doesn’t seem like a good idea – on the other hand the alternative doesn’t look like a good idea either ;) Tanel made the point in his presentation at the end of the day that you have to think carefully about what type of compression you need (if any) when you set up a mixed environment on Exadata – I think you’ve given one very clear indication of how carefully you need to think before you start.
As far as faking things goes – I think there has to be a basic function call available from pl/sql (the private “sys compression” package, perhaps) that could set a flag to stop some sort of storage check taking place. If this weren’t the case then we couldn’t create a table with HCC for testing.
Thought experiment for the future: if you’re doing basic compression (i.e. deduplication) sorting can also help, but so can a larger block size – because it increases the chance of duplicates appearing inside a single block. Would the block size also have a benefit for HCC, or is the compression unit dictated by a number of rows/bytes and not affected by whether the data is initially spread across (say) 64 x 8KB blocks or 32 x 16KB blocks?
Comment by Jonathan Lewis — October 4, 2011 @ 3:49 pm BST Oct 4,2011 |
I’ve just remembered that I’m also looking forward to your HCC presentation at 11:45 today.
Comment by Jonathan Lewis — October 4, 2011 @ 4:24 pm BST Oct 4,2011 |
fsengonul,
I’ll try to provide a reasonable guess at the question of how Oracle ‘understands’ whether the disks are from storage cells or not:
Without an Exadata at hand, I have to refer to “Expert Oracle Exadata” – there, on pages 46 to 48, Kerry Osborne shows a strong link between ‘all disks are in storage cells’, ‘DiskGroup has attribute cell.smart_scan_capable=TRUE’ and ‘Smart Scan is possible’.
As Oracle allows HCC only when ‘Smart Scan is possible’, I see the ASM DG as the key.
Without smart_scan_capable, you get ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage.
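A sketch of how you might check the attribute from the ASM instance (the output will of course depend on your setup):

select group_number, name, value
from   v$asm_attribute
where  name = 'cell.smart_scan_capable';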
There is documentation on how to fake this (and corrupt your ASM DG): http://canali.web.cern.ch/canali/docs/Compressing_VLDS_Oracle_UKOUG09_LC_CERN.ppt on pages 41-44
Martin
Comment by Martin Berger (@martinberx) — October 4, 2011 @ 10:38 pm BST Oct 4,2011 |
Hi Jonathan,
According to this announcement – http://www.oracle.com/us/corporate/press/508020 – HCC should be available not only on Exadata. I don’t know if this feature will be enabled in the ZFS Storage Appliance simulator running on VirtualBox, but maybe it is something you can check at OOW.
regards,
Marcin
Comment by Marcin Przepiorowski — October 5, 2011 @ 12:31 am BST Oct 5,2011 |
Marcin,
Thanks, that’s an observation worth pursuing.
Comment by Jonathan Lewis — October 5, 2011 @ 11:10 am BST Oct 5,2011 |
Hi Jonathan,
I have tested the new patch from MOS for 11.2.0.3 on my ZFS Storage Appliance simulator and it is working – a short post about it is here – http://oracleprof.blogspot.com/2011/10/quick-note-about-hybrid-columnar.html
I have tested it only with a single instance and NFS (using the Oracle Direct NFS client), but I think it will work in other lab configurations as well.
regards,
Marcin
Comment by Marcin Przepiorowski — October 12, 2011 @ 7:40 am BST Oct 12,2011 |
Hello Marcin,
Perhaps you could mount another NFS volume from any other NFS server (even just a simple Linux loopback) to see whether there is any intelligent “discovery” in patch 13041324. I think you’ll find that it will work with any NFS mount via dNFS.
Comment by Kevin Closson — October 12, 2011 @ 4:06 pm BST Oct 12,2011
Marcin,
Thanks for the link.
As soon as I can make some time I’ll probably have a go at installing that on my laptop. As usual, though, life is never perfect – I run VMWare, so I guess I’ll have to do some hacking to get the ZFS simulator into it; and I’ve got so much going on in VMWare that I’m not keen to change.
Comment by Jonathan Lewis — November 13, 2011 @ 11:58 am GMT Nov 13,2011
Hi Jonathan,
You don’t need to hack SunStorage (it works fine with VMWare anyway). I followed Kevin’s suggestion and found a way to run HCC using a standard Linux NFS server. The “insecure” option in the NFS configuration does the trick – see my post for details – http://oracleprof.blogspot.com/2011/11/dnfs-configuration-and-hybrid-column.html
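For example, an /etc/exports entry along these lines (the path is just a placeholder) – “insecure” matters because the Direct NFS client connects from a non-privileged port:

# /etc/exports on the Linux NFS server
/export/hcc_data  *(rw,sync,insecure,no_root_squash)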
regards,
Marcin
Comment by Marcin Przepiorowski — November 13, 2011 @ 12:34 pm GMT Nov 13,2011
Marcin,
Another useful link.
Thanks
Comment by Jonathan Lewis — November 13, 2011 @ 4:35 pm GMT Nov 13,2011
I should think the same mount options used for OpenFiler would also allow just a simple local NFS loopback mount to work with HCC as well. Just a thought.
Comment by oradba — November 14, 2011 @ 4:37 pm GMT Nov 14,2011
[…] See Jonathan Lewis nice cheat to keep the HCC compressed table that gets created during the evaluation under the covers – […]
Pingback by DBMS_COMPRESSION Example « The Oracle Instructor — October 5, 2011 @ 6:48 am BST Oct 5,2011 |
[…] a little follow-up to my previous note on hybrid columnar compression. The following is the critical selection of code I extracted from the trace file after tracing a […]
Pingback by HCC – 2 « Oracle Scratchpad — October 5, 2011 @ 12:08 pm BST Oct 5,2011 |
Folks,
The HCC Exadata storage enforcement has nothing to do with smart scan capable. It has simply to do with a routine in kds.c called kcfis_tablespace_is_on_sage(). I suspect there is a patch now with this code removed to support the Oracle Sept 30 announcement that Oracle-flavored block/file storage supports HCC:
http://www.oracle.com/us/corporate/press/508020
The license guide now reads as quoted below. So if you get the bits that support this new “Open Systems” view of HCC you should also be able to create an HCC tablespace or HCC table on any storage. The customer will have to be his own police.
HCC is in fact an intrinsic RDBMS feature and was fully supported during the 11g Beta on any storage, only to be temporarily limited through a couple of lines of code in kds.c and then finally opened up only to Oracle-supplied general-purpose file and block storage. I obviously have very strong opinions about Oracle’s move in this matter.
Folks should be aware that the Apress Expert Exadata book covers the fact that HCC decompression does drive cell CPU utilization to a critical level at which point cells “passthru” HCC CUs that are *not* decompressed even during a smart scan.
Features Specific to Exadata, ZFS, and Pillar Axiom 600 Storage Platforms
Hybrid Columnar Compression is a feature of the Enterprise Edition of Oracle Database 11g that is dependent on the underlying storage system. Only Exadata storage, ZFS storage, and Pillar Axiom 600 storage support Hybrid Columnar Compression. On storage platforms other than those previously mentioned, Hybrid Columnar Compression is not available, and no right to use Hybrid Columnar Compression is granted by a license of the Oracle Database.
Comment by kevinclosson — October 6, 2011 @ 5:53 pm BST Oct 6,2011 |
Kevin,
Thanks for the comments.
Comment by Jonathan Lewis — November 13, 2011 @ 11:59 am GMT Nov 13,2011 |
How do I find and decide which tables are suitable candidates for hybrid columnar compression and basic compression for DB optimization in Oracle?
You could start with the compression advisor. See: https://jonathanlewis.wordpress.com/2011/10/04/hcc/
Trackback by Quora — October 15, 2015 @ 6:13 pm BST Oct 15,2015 |
[…] is blocked on a “no, just because” basis. Ideas for unblocking it have already been published by Jonathan Lewis and Martin Berger. Meanwhile, HCC is used entirely legally by the function […]
Pingback by HCC – ocena skuteczności na laptopie | Explain IT — December 5, 2015 @ 12:05 pm GMT Dec 5,2015 |