Hybrid Columnar Compression is one of the big features of Exadata that can make fairly dramatic differences to the amount of space it takes to store your data. But how do you find out what’s going on under the covers if you haven’t got an Exadata machine in your garage ?
Here’s a simple starting point that occurred to me a couple of days ago after the product manager (or some such) pointed out that there was no need to make an Exadata emulator available to anyone because all you needed was the compression advisor which you could trust because it actually compressed a sample of your data to see how well it could compress.
So here’s a simple example of how you can use the advisor:
declare /* COMP_NOCOMPRESS COMP_FOR_OLTP COMP_FOR_QUERY_HIGH NUMBER COMP_FOR_QUERY_LOW COMP_FOR_ARCHIVE_HIGH COMP_FOR_ARCHIVE_LOW */ m_scratchtbsname varchar2(32) := 'USERS'; m_ownname varchar2(32) := 'TEST_USER'; m_tabname varchar2(32) := 'T1'; m_partname varchar2(32) := null; m_comptype number(10,0) := dbms_compression.comp_for_archive_high; m_blkcnt_cmp pls_integer; m_blkcnt_uncmp pls_integer; m_row_cmp pls_integer; m_row_uncmp pls_integer; m_cmp_ratio number; m_comptype_str varchar2(4000); begin DBMS_COMPRESSION.GET_COMPRESSION_RATIO ( scratchtbsname => m_scratchtbsname, ownname => m_ownname, tabname => m_tabname, partname => m_partname, comptype => m_comptype, blkcnt_cmp => m_blkcnt_cmp, blkcnt_uncmp => m_blkcnt_uncmp, row_cmp => m_row_cmp, row_uncmp => m_row_uncmp, cmp_ratio => m_cmp_ratio, comptype_str => m_comptype_str ); dbms_output.put_line('Blocks compressed: ' || m_blkcnt_cmp); dbms_output.put_line('Blocks uncompressed: ' || m_blkcnt_uncmp); dbms_output.put_line('Rows per block compressed: ' || m_row_cmp); dbms_output.put_line('Rows per block uncompressed: ' || m_row_uncmp); dbms_output.put_line('Compression Ratio: ' || m_cmp_ratio); dbms_output.put_line('Comment: ' || m_comptype_str); end; /
The sample code takes a table – which has to hold at least a million rows, creates an uncompressed copy of a sample (of one million and one rows in my case), then creates a compressed copy of the sample, and reports a few stats about the results. In fact, as it runs it creates and drop several tables – including, in the closing steps, a table called ‘DBMS_TABCOMP_TEMP_CMP’. (If you’re wondering how I know this, think sql_trace).
Having seen how the package worked, I created a DDL trigger that raised an error on any attempt to drop that particular table, then re-ran the packaged call. Here’s the code for the trigger I created:
create or replace trigger before_drop before drop on test_user.schema begin if ( (sys.dbms_standard.DICTIONARY_OBJ_OWNER = 'TEST_USER') and (sys.dbms_standard.dictionary_obj_type = 'TABLE') and (sys.dbms_standard.DICTIONARY_OBJ_NAME = 'DBMS_TABCOMP_TEMP_CMP') ) then raise_application_error(-20001,'Trying to drop compressed table'); return; end if; end; /
So the package did it’s thing, and failed, leaving a table with archive high compression in place. You can’t do much with it, of course; any attempt to select from it, or index it, results in Oracle error: “ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage”. I think the only thing you can do with it is “alter table move nocompress” – which recreates it as an uncompressed table. The option to rebuild it uncompressed would allow you to run a dataguard type of scenario from an Exadata box to a non-Exadata box, and still be able to access the data (eventually) if you have to fail over.
Footnote: to use the packaged call, the tablespace has to be ASSM, with system allocated extents, and the table name, owner, and tablespace name need to be in upper case (or quoted if the names have got into the data dictionary in mixed case).
Footenote 2: In an Oracle presentation yesterday one of the slides reported “compression ratios from some of our customers”, with a worst case of 15x for the “archive high” compression; the first data set, modelling sales data, that I threw at the advisor gave me a compression ratio of 4.1. Compression doesn’t always do everything you might hope without a little help.
Footenote 3: There is one more parameter to the procedure: subset_numrows which gives the number of rows to be used in the compression sample. This defaults to 1,000,000 (but ends up using 1,000,001), has to be at least one million, and can be set to -1 to compress the entire data set.