Just a little follow-up to my previous note on hybrid columnar compression. The following is the critical selection of code I extracted from the trace file after tracing a run of the advisor code against a table with 1,000,000 rows in it:
create table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID1 tablespace "USERS" nologging
as
select /*+ FULL(mytab) NOPARALLEL(mytab) */ rownum rnum, mytab.*
from   "TEST_USER"."T1" mytab
where  rownum <= 1000001

create table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID2 tablespace "USERS" nologging
as
select /*+ FULL(mytab) NOPARALLEL(mytab) */ *
from   "TEST_USER".DBMS_TABCOMP_TEMP_ROWID1 mytab
where  rnum >= 1

alter table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID2 set unused(rnum)

create table "TEST_USER".DBMS_TABCOMP_TEMP_UNCMP tablespace "USERS" nologging
as
select /*+ FULL(mytab) NOPARALLEL (mytab) */ *
from   "TEST_USER".DBMS_TABCOMP_TEMP_ROWID2 mytab

create table "TEST_USER".DBMS_TABCOMP_TEMP_CMP organization heap
tablespace "USERS" compress for archive high nologging
as
select /*+ FULL(mytab) NOPARALLEL (mytab) */ *
from   "TEST_USER".DBMS_TABCOMP_TEMP_UNCMP mytab

drop table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID1 purge
drop table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID2 purge
drop table "TEST_USER".DBMS_TABCOMP_TEMP_UNCMP purge
drop table "TEST_USER".DBMS_TABCOMP_TEMP_CMP purge
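For context, the run I traced was a call to the compression advisor along the following lines. This is a sketch against the 11.2 dbms_compression API rather than the exact anonymous block I ran; the scratch tablespace, owner and table name match my test, and the variable names are my own:

set serveroutput on

declare
        l_blkcnt_cmp    pls_integer;
        l_blkcnt_uncmp  pls_integer;
        l_row_cmp       pls_integer;
        l_row_uncmp     pls_integer;
        l_cmp_ratio     number;
        l_comptype_str  varchar2(32767);
begin
        dbms_compression.get_compression_ratio(
                scratchtbsname  => 'USERS',       -- where the temporary tables get created
                ownname         => 'TEST_USER',
                tabname         => 'T1',
                partname        => null,
                comptype        => dbms_compression.comp_for_archive_high,
                blkcnt_cmp      => l_blkcnt_cmp,
                blkcnt_uncmp    => l_blkcnt_uncmp,
                row_cmp         => l_row_cmp,
                row_uncmp       => l_row_uncmp,
                cmp_ratio       => l_cmp_ratio,
                comptype_str    => l_comptype_str
        );
        dbms_output.put_line('Blocks compressed:   ' || l_blkcnt_cmp);
        dbms_output.put_line('Blocks uncompressed: ' || l_blkcnt_uncmp);
        dbms_output.put_line('Compression ratio:   ' || l_cmp_ratio);
        dbms_output.put_line('Compression type:    ' || l_comptype_str);
end;
/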
Note: in my example the code seems to pick the first 1M rows in the table; if this is the way Oracle works for larger volumes of data, this might give you an unrepresentative set of data and misleading results. I would guess, though, that this may be a side effect of using a small table in the test; it seems likely that if I had a much larger table – perhaps in the 10s of millions of rows – then Oracle would use a sample clause to select the data. If Oracle does use the sample clause then the time to do the test will be influenced by the time it takes to do a full tablescan of the entire data set.
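For illustration only, a sampled extract might look something like the following – the sketch, the target table name and the 10 percent figure are mine, not anything I saw in the trace file:

create table t1_sample nologging
as
select  *
from    test_user.t1 sample block (10)    -- sample roughly 10% of the table's blocks
;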
Note 2: The code to drop all four tables runs only at the end of the test, so if you pick a large sample size you will need enough free space in the tablespace to create three tables holding data of around that sample size, plus the final compressed table. This might be more space, and take more time, than you initially predict.
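So it's worth checking the scratch tablespace before you start; a minimal sketch (any query against dba_free_space will do, run as a suitably privileged user):

select
        round(sum(bytes)/1048576)  free_mb
from
        dba_free_space
where
        tablespace_name = 'USERS'         -- the scratch tablespace you plan to use
;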
Note 3: There are clues in the trace file suggesting that Oracle may choose to sort the data (presumably by adding an order by clause in the final CTAS) to maximise compression.
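You can, of course, apply the same idea by hand. A sketch of a CTAS that sorts the data before compressing it – the table and column names here are hypothetical, and the best sort order depends entirely on your data:

create table t1_archive
compress for archive high nologging
as
select  *
from    test_user.t1
order by
        repetitive_col_1,                 -- sort the most repetitive columns together
        repetitive_col_2                  -- to maximise the scope for compression
;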
Note 4: You’ve got to wonder why Oracle creates two copies of the data before coming up with the final compressed copy. You might also wonder why the UNCMP copy isn’t created with PCTFREE 0 to allow for a more reasonable comparison between the “free” option for archiving the table and the compressed version. (It would also be more useful to have a comparison between the free “basic compression” and the HCC compression, rather than the default 10% free space copy.)
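If you want those fairer baselines you can build them yourself; a sketch of the two extra copies I have in mind (the table names are my own):

create table t1_pctfree0
pctfree 0 nologging                       -- the "free" archiving option: no block free space
as
select * from test_user.t1
;

create table t1_basic
compress nologging                        -- basic (direct-load) compression, no extra licence
as
select * from test_user.t1
;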
For reference (though not to be taken too seriously) the following figures show the CPU and Elapsed times for creating the four tables:
Table       CPU      Ela
------    -----    -----
Rowid1     1.12     6.67
Rowid2     0.70     0.20
UNCMP      0.59     7.31
CMP       18.29     0.04
Don’t ask me why the elapsed times don’t make sense; but do note that this was 11.2 on 32-bit Windows running in a VM.
And a few more statistics for comparison, showing the size (in blocks) of the 1M-row test table under various treatments:
Original size:                               10,247
Data size reported by dbms_compression:      10,100
Final size reported by dbms_compression:      2,438
Original table recreated at pctfree 0:        9,234
Original table with basic compression:        8,169
Optimal sort and basic compression:           6,781
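The two “reported by dbms_compression” figures are the blkcnt_uncmp and blkcnt_cmp outputs of the advisor call shown earlier; the others are segment sizes, which you can get with something like this (a sketch, using my hypothetical table names from above):

select
        segment_name,
        blocks
from
        user_segments
where
        segment_name in ('T1', 'T1_PCTFREE0', 'T1_BASIC')
order by
        segment_name
;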
There’s no question that HCC can give you much better results than basic compression – but it’s important to note that the data patterns and basic content make a big difference to how well the data can be compressed.
Footnote: The question of how indexes work with HCC tables came up in one of the presentations I went to. The correct answer is: “not very well”.