Just a little follow-up to my previous note on hybrid columnar compression. The following is the critical selection of code I extracted from the trace file after tracing a run of the advisor code against a table with 1,000,000 rows in it:
create table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID1
tablespace "USERS" nologging
as
select /*+ FULL(mytab) NOPARALLEL(mytab) */
rownum rnum, mytab.*
from "TEST_USER"."T1" mytab
where rownum <= 1000001
create table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID2
tablespace "USERS" nologging
as
select /*+ FULL(mytab) NOPARALLEL(mytab) */ *
from "TEST_USER".DBMS_TABCOMP_TEMP_ROWID1 mytab
where rnum >= 1
alter table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID2 set unused(rnum)
create table "TEST_USER".DBMS_TABCOMP_TEMP_UNCMP
tablespace "USERS" nologging
as
select /*+ FULL(mytab) NOPARALLEL (mytab) */ *
from "TEST_USER".DBMS_TABCOMP_TEMP_ROWID2 mytab
create table "TEST_USER".DBMS_TABCOMP_TEMP_CMP
organization heap
tablespace "USERS"
compress for archive high
nologging
as
select /*+ FULL(mytab) NOPARALLEL (mytab) */ *
from "TEST_USER".DBMS_TABCOMP_TEMP_UNCMP mytab
drop table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID1 purge
drop table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID2 purge
drop table "TEST_USER".DBMS_TABCOMP_TEMP_UNCMP purge
drop table "TEST_USER".DBMS_TABCOMP_TEMP_CMP purge
Note: in my example the code seems to pick the first 1M rows in the table; if this is the way Oracle works for larger volumes of data this might give you a unrepresentative set of data and misleading results. I would guess, though, that this may be a side effect of using a small table in the test; it seems likely that if I had a much larger table – perhaps in the 10s of millions of rows – then Oracle would use a sample clause to select the data. If Oracle does use the sample clause then the time to do the test will be influenced by the time it takes to do a full tablescan of the entire data set.
Note 2: The code to drop all 4 tables runs only at the end of the test. If you pick a large sample size you will need enough free space in the tablespace to create three tables hold data of around that sample size, plus the final compressed table. This might be more space, and take more time, than you initially predict.
Note 3: There are clues in the trace file suggesting that Oracle may choose to sort the data (presumably by adding an order by clause in the final CTAS) to maximise compression.
Note 4: You’ve got to wonder why Oracle creates two copies of the data before coming up with the final compressed copy. You might also why the UNCMP copy isn’t created with PCTFREE 0 to allow for a more reasonable comparison between the “free” option for archiving the table and the compressed version. (It would also be more useful to have a comparision between the free “basic compression” and the HCC compression, rather than the default 10% free space copy.)
For reference (though not to be taken too seriously) the following figures show the CPU and Elapsed times for creating the four tables:
Table CPU Ela ----- ----- --- Rowid1 1.12 6.67 Rowid2 0.70 0.20 UNCMP 0.59 7.31 CMP 18.29 0.04
Don’t ask me why the elapsed times don’t make sense; but do note that this was 11.2.0.2 on 32-bit Windows running in a VM.
And a few more statistics for comparison, showing block sizes of the test table of 1M rows:
Original size: 10,247 Data size reported by dbms_compression: 10,100 Final size reported by dbms_compression: 2,438 Original table recreated at pctfree 0: 9,234 Original table with basic compression: 8,169 Optimal sort and basic compression: 6,781
There’s no question that HCC can give you much better results than basic compression – but it’s important to note that the data patterns and basic content make a big difference to how well the data can be compressed.
Footnote: The question of how indexes work with HCC tables came up in one of the presentations I went to. The correct answer is: “not very well”.


Is there a “less than” and maybe six zeroes missing in the trace? I see four tables of one row (where rownum = 1) each created and then dropped.
Comment by Flado — October 5, 2011 @ 2:30 pm UTC Oct 5,2011 |
Possibly but I’ll have to wait a bit before I can restart my laptop to check exactly what’s there.
It’s also possible, of course that the odd 1000001 rows is because this ‘rownum = 1′ is followed by ‘insert 1,000,000′ – and I was only
looking for create, alter and drop statement.
Update: I’ve looked at the original trace, and it had “<= 1000001". I have no idea how I got the wrong predicate while extracting and editing the text.
Comment by Jonathan Lewis — October 5, 2011 @ 3:07 pm UTC Oct 5,2011 |
Im in the, some would say fortunate, position of having 3 exadata boxes that I work on. Now, Im no expert .. but lets just say that when people talk about the “bleeding” edge, I know what they mean.
Sure I see the things flying, when they are up that is. But, I have to say, Im impressed … most of all with this flash cache and intelligent storage … BIIGGG queries running somewhat faster than any previous RAC system Ive worked on.
Comment by Anon. — October 11, 2011 @ 3:16 pm UTC Oct 11,2011 |
I forgot to add the whole point of my comment in relation to this post … Ive been having “fun” with oracle and indexes on HCC. We wanted to use it due to the shere size of the databases and the fact that its hard to get a fast backup solution in place. Ie cannot reliably/speedily backup 20Tb of data, so reduce the size down instead. Unfortunately, right now we are having to stick with regular compression.
Comment by Anon. — October 11, 2011 @ 4:01 pm UTC Oct 11,2011 |