Oracle Scratchpad

October 5, 2011

HCC – 2

Filed under: Exadata,HCC,Infrastructure,Oracle — Jonathan Lewis @ 12:07 pm BST Oct 5,2011

Just a little follow-up to my previous note on hybrid columnar compression. The following is the critical selection of code I extracted from the trace file after tracing a run of the advisor code against a table with 1,000,000 rows in it:
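For anyone who wants to reproduce the trace: the following is a sketch of the sort of call that drives the advisor, based on the documented 11.2 interface to dbms_compression.get_compression_ratio (the variable names and the choice of 10046 trace level are mine, not from the original test):

    rem     Sketch only - parameter names follow the documented 11.2
    rem     signature of dbms_compression.get_compression_ratio;
    rem     the local variable names are illustrative.

    declare
            m_blkcnt_cmp    pls_integer;
            m_blkcnt_uncmp  pls_integer;
            m_row_cmp       pls_integer;
            m_row_uncmp     pls_integer;
            m_cmp_ratio     number;
            m_comptype_str  varchar2(4000);
    begin
            execute immediate
                    'alter session set events ''10046 trace name context forever, level 4''';

            dbms_compression.get_compression_ratio(
                    scratchtbsname  => 'USERS',
                    ownname         => 'TEST_USER',
                    tabname         => 'T1',
                    partname        => null,
                    comptype        => dbms_compression.comp_for_archive_high,
                    blkcnt_cmp      => m_blkcnt_cmp,
                    blkcnt_uncmp    => m_blkcnt_uncmp,
                    row_cmp         => m_row_cmp,
                    row_uncmp       => m_row_uncmp,
                    cmp_ratio       => m_cmp_ratio,
                    comptype_str    => m_comptype_str
            );

            dbms_output.put_line('Compression ratio: ' || m_cmp_ratio);
    end;
    /

The recursive create/select/alter statements below then appear in the resulting trace file.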

tablespace "USERS" nologging
select  /*+ FULL(mytab) NOPARALLEL(mytab) */ 
        rownum rnum, mytab.*
from    "TEST_USER"."T1"  mytab
where   rownum <= 1000001

tablespace "USERS" nologging 
select  /*+ FULL(mytab) NOPARALLEL(mytab) */ * 
where   rnum >= 1

alter table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID2 set unused(rnum)

tablespace "USERS" nologging
select  /*+ FULL(mytab) NOPARALLEL (mytab) */ *

organization heap
tablespace "USERS"
compress for archive high
select  /*+ FULL(mytab) NOPARALLEL (mytab) */ *


Note: in my example the code seems to pick the first 1M rows in the table; if this is the way Oracle works for larger volumes of data it might give you an unrepresentative set of data and misleading results. I would guess, though, that this may be a side effect of using a small table in the test; it seems likely that if I had a much larger table – perhaps in the 10s of millions of rows – Oracle would use a sample clause to select the data. If Oracle does use the sample clause then the time to do the test will be influenced by the time it takes to do a full tablescan of the entire data set.

Note 2: The code to drop all four tables runs only at the end of the test. If you pick a large sample size you will need enough free space in the tablespace to create three tables holding data of around that sample size, plus the final compressed table. This might be more space, and take more time, than you initially predict.

Note 3: There are clues in the trace file suggesting that Oracle may choose to sort the data (presumably by adding an order by clause in the final CTAS) to maximise compression.
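If you want to test the sorting effect for yourself, the manual equivalent is simply a CTAS with an order by clause. This is a sketch only – the table name t1 and the column choices are illustrative, and picking the right (low-cardinality, repetitive) columns is the whole art:

    rem     Sketch: sorting on repetitive columns before a compressed
    rem     CTAS tends to cluster similar values into the same
    rem     compression unit, improving the compression ratio.
    rem     (Table and column names are illustrative.)

    create table t1_sorted
            compress for archive high
    as
    select  *
    from    t1
    order by
            low_cardinality_col_1,
            low_cardinality_col_2
    ;
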

Note 4: You’ve got to wonder why Oracle creates two copies of the data before coming up with the final compressed copy. You might also wonder why the UNCMP copy isn’t created with PCTFREE 0 to allow for a more reasonable comparison between the “free” option for archiving the table and the compressed version. (It would also be more useful to have a comparison between the free “basic compression” and the HCC compression, rather than the default 10% free space copy.)

For reference (though not to be taken too seriously) the following figures show the CPU and Elapsed times for creating the four tables:

Table       CPU     Ela
------    -----   -----
Rowid1     1.12    6.67
Rowid2     0.70    0.20
UNCMP      0.59    7.31
CMP       18.29    0.04

Don’t ask me why the elapsed times don’t make sense; but do note that this was on 32-bit Windows running in a VM.

And a few more statistics for comparison, showing the sizes (in blocks) of the test table of 1M rows:

Original size:                           10,247
Data size reported by dbms_compression:  10,100
Final size reported by dbms_compression:  2,438
Original table recreated at pctfree 0:    9,234
Original table with basic compression:    8,169
Optimal sort and basic compression:       6,781
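One way to pick up the “original size” style of figure is to gather stats on each copy of the table and report the blocks column from user_tables (a sketch; the advisor’s own figures come back through its out parameters):

    rem     Sketch: gather stats, then report the block count
    rem     recorded in user_tables for the table in question.

    begin
            dbms_stats.gather_table_stats(
                    ownname  => 'TEST_USER',
                    tabname  => 'T1'
            );
    end;
    /

    select  table_name, blocks
    from    user_tables
    where   table_name = 'T1'
    ;
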

There’s no question that HCC can give you much better results than basic compression – but it’s important to note that the data patterns and basic content make a big difference to how well the data can be compressed.

Footnote: The question of how indexes work with HCC tables came up in one of the presentations I went to. The correct answer is: “not very well”.


  1. Is there a “less than” and maybe six zeroes missing in the trace? I see four tables of one row (where rownum = 1) each created and then dropped.

    Comment by Flado — October 5, 2011 @ 2:30 pm BST Oct 5,2011 | Reply

    • Possibly but I’ll have to wait a bit before I can restart my laptop to check exactly what’s there.

      It’s also possible, of course, that the odd 1,000,001 rows is because this ‘rownum = 1’ is followed by ‘insert 1,000,000’ – and I was only
      looking for create, alter and drop statements.

      Update: I’ve looked at the original trace, and it had “<= 1000001”. I have no idea how I got the wrong predicate while extracting and editing the text.

      Comment by Jonathan Lewis — October 5, 2011 @ 3:07 pm BST Oct 5,2011 | Reply

  2. I’m in the, some would say fortunate, position of having 3 Exadata boxes that I work on. Now, I’m no expert … but let’s just say that when people talk about the “bleeding” edge, I know what they mean.
    Sure I see the things flying, when they are up that is. But, I have to say, I’m impressed … most of all with this flash cache and intelligent storage … BIIGGG queries running somewhat faster than any previous RAC system I’ve worked on.

    Comment by Anon. — October 11, 2011 @ 3:16 pm BST Oct 11,2011 | Reply

  3. I forgot to add the whole point of my comment in relation to this post … I’ve been having “fun” with Oracle and indexes on HCC. We wanted to use it due to the sheer size of the databases and the fact that it’s hard to get a fast backup solution in place – i.e. we cannot reliably/speedily back up 20TB of data, so reduce the size down instead. Unfortunately, right now we are having to stick with regular compression.

    Comment by Anon. — October 11, 2011 @ 4:01 pm BST Oct 11,2011 | Reply
