Oracle Scratchpad

July 23, 2012

Compression Units – 2

Filed under: CBO,Exadata,HCC,Indexing,Oracle — Jonathan Lewis @ 4:41 pm BST Jul 23,2012

When I arrived in Edinburgh for the UKOUG Scotland conference a little while ago Thomas Presslie, one of the organisers and chairman of the committee, asked me if I’d sign up on the “unconference” timetable to give a ten-minute talk on something. So I decided to use Hybrid Columnar Compression to make a general point about choosing and testing features. For those of you who missed this excellent conference, here’s a brief note of what I said.

The first point I had to stress was that I was going to describe a total disaster that appeared when I was doing some work with Exadata – but what I was doing was a deliberate attempt to cause a problem, and the outcome was the sort of thing I had been expecting. The problem I had wasn’t an Exadata problem; it was an investigative test. A critical feature of using Exadata is that it changes the “balance of power” between using indexes and doing tablescans, and makes it much harder (at least initially) to decide which columns should be indexed.

I created my disaster with a simple query that used an index when it should have been using a tablescan – so not a new problem, just a variation (in reverse) of the commonest optimizer question that gets asked on the Internet. I had created a data set of about 32 million rows (actually 2^25) using archive high compression, and then I had created a bitmap index on one of the columns as the first step in setting up some tests of star transformations. Having got the first index in place, though, I decided to run the following aggregate query before building any more indexes:

        n_128k between 1000 and 1999

As you might guess from the column name, my index is on a column that has (roughly) 128,000 distinct values, and my query is asking for data relating to 1,000 of those values. I generated the column, as I often do, using the mod() function so the rows were evenly spread (in terms both of quantity and scattering) throughout the table – and I’m after one row in 128 for a total of about 256,000 rows out of my 32 million.
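A quick sanity check of that arithmetic (all the figures come from the description above; the calculation itself is just my back-of-envelope sketch):

```python
# Back-of-envelope check of the selectivity described above.
# Figures from the text: 2**25 rows, column n_128k generated with mod()
# giving roughly 128,000 distinct values, predicate selecting 1,000 of them.
total_rows = 2 ** 25          # "about 32 million rows" = 33,554,432
distinct_values = 128_000     # implied by the column name n_128k
values_selected = 1_000       # between 1000 and 1999

fraction = values_selected / distinct_values   # 1/128 of the table
rows_expected = total_rows * fraction

print(total_rows)             # 33554432
print(fraction)               # 0.0078125, i.e. "one row in 128"
print(round(rows_expected))   # 262144, i.e. "about 256,000 rows"
```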

The optimizer chose to use an indexed access path for this query and took about 2,700 CPU seconds to complete (that’s about 1/100 CPU second per row). When hinted to do a tablescan the query completed in 22 (CPU) seconds.
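Reducing those timings to per-row and relative costs (numbers from the paragraph above; the division is mine):

```python
# The timings quoted above, reduced to per-row and relative cost.
index_cpu_seconds = 2_700
tablescan_cpu_seconds = 22
rows_fetched = 2 ** 25 // 128   # ~262,144 rows (see the earlier arithmetic)

per_row = index_cpu_seconds / rows_fetched
print(round(per_row, 4))        # 0.0103 - "about 1/100 CPU second per row"

# The indexed path used roughly 123 times as much CPU as the tablescan.
print(round(index_cpu_seconds / tablescan_cpu_seconds))  # 123
```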

For what it was doing, the machine was working really quickly when it used the index – but it shouldn’t have used the index. The compression was so good that I had roughly 32,000 rows packed into compression units of 32 blocks each – and the nature of my query (combined with the nature of my data) required Oracle to walk a 32-block CU and decompress one column of that CU for each of the 256,000 rows that I wanted.
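A rough model of why the indexed path burns so much CPU. The key assumption, taken from the description above, is that every row fetched via the index costs a walk (and partial decompression) of its entire 32-block compression unit, while a tablescan decompresses each CU only once:

```python
# Why the index path is so expensive, under the per-row CU-walk
# assumption described in the text.
total_rows = 2 ** 25
rows_per_cu = 32_000             # "roughly 32,000 rows ... 32 blocks each"
rows_wanted = total_rows // 128  # ~262,144 rows, evenly scattered

total_cus = total_rows / rows_per_cu      # ~1,049 CUs in the whole table
wanted_per_cu = rows_wanted / total_cus   # ~250 wanted rows in *every* CU

# Index path: one CU walk per fetched row (rows are scattered, so there
# is no chance of satisfying several wanted rows from one walk).
cu_walks_via_index = rows_wanted          # 262,144 walks
# Tablescan: each CU is visited and decompressed once.
cu_walks_via_scan = total_cus             # ~1,049 walks

print(round(cu_walks_via_index / cu_walks_via_scan))  # 250x more CU work
```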

The optimizer code has no idea how much CPU it’s going to use on decompression – all it could see was that it needed quite a lot of rows, but they were packed into a fairly small space. (The fact that this was a bitmap index, with its 80/20 packing assumption, didn’t help either – a B*tree index using an estimated clustering_factor might have made a better choice.)

WARNING: I have heard several stories about people thinking that you don’t need ANY indexes if you’re using Exadata – even if you’re using it for a data warehouse system. That suggestion is WRONG. You will probably decide that you need fewer indexes, you may even manage to get to the point where your system performs sufficiently well with none at all, but you should not assume that a move to Exadata automatically means a move to no indexes. Hybrid Columnar Compression makes an enormous difference to the desirability of specific indexes (as can Storage indexes), but you still need to calculate the pros and cons of each index individually.

Footnote: I’ll be particularly interested in hearing what Richard Foote (Indexing whizz-kid) and Maria Colgan (Optimizer lady) have to say about the collision between Exadata and indexing strategies at the E4 conference that’s coming up in a few weeks’ time. Frits Hoogland’s talk on Exadata and OLTP should also be interesting – but I’ll bet that he makes the point that OLTP and HCC mix like OLTP and bitmap indexes, i.e. not at all.

I’ll have more to say about this, or similar examples, in the future – this was just a quick note to capture the contents of a little impromptu talk.


  1. I’m glad you reiterated the “WARNING”. Even with Exadata, thinking and evaluation is still required.

    Comment by Greg Rahn — July 23, 2012 @ 5:27 pm BST Jul 23,2012 | Reply

  2. [...] those who have read the previous posting of how I engineered an Exadata disaster and want to reproduce it, here’s the script I used to [...]

    Pingback by Compression Units – 3 « Oracle Scratchpad — July 27, 2012 @ 5:09 pm BST Jul 27,2012 | Reply

  3. Whenever I present to our sales orgs I make it a point to dispel a number of these myths, such as “you don’t need indexes” or “onecommand is a simple way to patch the whole system”. I saw Maria’s presentation internally a few weeks ago and it was exceptionally good. Look forward to seeing you at E4.

    Comment by Tyler Muth — July 27, 2012 @ 5:40 pm BST Jul 27,2012 | Reply

    • Tyler,

      I’m even keener to hear Maria now – it’s not an easy problem to deal with.
      I hope she’s ready for a few questions ;)

      Comment by Jonathan Lewis — August 7, 2012 @ 6:16 pm BST Aug 7,2012 | Reply
