Oracle Scratchpad

December 13, 2016

Index Compression

Filed under: 12c,Indexing,Oracle — Jonathan Lewis @ 1:11 pm GMT Dec 13,2016

Richard Foote has published a couple of articles in the last few days on the new (licensed under the advanced compression option) compression mechanism in 12.2 for index leaf blocks. The second of these pointed out that the new “high compression” mechanism was even able to compress single-column unique indexes – a compression that makes no sense for (and isn’t allowed by) the older style “leading edge deduplication” mechanism for index compression.

In 12.2 an index can be created (or rebuilt) with the option “compress advanced high” – and at the leaf-block level this will create “compression units” (possibly just one per leaf block, based on my early testing) that take the complexity of compression far beyond the level of constructing a directory of prefixes. Richard demonstrated the benefit by creating a table with a numeric unique index, then compressing the index, reducing its size from 2,088 leaf blocks to 965 leaf blocks – a pretty dramatic difference.

It crossed my mind, though, to wonder whether the level of compression was a side effect of the very straightforward code that Richard had used to create the data set: the table was a million rows with a primary key that had been generated as the rownum selected from the now-classic “connect by..” query against dual, and the row length happened to allow for 242 rows per 8KB table block.

If you pause to think about this data set you realise that if you pick the correct starting point and walk through 242 consecutive entries of the index you will be walking through 242 consecutive rows in the table starting from the zeroth row in a particular table block and ending at the 241st row in that block. A rowid (as stored by Oracle in a simple B-tree index) consists of 6 bytes and the first five bytes of the rowid will be the same for the first 256 rows in any one table block (and the first four will still be the same for the remaining rows in the block). Richard’s data set will be very close to ideal for any byte-oriented, or bit-oriented, compression algorithm because (to use Oracle terminology) the data will have a perfect clustering_factor. (He might even have got a slightly better compression ratio if he’d used an /*+ append */ on the insert, or done a CTAS, and reduced the rowsize to get a uniform 256 rows per table block.)
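As a quick check of that claim you could decode the rowids for a handful of consecutive key values and see how they map to table blocks. Here’s a sketch of my own (not part of Richard’s tests) that does this with dbms_rowid, assuming a table t1(id, padding) like the one created in the script below:

select
        id,
        dbms_rowid.rowid_relative_fno(rowid)    file_no,
        dbms_rowid.rowid_block_number(rowid)    block_no,
        dbms_rowid.rowid_row_number(rowid)      row_no
from
        t1
where
        id between 1 and 10
order by
        id
;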

So how do things change if you randomise the ordering of the key ? Here’s a variant on Richard’s code:


rem
rem	Script:		index_compression_12c_2.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Dec 2016
rem
rem	Last tested 
rem		12.2.0.1
rem

execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4
)
select
	rownum				id,
	lpad('x',130,'x')		padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
order by
	dbms_random.value
;

select table_name, blocks, round(num_rows/blocks,0) rpb from user_tables where table_name = 'T1';

drop index t1_i1;
create unique index t1_i1 on t1(id);
execute dbms_stats.gather_index_stats(user,'t1_i1');
select index_name, compression, pct_free, leaf_blocks from user_indexes where index_name = 'T1_I1';

drop index t1_i1;
create unique index t1_i1 on t1(id) compress advanced high;
execute dbms_stats.gather_index_stats(user,'t1_i1');
select index_name, compression, pct_free, leaf_blocks from user_indexes where index_name = 'T1_I1';

The initial drop index is obviously redundant, and the calls to gather_index_stats should also be redundant – but they’re there just to make it obvious I haven’t overlooked any checks for correctness in the build and stats.

You’ll notice that my row length is going to be slightly more “real-world” than Richard’s so that the degree of compression I get from nearly identical rowid values is likely to be reduced slightly, and I’ve completely randomised the order of key values.

So what do the results look like ?

With the default pctfree = 10, and in a tablespace of uniform 1MB extents, 8KB blocks, utilising ASSM I get this:


TABLE_NAME               BLOCKS        RPB
-------------------- ---------- ----------
T1                        19782         51

INDEX_NAME           COMPRESSION     PCT_FREE LEAF_BLOCKS
-------------------- ------------- ---------- -----------
T1_I1                DISABLED              10        2088
T1_I1                ADVANCED HIGH         10        1303

Unsurprisingly the uncompressed index is exactly the same size as Richard’s (well, it was just the integers from 1 to 1M in both cases) but the compression ratio is significantly less – though still pretty impressive.

Of course, for this type of index my example probably goes to the opposite extreme from Richard’s. Realistically, if you have a sequence-based key with an OLTP pattern of data arrival then consecutive key values are likely to be scattered within a few blocks of each other rather than being scattered completely randomly across the entire width of the table; so a more subtle model (using a suitable number of concurrent processes to insert ids based on a sequence, perhaps) would probably get a better compression ratio than I did, though a worse one than Richard’s. There’s also the issue of the size of the key value itself – once you get to values in the order of 10 million to 100 million you’re looking at mostly 4 bytes (internal format) of storage, where for large runs of values the first 3 bytes match, possibly leading to a better compression ratio.
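If you want to play with that idea without setting up concurrent processes, here’s a single-session approximation of my own (just a sketch, not one of the tests reported above) that scatters each id only a small random distance from its natural position instead of across the whole table:

create table t2
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        id,
        lpad('x',130,'x')       padding
from    (
        select
                rownum                                  id,
                rownum + 50 * dbms_random.value         scatter
        from
                generator       v1,
                generator       v2
        where
                rownum <= 1e6
        )
order by
        scatter
;

create unique index t2_i1 on t2(id) compress advanced high;
execute dbms_stats.gather_index_stats(user,'t2_i1');
select index_name, compression, leaf_blocks from user_indexes where index_name = 'T2_I1';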

Of course the question of globally partitioned indexes will be relevant for some people, since the principal reason for global indexes on partitioned tables is to enforce uniqueness on column combinations that don’t include the partition key. That introduces another possible benefit – the rowid goes up to 10 bytes, of which the first 4 bytes are the object id of the table partition: depending on the nature of the partitioning, that repeated 4 bytes per row may be close to non-existent after compression, giving you a better compression ratio on globally partitioned indexes than you get on any other type of single-column unique index.
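For anyone who wants to experiment, here’s a minimal sketch (mine, not something from the tests above) of the sort of object in question – a partitioned table with a unique index on a column that isn’t part of the partition key, so the index can’t be local and has to hold the extended 10-byte rowids:

create table t3 (
        id              number          not null,
        created         date            not null,
        padding         varchar2(130)
)
partition by range (created) (
        partition p2016 values less than (date '2017-01-01'),
        partition p2017 values less than (date '2018-01-01')
)
;

create unique index t3_i1 on t3(id) compress advanced high;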

Once you start looking at the details there are a surprising number of factors that can affect how well “compress advanced high” can work.

Footnote:

Once you’ve created the index, you can start poking around in all sorts of ways to try and work out what the compression algorithm does. A simple block dump is very informative, with lots of clues in the descriptive details – and lots of puzzles when you start looking too closely. There are hints that this type of index compression adopts a strategy similar to “OLTP compression” for tables, in that compression occurs only as the leaf block becomes full – and possibly allows some sort of batching process within a leaf block before finally compressing to a single contiguous unit. (This is just conjecture, at present: the only firm statement I’ll make about the actual implementation of index compression is that it uses a lot of CPU; in my example the baseline create index took about 1.5 seconds of CPU, the compressed create took about 4.5 seconds of CPU.)
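For reference, this is the sort of poking around I mean – a treedump to list the branch and leaf blocks, then a dump of one of the leaf blocks. (The object_id lookup is generic, but the datafile and block numbers in the last command are just placeholders; you’d pick a leaf block reported by the treedump.)

column object_id new_value m_object_id
select object_id from user_objects where object_name = 'T1_I1';

alter session set events 'immediate trace name treedump level &m_object_id';

alter system dump datafile 7 block 131;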

There are also a couple of amusing side effects that may confound those who use the old “validate index / query index_stats” two-step to decide when to rebuild indexes. Here’s what I got on the compressed index:


SQL> validate index t1_i1;

SQL> select blocks, lf_rows, lf_rows_len, btree_space, used_space, pct_used from index_stats;

    BLOCKS    LF_ROWS LF_ROWS_LEN BTREE_SPACE USED_SPACE   PCT_USED
---------- ---------- ----------- ----------- ---------- ----------
      1408    1000000    14979802    10416812   14994105        144

My index is using 144% of the space that it has been allocated. You don’t have to be very good at maths (or math, even) to realise that something’s wrong with that number.
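(For what it’s worth, 144 is consistent with ceil(100 * used_space / btree_space) = ceil(100 * 14994105 / 10416812) = 144 – in other words the “space used” figure has come out larger than the “space allocated to the B-tree” figure. The comments below explain how that can happen.)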

4 Comments

  1. Hi Jonathan,

    it seems like index compression could be an interesting piece of code inside the Oracle kernel; however I’d definitely prefer to use storage-level compression (like IBM SVC/Storwize) for index compression – you easily get a compression ratio of 2-3x for B-tree indexes
    – OLTP compression is similar to index compression and there are plenty of CPU cycles spent on it. And Oracle charges you per core, so this easily gets very expensive. With storage compression you burn dedicated CPU inside the storage, which is not licensed by expensive Oracle.
    – there will be plenty of strange behaviour with index compression, and many bugs for sure. As always with new Oracle features.

    Regards
    Pavol Babel

    Comment by Pavol Babel — December 14, 2016 @ 6:42 pm GMT Dec 14,2016

    • Pavol,
      Thanks for that comment – it’s an interesting variation on the problem of “picking your bottleneck”, though in this case it’s not so much a bottleneck as a way of using the right CPU in the right place. It’s even a bit Exadata-like – using the CPU at the other end of the wire rather than the database server CPU.

      I haven’t looked at the options – but I do recall that when Delphix first came out part of their efficiency was that they would compress an Oracle block down to the minimum number of disc sectors, so 8KB might turn into (e.g.) 3 x 512 bytes; this did mean that when the new 4KB sector discs appeared there was effectively a hard limit (of x2) on how much compression you could get from a single 8KB block. Would something similar apply with the technology you’ve mentioned – viz. compression at the level of the Oracle block and a minimum of one sector per Oracle block ?

      Comment by Jonathan Lewis — December 17, 2016 @ 3:27 pm GMT Dec 17,2016

  2. Hello Jonathan,

    Here is an excerpt from the definition of the PCT_USED column in the INDEX_STATS view, together with the other columns relevant to it:

    ceil(((kdxstlln+kdxstbln+kdxstpln)*100)/
      (kdxstlbk*kdxstlub+kdxstbbk*kdxstbub)) pct_used,
    kdxstlbk*kdxstlub+kdxstbbk*kdxstbub     btree_space,
    kdxstlln+kdxstbln+kdxstpln              used_space,
    kdxstlbk        lf_blks,
    kdxstlub        lf_blk_len,
    kdxstlln        lf_rows_len,
    kdxstbbk        br_blks,
    kdxstbln        br_rows_len,
    kdxstbub        br_blk_len,
    kdxstpln        pre_rows_len,
    

    Let’s simplify the PCT_USED formula so that it ends up using only the base (not derived) values:

    pct_used=ceil(((kdxstlln+kdxstbln+kdxstpln)*100)/(kdxstlbk*kdxstlub+kdxstbbk*kdxstbub))=
    =ceil(used_space*100/btree_space)=
    =ceil((lf_rows_len+br_rows_len+pre_rows_len)*100/(lf_blks*lf_blk_len+br_blks*br_blk_len))
    

    The following data was gathered in my 12.2 (beta) database, where type ”UNC” means the uncompressed index from your example, and ”COMP” means the compressed one:

    TYPE BLOCKS LF_ROWS LF_BLKS LF_ROWS_LEN LF_BLK_LEN BR_ROWS BR_BLKS BR_ROWS_LEN BR_BLK_LEN BTREE_SPACE USED_SPACE   PCT_USED   PRE_ROWS PRE_ROWS_LEN
    ---- ------ ------- ------- ----------- ---------- ------- ------- ----------- ---------- ----------- ---------- ---------- ---------- ------------
    UNC    2176 1000000    2088    14979802       7996    2087       4       22935       8028    16727760   15002737         90          0            0
    COMP   1408 1000000    1311    14979802       7976    1310       3       14389       8028    10480620   14994191        144          0            0
    

    We can see that the LF_ROWS_LEN values are calculated as if the data were not compressed – the values for both indexes are the same (and the same is true for BR_ROWS_LEN) – but LF_BLKS and BR_BLKS are the exact numbers of blocks used.
    Coming back to our formula:

    pct_used=ceil((lf_rows_len+br_rows_len+pre_rows_len)*100/(lf_blks*lf_blk_len+br_blks*br_blk_len))
    

    The numerator is based on uncompressed values but the denominator is not, so the formula only gives a sensible answer on the assumption that the data is not compressed.

    Comment by Mikhail Velikikh — December 15, 2016 @ 3:45 am GMT Dec 15,2016

    • Mikhail,

      Thanks for taking the time to write up those notes.
      I was contemplating writing an addendum to explain why the anomaly appeared but it’s nice to have one of my readers do it.

      Comment by Jonathan Lewis — December 17, 2016 @ 3:30 pm GMT Dec 17,2016

