Oracle Scratchpad

May 8, 2007

More of a hash

Filed under: Partitioning,Performance,Tuning — Jonathan Lewis @ 8:46 pm BST May 8,2007

No matter how simple a topic you pick, a few minutes thought invariably allow you to conjure up some new anomalies that could appear in the right (or possibly wrong) circumstances.

Yesterday I made a few comments about hash partitioning and performance. Today I suddenly realised that global indexes on hash-partitioned tables could exhibit an unfortunate behaviour pattern that would make them pretty useless – unless hacked or hinted. Consider the following table:

create table t1 
partition by hash(id) 
-- partitions 1 
partitions 4 
as 
with generator as ( 
	select	--+ materialize 
		rownum 	id 
	from	all_objects 
	where	rownum <= 3000 
) 
select 
	trunc(sysdate) + trunc((rownum-1)/100)	trade_date, 
	rownum					id, 
	lpad(rownum,10,'0')			small_vc, 
	rpad('x',100)				padding 
from 
	generator	v1, 
	generator	v2 
where 
	rownum <= 100000 
;  

create index t1_i1 on t1(trade_date, id);  

-- now use dbms_stats to generate statistics  

There are 100,000 rows, and I’ve generated data in a way that is essentially time and sequence based. It’s a fairly common pattern of data generation. Choosing to create the specific index as a global index is perhaps a little unusual – but not beyond belief.

Notice how I’ve shown two options for creating this table – four partitions or one partition (which is a lazy way of emulating a non-partitioned table, and also happens to be the basis of a cunning trick to help you migrate from non-partitioned to partitioned tables).

Here’s the issue: based on a block size of 8KB, and in the absence of ASSM (automatic segment space management) the clustering_factor of the index is 1,815 if the table is not partitioned – or has a single partition. The clustering_factor is 75,251 when you partition the table four ways.

Switch from a non-partitioned table to a hash-partitioned table with global indexes, and you may find some cases where the optimizer stops using an index because the clustering_factor makes the cost of using the index appear much too big.

Chapter 5 of my book mentioned a few ways in which structural strategies (multiple freelists, ASSM, reverse indexes, compacting tables) could affect the clustering_factor of an index – hash partitioning is another one. By hash-partitioning, you can reduce contention on hot blocks by having multiple data insertion points in a table – but creating multiple data insertion points will change the clustering_factor of some indexes for the worse.

15 Comments »

  1. In your test case, the situation is aggravated by the fact that when hash-partitioning the table, you loose the bit of order that you had in the non-partitioned case.

    Comment by ghassan — May 9, 2007 @ 7:04 am BST May 9,2007 | Reply

  2. Ghassan, the situation isn’t “aggravated by” the partitioning; the situation is due to the partitioning; that was actually the point of the demonstration.

    I had a LOT of order in my data – and still do after hash partitioning – but thanks to Oracle’s mechanism for calculating the clustering_factor the ordering is invisible to the optimizer.

    Comment by Jonathan Lewis — May 9, 2007 @ 8:26 am BST May 9,2007 | Reply

  3. Jonathan,
    What I meant is that the result is logical. If you take the rows with the same trade_date, they will be scattered among many partitions and so many blocks, and this shows in the clustering factor. If the index was local, the clustering factor will be much better as the data, in the partition is ordered.
    If the test case, in my db (10.2.0.3), the clustering factor was 75251 for a global index, and 453 for each of the 4 partitions when created local.

    Comment by ghassan — May 9, 2007 @ 11:13 am BST May 9,2007 | Reply

  4. I couldn’t help but notice a mathematical formula here – a global index on an hash-partitioned table that includes the partitioned column is going to have a clustering factor (CF) of at least

    (1-1/num_partitions) * num_distinct (partitioned column)

    in fact in your example the CF increased from 1,815 to 75,251 (very close to 3/4th of num_distinct(id) = num_rows).

    It is apparent if one calculates mentally the CF by following (INDEX FULL SCAN) the index – whenever the partitioned column value changes, its new hash value will select a new partition at “random” and this new partition has only a probability of 1/num_partitions to be the same as the old.

    Comment by Alberto Dell'Era — May 10, 2007 @ 8:33 pm BST May 10,2007 | Reply

  5. @Alberto

    Nice observation…and the 1-1/num_partitions seems a probability and in this case an uniform distribution.

    Comment by Antonio — May 14, 2007 @ 12:33 pm BST May 14,2007 | Reply

  6. Alberto & Antonio,
    keep in mind that row distribution in a hash partition scheme is not a uniform probabilistic thing. (it is in jonathan’s test case as the id is a sequence number, with no holes).

    Comment by ghassan — May 14, 2007 @ 3:21 pm BST May 14,2007 | Reply

  7. Ghassan, a good hash function design tries to map “any” value set to a uniform set, it is its only purpose in life – in the hash partitioned table case, in order to spread the rows as uniformly as possible into all the available partitions.

    You can try by yourself and use a value set with “holes” or any other distribution. Just keep the number of values (num_distinct) much greater than the number of partitions if you want to use a single sample from your chosen distribution.

    Comment by Alberto Dell'Era — May 14, 2007 @ 9:57 pm BST May 14,2007 | Reply

  8. Jonathan,

    I hope you get notified about comments to old posts, as this topic seems most relevant to the strange behaviour I ran into lately. Test case:


    select banner from v$version;
    drop table t;
    create table t(cl varchar2(8), r  integer) 
    partition by list(cl) (
       partition big values('big'), 
       partition small values ('small'), 
       partition empty values (default)
    );
    insert /*+ append */ into t(cl,r)
    select case when level between 1 and 4 then 'small' else 'big' end,
             dbms_random.value(1,9000)
    from dual connect by level<=10000;
    commit;
    create index i_tr on t(r);

    begin dbms_stats.gather_table_stats(user,'T', cascade=>true, granularity=>'all', estimate_percent=>100); end;
    /
    explain plan for
    select * from t where cl='big' and r=:42
    union all
    select /*+index(t)*/ * from t where cl='small' and r=:42
    union all
    select /*+index(t)*/ * from t where cl='nonexistent' and r=:42;
    select * from table(dbms_xplan.display);

    Running this script in SQL Developer produces this:

    BANNER
    ----------------------------------------------------------------
    Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Prod
    PL/SQL Release 10.2.0.3.0 - Production
    CORE 10.2.0.3.0 Production
    TNS for Linux: Version 10.2.0.3.0 - Production
    NLSRTL Version 10.2.0.3.0 - Production

    5 rows selected

    drop table t succeeded.
    create table succeeded.
    10000 rows inserted
    commit succeeded.
    create index succeeded.
    anonymous block completed
    explain plan succeeded.
    PLAN_TABLE_OUTPUT
    -------------------------------------------------------------------------------------------------------------
    Plan hash value: 2509517241

    ------------------------------------------------------------------------------------------------------------
    | Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
    ------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                    |      |     4 |    43 |  2501 (100)| 00:00:04 |       |       |
    |   1 |  UNION-ALL                          |      |       |       |            |          |       |       |
    |*  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID| T    |     2 |    14 |     3   (0)| 00:00:01 |     1 |     1 |
    |*  3 |    INDEX RANGE SCAN                 | I_TR |     2 |       |     1   (0)| 00:00:01 |       |       |
    |*  4 |   TABLE ACCESS BY GLOBAL INDEX ROWID| T    |     1 |    10 |  2401   (2)| 00:00:04 |     2 |     2 |
    |*  5 |    INDEX RANGE SCAN                 | I_TR |  2500 |       |     9  (23)| 00:00:01 |       |       |
    |*  6 |   TABLE ACCESS BY GLOBAL INDEX ROWID| T    |     1 |    19 |    97   (3)| 00:00:01 |     3 |     3 |
    |*  7 |    INDEX RANGE SCAN                 | I_TR |   100 |       |     1   (0)| 00:00:01 |       |       |
    ------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

       2 - filter("CL"='big')
       3 - access("R"=TO_NUMBER(:42))
       4 - filter("CL"='small')
       5 - access("R"=TO_NUMBER(:42))
       6 - filter("CL"='nonexistent')
       7 - access("R"=TO_NUMBER(:42))

    I hope the code survives the formatting madness relatively intact.

    Weirdness:
    How come a range scan of the same global index (I_TR) looking for a single unknown value is expected to return different number of rows? (see lines 3, 5, and 7 of the execution plan)

    Additional observations:
    The cardinality estimate for the range scan (looking for an unknown value) appears to be num_rows(table)/num_rows(partition), thus becoming more and more inaccurate as the partition gets smaller right until it is empty, in which case the estimate stops at 1% of num_rows(table).
    The effect does not occur if the table is not partitioned or the index is LOCAL. This leads me to believe I’ve stumbled upon another danger with global indexes on list-partitioned tables. I’ve actually run into this problem head first in our test database where this behaviour caused Oracle to prefer a full scan of a totally irrelevant global index (say, on L,R) over a range scan of the relevant one (say, on R).

    I hope this is “expected behaviour” with a well-known workaround that doesn’t require me to make my indexes local :-/

    Thanks,
    Flado

    Comment by Flado — September 24, 2009 @ 9:58 am BST Sep 24,2009 | Reply

    • Flado,

      I hadn’t noticed this one before – global indexes on extremely skewed data distributions.

      I think your analysis is pretty good – though the estimate looks like it might be more like:
      num_rows(index) / num_distinct(partition)

      I’m not sure about the 1% – but I haven’t tested this carefully. When I ran your example with a 10053 trace I think the run-time plan gave me a cardinality of 1 where explain plan gave me a cardinality of 100 (and then I had to stop work and get off the train ;)) – anomalies like that sometimes happen.

      Your example reproduces on 11.1.0.6 – so I would raise this as an SR with Oracle and give them the test case.

      Comment by Jonathan Lewis — September 25, 2009 @ 5:10 pm BST Sep 25,2009 | Reply

      • Quick status update: I opened a SR and the analyst found out it reproduces on 11.1.0.7, but if optimizer_features_enable is set to ’9.2.0′, the problem disappears (cardinality estimate is correct).

        Comment by Flado — September 29, 2009 @ 9:26 am BST Sep 29,2009 | Reply

        • Flado,
          Thanks for the update.

          Comment by Jonathan Lewis — October 1, 2009 @ 6:12 pm BST Oct 1,2009

        • Flado,

          Thanks for coming up to say hello on the seminar in Bulgaria last week. I hope you enjoyed the event as much as I did.

          Thanks also for letting me have a note of the bug number (8971829) to publish on the blog so that other people can follow it.

          Comment by Jonathan Lewis — October 24, 2009 @ 8:53 am BST Oct 24,2009

        • The bug is now fixed (in 12.1). I will try to get the fix backported to 10.2.

          Comment by Flado — December 1, 2009 @ 9:50 am BST Dec 1,2009

        • Bug 8971829 is backported to 10.2.0.4. The one-off patch for AIX is available via MetaLink.

          Comment by Flado — December 16, 2009 @ 9:34 am BST Dec 16,2009

        • Flado,

          Thanks for keeping us up to date on that.
          It’s a fix that could be beneficial on quite a lot of systems.

          Comment by Jonathan Lewis — December 16, 2009 @ 1:58 pm BST Dec 16,2009


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,529 other followers