Oracle Scratchpad

July 27, 2013

Clustering_factor

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 7:21 am GMT Jul 27,2013

The clustering_factor is one of the most important numbers (if not the most important number) affecting the optimizer’s choice of execution plan – it’s the thing that has the most significant effect on the optimizer’s decision on whether to choose a table scan or an index, and on which index to choose.

I have a rant about the clustering_factor; something that has irritated me for years. The name is wrong (on two counts, but only one of them is serious): the number is not about clustering, it’s about SCATTER – the bigger the number, the worse the scatter. You may, of course, point out that mechanically speaking scatter is virtually the opposite of clustering so the choice of name doesn’t really matter – but I think there’s a big psychological difference that would make it easier for people to get an intuitive feel for what the number is telling them if something as simple as the choice of name is improved.

Fortunately Oracle restates the clustering_factor as the avg_data_blocks_per_key in the user_indexes view, dividing out by the distinct_keys and rounding. This is a far more helpful measure of how scattered your data is: “for a given key value, how many randomly scattered table blocks do I have to acquire”. (This is only specifically relevant for predicates of the type “complete set of index columns = {constant}”, of course, but it’s still a very useful intuitive measure even when you are considering more general predicates.

The second error in the name (the not-serious error) is that a “factor”, to me at least, is a number between zero and one (or a percentage between 0% and 100%) – the clustering_factor really ought to be something like the scatter_count.

9 Comments »

  1. As usual, you were very precise and gave me a better perspective of Oracle.
    Thank you Jonathan

    Comment by heliodias — July 27, 2013 @ 3:15 pm GMT Jul 27,2013 | Reply

  2. […] #DATABASE #ORACLE by Jonathan Lewis : Clustering_factor […]

    Pingback by #DATABASE #ORACLE by Jonathan Lewis : Clustering_factor | Database Scene — July 27, 2013 @ 7:59 pm GMT Jul 27,2013 | Reply

  3. Jonathan,
    As much as I agree with your rant about the misleading naming, I tend to disagree with your second thought: Why should a factor – as in “a part of a multiplication” – be constrained to a number between zero and one?

    Comment by Uwe M. Küchler — July 30, 2013 @ 11:51 am GMT Jul 30,2013 | Reply

    • Uwe,

      You’re right – my personal preference for 0-1 / 0-100% is too aggressive a requirement; however I would be inclinced to insist that (informally rather than mathematically) a factor is a thing “by which you scale”, and the clustering_factor is the thing that is “scaled by” the selectivity.

      Comment by Jonathan Lewis — July 30, 2013 @ 12:00 pm GMT Jul 30,2013 | Reply

  4. An astute observation Jonathan, as usual. :)
    I’ve always thought that ‘clustering factor’ had little to do with the measure – your suggestion is a good one.
    Perhaps in a future version of Oracle…

    Comment by Jared — August 6, 2013 @ 4:15 pm GMT Aug 6,2013 | Reply

  5. Hi Jon,

    Here are the details
    [
    The number blocks in table 728295
    No of rows 5159680
    Table is LIST partitioned
    -----
    Index I1 (NONUNIQUE)
    concatenated index on 2 columns (c_flag,bev_c_code) --both the cloumns being used in select query
    avg_data_blocks_per_key 230
    distinct_keys 1119
    leaf_blocks 12700
    CF : 257927
    --------------------
    Index I2 (NONUNIQUE)
    concatenated index on 2 columns (c_flag,LST_UPDT_DT)--only one cloumn(c_flag) being used in select query
    avg_data_blocks_per_key 1
    distinct_keys 3019165
    leaf_blocks 21873
    CF : 1444626
    ]

    Optimizer is default using the I1 index and CPU usage is high(80%)
    when I force optimizer to I2(which has high CF than I1) the cost has gone high but CPU has come down to 10%.
    Is that because CF is high on I2 than I1 optimizer has ignored the I2? If so then why optimizer has chosen index with High CPU?

    Comment by Ram — January 22, 2014 @ 4:46 pm GMT Jan 22,2014 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,306 other followers