Oracle Scratchpad

January 23, 2018

Histograms

Filed under: Histograms,humour,Oracle,Statistics — Jonathan Lewis @ 7:58 am BST Jan 23,2018

I had a sudden urge to go a bit meta – so here’s a relative frequency histogram of my observations of the general use of histograms in Oracle:

 

 

 

10 Comments »

  1. I never considered that red category could cause problems until few days ago.

    But now I am armed with “dbms_stats.delete_column_stats( … , col_stat_type=>’HISTOGRAM’);” and about to hunt .

    I have just one doubt : will automatic gathering recreate undesiderd histograms ?

    Comment by paolocastle — January 23, 2018 @ 9:06 am BST Jan 23,2018 | Reply

    • The default gather will (almost always) recreate the histograms because the default method_opt is “for all columns size auto”, and that appearance of “auto” tells Oracle to decide for itself (from information recorded in sys.col_usage$) which columns to investigate for suitability of histograms.

      If you find a red column but want to leave Oracle’s default stats gathering routines unchanged you will have to set the method_opt preference for the table to something like:
      ‘for all columns size auto for columns XYZ size 1’ or (taking the opposite strategy – which is the one I usually adopt) ‘for all columns size 1 for columns size 254 XYZ ABC PQR’.

      See also: https://jonathanlewis.wordpress.com/2018/01/18/column-stats/

      Comment by Jonathan Lewis — January 23, 2018 @ 9:26 am BST Jan 23,2018 | Reply

  2. Hi Jonathan – for the red bar in particular, what proportion would you assign to Hybrids (or Height Balanced) versus Frequency? Perhaps we could have a stacked chart for that? :-)

    Comment by Nigel Bayliss — January 23, 2018 @ 1:10 pm BST Jan 23,2018 | Reply

    • Nigel,

      Good point – I’ll edit it tomorrow.
      Quick answer for today – the stack will be mostly height/hybrid.

      Comment by Jonathan Lewis — January 23, 2018 @ 5:03 pm BST Jan 23,2018 | Reply

      • Hi Jonathan – I’d like to understand the ‘don’t need’ criteria for the red column. Given that we test for range and/or value skew before creating a height-balanced/hybrid histogram on a column (subject to predicate usage), are you saying that we create these histograms when there is little or no range or value skew? Could you send me any examples that illustrate your point?

        Comment by Nigel Bayliss — January 29, 2018 @ 11:08 am BST Jan 29,2018 | Reply

        • Nigel,

          The problem with the heuristic that creates histograms is that it is not enough to have an “static” picture of the data and the operators used – it’s also necessary to know whether or not the histogram would cause execution plans to change – and if you acept as axiomatic that you should not do work that you don’t need to do then you won’t want to create histograms that make no difference to execution plans.

          Of course the “don’t do work you don’t need to do” can be applied to the machine at run time AND to the developer/DBA – which means it is perfectly reasonable to argue that it’s okay to create loads of redundant histograms if you have the machine resources available rather than spending developer/DBA time working out which tiny subset of histograms you actually need. And, as I’ve said in the past, Maria and I have a (non-fatal) difference of opinion here – she favours letting Oracle do its default thing then getting the DBA to identify the tiny number of histograms that are causing problems, I favour starting with no histograms then identifying the tiny number of histograms that would be useful.

          It probably won’t take me more than 30 minutes to run up an example where Oracle creates a redundant histogram on a data set with a frequency skew – I’ll see if I can do one this evening – but the problem with creating models is that it’s too easy to respond with the claim that they don’t represent the things that happen in real life: unfortunately real life is often rather more silly than the developers imagine.

          Comment by Jonathan Lewis — January 30, 2018 @ 10:39 am BST Jan 30,2018

  3. […] eliminating the critical histogram (using dbms_stats.delete_column_stats()) and eliminating the redundant child cursors (using dbms_pool.purge()) the massive proliferation […]

    Pingback by Histogram Threat | Oracle Scratchpad — January 30, 2018 @ 8:07 am BST Jan 30,2018 | Reply

    • Hi Jonathan, if I interpret what you’re saying correctly, ‘redundant’ in this context means a histogram on a column with skew, but a histogram that does not change the plan of an existing SQL statement. Sure, the ‘work axiom’ is OK if a system is relatively static, but work can be classed as useful if it yields a future advantage, especially if it avoids the need for manual intervention. This can be the case with systems that change over time (in terms of data and/or workload). Now I see what you’re saying, I agree that a constructed model might not be particularly revealing.

      Comment by Nigel Bayliss — January 30, 2018 @ 2:01 pm BST Jan 30,2018 | Reply

      • Yes. I think that you’ve clearly explained what redundant histogram means. Jonathan Lewis provided an example of this kind of redundant histogram in this article:
        https://jonathanlewis.wordpress.com/2018/01/30/histogram-threat/
        And I would tend to prefer Jonathan Lewis opinion about starting with no histograms then identifying the tiny number of histograms that would be useful. Because not every developer/DBA is able to identify a parsing issue due to ACS which itself is caused by a the presence of histogram on the predicate which makes a cursor bind sensitive before switching to bind aware after a warm up period.
        Best regards
        Mohamed Houri

        Comment by hourim — January 31, 2018 @ 3:29 pm BST Jan 31,2018 | Reply

      • Nigel,

        I can’t approve of the idea of doing extra work just in case there’s some future benefit. It would be a bit extreme to compare creating excessive histograms with creating excessive indexes “just in case”, but the analogy isn’t completely unfair – and in both cases the threat that appears when “just in case” finally happens is that performance becomes unpredictable for a while because of “randomly” variable plans so manual intervention is likely to be needed anyway.

        Comment by Jonathan Lewis — February 1, 2018 @ 8:31 am BST Feb 1,2018 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

Powered by WordPress.com.