Oracle Scratchpad

September 24, 2010

Frequency Histogram 3

Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 6:45 pm GMT Sep 24,2010

Here’s the output I get from querying dba_tab_histograms for a column that has been given a frequency histogram by a call to dbms_stats.gather_table_stats().

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              2             -1
           3190              0
           3813              1
           4310              4
           4480              5
           7507            999
          17210           2000
          17212           2002
          17213           2004
          17215           2006
          17729           2008
          17750           2009
          17752           2010
          17904           2011
          17906           2012
          17909           2017
          17994           5000
          18006           5001
          18009           5002
          18023           5003
          18062           6001
          39885          11000

In an earlier blog I wrote about a query that turned the figures from a frequency histogram into a list of column values and column frequencies, and if I had used that query against the histogram data I would have found that the value 11,000 appears 21,823 times – according to the histogram.

But there’s a problem with this histogram: the value 11,000 actually appeared roughly 2 million times in the real data set – and when I enabled autotrace on the query that I had used to count the number of times that 11,000 appeared the optimizer’s prediction (the cardinality in the execution plan) matched the actual value fairly closely. So how did the optimizer manage to get from the histogram to the correct cardinality ?

You have to remember that Oracle may have sampled the data to generate the stats, and it may have sampled different columns with different sample sizes. So there has to be a way for the optimizer to allow for this variation in sample sizes, and it works as follows:

Checking dba_tables for num_rows I got 15,819,184.
Checking dba_tab_columns for num_nulls for this column I got 11,976,735

For simplicity let’s call these figures 16 million and 12 million. With these approximations we can see that roughly 4 million rows must have a value for this column.

Checking back to the frequency histogram we see the total row count sampled (which we get from the final (i.e. maximum) endpoint_number) is 39,885 – which we’ll call 40,000.

We have a histogram that sampled 40,000 rows out of 4,000,000 (viz: 1%) – so the optimizer scales the histogram up by a factor of 100 when it uses it, and the histogram figure of 21,827 is scaled up to (roughly) 2,000,000.

[Further reading on Histograms]

3 Comments »

  1. I am looking for a nice way to gather statistics for one column (e.g. after it has been added to a table and populated) without updating any of the other table statistics. My search turned up this blog entry with its reference to DBMS_STATS.GATHER_COLUMN_STATS.

    Is this a misprint or are the numerous references to this procedure on the web referring to something real?

    Comment by Tony Hasler — September 26, 2010 @ 3:00 pm GMT Sep 26,2010 | Reply

    • Tony,

      Sorry, it’s typo (now corrected).

      The mechanism you want is probably: “gather_table_stats()” with the method_opt set to ‘for columns {column name} size {whatever}’. Better do a quick check test, though, in case this also does something you’d rather not see to the table-level stats like num_rows or blocks.

      Comment by Jonathan Lewis — September 26, 2010 @ 3:04 pm GMT Sep 26,2010 | Reply

  2. Jonathan, sorry for taking so long to acknowledge your reply. Yes, your suggested approach does update the table statistics. The only way I can see to do what I want is to specify a statistics table on the gather_table_stats call, then delete the row for the column in that statistics table (if any), and then re-import.

    Hardly elegant. Such is life.

    Comment by tonyhasler — October 1, 2010 @ 5:25 pm GMT Oct 1,2010 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,430 other followers