<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Fake Histograms</title>
	<atom:link href="http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/feed/" rel="self" type="application/rss+xml" />
	<link>http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/</link>
	<description>Just another Oracle weblog</description>
	<lastBuildDate>Wed, 19 Jun 2013 09:52:54 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Webinar questions &#124; Oracle Scratchpad</title>
		<link>http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/#comment-56277</link>
		<dc:creator><![CDATA[Webinar questions &#124; Oracle Scratchpad]]></dc:creator>
		<pubDate>Fri, 14 Jun 2013 16:41:19 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=3411#comment-56277</guid>
		<description><![CDATA[[&#8230;] though, don&#8217;t forget that I also pointed out that sometimes you may still need to create &#8220;fake&#8221; histograms to get the best possible [&#8230;]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] though, don&#8217;t forget that I also pointed out that sometimes you may still need to create &#8220;fake&#8221; histograms to get the best possible [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Usage Stats &#171; Oracle Scratchpad</title>
		<link>http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/#comment-53161</link>
		<dc:creator><![CDATA[Usage Stats &#171; Oracle Scratchpad]]></dc:creator>
		<pubDate>Thu, 24 Jan 2013 19:00:51 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=3411#comment-53161</guid>
		<description><![CDATA[[...] Yes &#8211; here&#8217;s one example. [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Yes &#8211; here&#8217;s one example. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Amit</title>
		<link>http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/#comment-44568</link>
		<dc:creator><![CDATA[Amit]]></dc:creator>
		<pubDate>Fri, 20 Jan 2012 16:06:28 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=3411#comment-44568</guid>
		<description><![CDATA[Hi Jonathan, 

My apologies for providing incorrect and incomplete information. The naming convention of the index did not use this column name, and I did not notice that the index used in the query actually contains this column. You are right: this is a unique index which contains three columns (the leading column in the index is the one with the histogram issue). I verified the info from the index and columns, and the data in the column is actually not skewed, because the distinct_keys in the index and num_distinct in the column are approximately the same, as shown below; the other columns have only 2 and 0 distinct values.
[sourcecode]
SQL&gt; select column_name,num_distinct from dba_tab_cols where table_name=&#039;EVE&#039; and column_name=&#039;IDENTITY&#039;;

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
IDENTITY                         	24749753

SQL&gt; select  index_name,num_rows,distinct_keys from dba_indexes where table_name=&#039;EVE&#039; and index_name=&#039;UI_EVE&#039;;

INDEX_NAME                       NUM_ROWS DISTINCT_KEYS
------------------------------ ---------- -------------
UI_EVE              		25670860      25670860
[/sourcecode]
As you mentioned, because the first 32 chars of the column values are the same, it was reporting an incorrect histogram. I decided to delete the histogram on this column, and now the query uses an index scan in place of the FTS.

Thanks for your inputs.

Amit]]></description>
		<content:encoded><![CDATA[<p>Hi Jonathan, </p>
<p>My apologies for providing incorrect and incomplete information. The naming convention of the index did not use this column name, and I did not notice that the index used in the query actually contains this column. You are right: this is a unique index which contains three columns (the leading column in the index is the one with the histogram issue). I verified the info from the index and columns, and the data in the column is actually not skewed, because the distinct_keys in the index and num_distinct in the column are approximately the same, as shown below; the other columns have only 2 and 0 distinct values.</p>
<pre class="brush: plain; title: ; notranslate">
SQL&gt; select column_name,num_distinct from dba_tab_cols where table_name='EVE' and column_name='IDENTITY';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
IDENTITY                         	24749753

SQL&gt; select  index_name,num_rows,distinct_keys from dba_indexes where table_name='EVE' and index_name='UI_EVE';

INDEX_NAME                       NUM_ROWS DISTINCT_KEYS
------------------------------ ---------- -------------
UI_EVE              		25670860      25670860
</pre>
<p>As you mentioned, because the first 32 chars of the column values are the same, it was reporting an incorrect histogram. I decided to delete the histogram on this column, and now the query uses an index scan in place of the FTS.</p>
<p>Thanks for your inputs.</p>
<p>Amit</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/#comment-43994</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Tue, 03 Jan 2012 20:01:55 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=3411#comment-43994</guid>
		<description><![CDATA[Amit,
&lt;i&gt;1. Why it is reporting NewDensity:0.000000&lt;/i&gt;

You&#039;ve got more than 2M distinct values according to the stats; what&#039;s 1/2,000,000 ?

&lt;i&gt;2. The selectivity optimizer is trying to calculate is (80-0.5)/254=0.312992 which is calculating the cardinality as 20297719*0.312992=6353026.22 but I am not sure why it is trying to deduct 0.5 from the popular bucket count. I got the same results for other 10053 trace files for this query&lt;/i&gt;

I made the mistake of assuming the column stats and the index access path information you gave me were somehow related - I suppose I should have guessed that an index named UI_Col1 didn&#039;t contain a column named Col2 but I rather discounted that possibility as a typo because I didn&#039;t think your columns were really called Col1 and Col2 anyway. Why would the number of popular buckets for one column have anything to do with the index selectivity of an index which doesn&#039;t include that column ?

&lt;em&gt; 3. And the actual problem, should there be no histogram on this column or how to deal with such kind of histogram issue. Predicate used in the query (with skewed histogram) is not a unique column, the access path shown is for RangeScan &lt;strong&gt;on other unique&lt;/strong&gt; index column UI_COL1 which the optimizer is trying to evaluate for other strings not having 32 chars&lt;/em&gt;

I&#039;m going to guess that there are two predicates (at least), one on Col1 and one on Col2. But how am I supposed to work out an answer to the question when you haven&#039;t given me any clues about what the predicates look like, or what the index looks like?]]></description>
		<content:encoded><![CDATA[<p>Amit,<br />
<i>1. Why it is reporting NewDensity:0.000000</i></p>
<p>You&#8217;ve got more than 2M distinct values according to the stats; what&#8217;s 1/2,000,000 ?</p>
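<p>(A quick sketch of that arithmetic in Python, not taken from the trace itself. It assumes the density is roughly 1/NDV and that the trace prints six decimal places; the variable names are illustrative only.)</p>
<pre class="brush: python; title: ; notranslate">
# NDV as reported in the 10053 trace quoted in this thread.
ndv = 20297719

# Assumption: with almost every value distinct, density is roughly 1/NDV.
density = 1 / ndv

# Formatting to six decimal places, as the trace appears to do,
# turns this tiny value into a string of zeros.
as_traced = f"{density:.6f}"
print(as_traced)
</pre>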
<p><i>2. The selectivity optimizer is trying to calculate is (80-0.5)/254=0.312992 which is calculating the cardinality as 20297719*0.312992=6353026.22 but I am not sure why it is trying to deduct 0.5 from the popular bucket count. I got the same results for other 10053 trace files for this query</i></p>
<p>I made the mistake of assuming the column stats and the index access path information you gave me were somehow related &#8211; I suppose I should have guessed that an index named UI_Col1 didn&#8217;t contain a column named Col2 but I rather discounted that possibility as a typo because I didn&#8217;t think your columns were really called Col1 and Col2 anyway. Why would the number of popular buckets for one column have anything to do with the index selectivity of an index which doesn&#8217;t include that column ?</p>
<p><em> 3. And the actual problem, should there be no histogram on this column or how to deal with such kind of histogram issue. Predicate used in the query (with skewed histogram) is not a unique column, the access path shown is for RangeScan <strong>on other unique</strong> index column UI_COL1 which the optimizer is trying to evaluate for other strings not having 32 chars</em></p>
<p>I&#8217;m going to guess that there are two predicates (at least), one on Col1 and one on Col2. But how am I supposed to work out an answer to the question when you haven&#8217;t given me any clues about what the predicates look like, or what the index looks like?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: NewDensity &#171; Oracle Scratchpad</title>
		<link>http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/#comment-43987</link>
		<dc:creator><![CDATA[NewDensity &#171; Oracle Scratchpad]]></dc:creator>
		<pubDate>Tue, 03 Jan 2012 17:56:16 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=3411#comment-43987</guid>
		<description><![CDATA[[...] recent comment on a note I wrote some time ago about faking histograms asked about the calculations of selectivity in the latest versions of Oracle. As I read the [...]]]></description>
		<content:encoded><![CDATA[<p>[...] recent comment on a note I wrote some time ago about faking histograms asked about the calculations of selectivity in the latest versions of Oracle. As I read the [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Amit</title>
		<link>http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/#comment-43933</link>
		<dc:creator><![CDATA[Amit]]></dc:creator>
		<pubDate>Mon, 02 Jan 2012 19:35:57 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=3411#comment-43933</guid>
		<description><![CDATA[Thanks, Jonathan, for the response. My actual problem is the one you pointed out. Because the first 32 bytes contain the same characters, my histogram is reporting skewed data and the optimizer is calculating an incorrect cardinality. I have posted this problem in a separate forum; here I wanted to understand the selectivity/cardinality calculation, so I did not mention the issue, but you spotted it :)

My query does an FTS because the cardinality is estimated as 1/3 of the records in the table, and an index RangeScan for other values in non-popular buckets. But I am trying to understand:

1. Why it is reporting NewDensity:0.000000
2. The selectivity optimizer is trying to calculate is (80-0.5)/254=0.312992 which is calculating the cardinality as 20297719*0.312992=6353026.22 but I am not sure why it is trying to deduct 0.5 from the popular bucket count. I got the same results for other 10053 trace files for this query
3. And the actual problem, should there be no histogram on this column or how to deal with such kind of histogram issue. Predicate used in the query (with skewed histogram) is not a unique column, the access path shown is for RangeScan on other unique index column UI_COL1 which the optimizer is trying to evaluate for other strings not having 32 chars]]></description>
		<content:encoded><![CDATA[<p>Thanks, Jonathan, for the response. My actual problem is the one you pointed out. Because the first 32 bytes contain the same characters, my histogram is reporting skewed data and the optimizer is calculating an incorrect cardinality. I have posted this problem in a separate forum; here I wanted to understand the selectivity/cardinality calculation, so I did not mention the issue, but you spotted it :)</p>
<p>My query does an FTS because the cardinality is estimated as 1/3 of the records in the table, and an index RangeScan for other values in non-popular buckets. But I am trying to understand:</p>
<p>1. Why it is reporting NewDensity:0.000000<br />
2. The selectivity optimizer is trying to calculate is (80-0.5)/254=0.312992 which is calculating the cardinality as 20297719*0.312992=6353026.22 but I am not sure why it is trying to deduct 0.5 from the popular bucket count. I got the same results for other 10053 trace files for this query<br />
3. And the actual problem, should there be no histogram on this column or how to deal with such kind of histogram issue. Predicate used in the query (with skewed histogram) is not a unique column, the access path shown is for RangeScan on other unique index column UI_COL1 which the optimizer is trying to evaluate for other strings not having 32 chars</p>
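<p>The arithmetic in point 2 can be reproduced with a minimal Python sketch. The figures are copied from the 10053 trace earlier in this thread; the variable names are illustrative, and the sketch only repeats the calculation without explaining where the 0.5 comes from.</p>
<pre class="brush: python; title: ; notranslate">
rows = 20297719        # Card: Original in the trace
buckets = 254          # BktCnt in the trace
popular_buckets = 80   # PopBktCnt in the trace

# Selectivity as described in point 2: (80 - 0.5) / 254
selectivity = (popular_buckets - 0.5) / buckets

# Cardinality is the row count scaled by that selectivity.
cardinality = rows * selectivity

print(round(selectivity, 6))   # compare with ix_sel in the trace
print(round(cardinality, 2))   # compare with Computed in the trace
</pre>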
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/#comment-43854</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Sat, 31 Dec 2011 15:34:15 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=3411#comment-43854</guid>
		<description><![CDATA[Amit,

I have corrected &lt;em&gt;&lt;strong&gt;&lt;a href=&quot;http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/#comment-42351&quot; rel=&quot;nofollow&quot;&gt;the comment above&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt; that you are using as the source of the formula. When I wrote it, I was thinking about &lt;strong&gt;cardinality &lt;/strong&gt;rather than &lt;strong&gt;selectivity &lt;/strong&gt;(and the &lt;strong&gt;selectivity &lt;/strong&gt;for &lt;em&gt;&#039;column = constant&#039;&lt;/em&gt; is the value that the optimizer reports as the &lt;strong&gt;NewDensity&lt;/strong&gt;).

But even when you know what the formula is really supposed to give you, you still get a strange answer - and that&#039;s because there is something very wrong with your statistics. Note that the table has 20,297,719 rows and the column has 20,297,719 distinct values. Despite this, the histogram claims that there are 80 buckets out of 254 that all represent the same value ... which means the NDV ought to be roughly 2/3 of the size that it is.

I note, then, that your index is called &lt;strong&gt;UI_Col1&lt;/strong&gt; - which suggests you have a naming convention that prefixes unique indexes with the letters &lt;strong&gt;UI&lt;/strong&gt;, and that means something even more bizarre is happening with the stats. Given the path is a &lt;strong&gt;RangeScan&lt;/strong&gt;, of course, the &lt;strong&gt;ix_sel&lt;/strong&gt; in the &lt;em&gt;Access path &lt;/em&gt;calculation doesn&#039;t have to have anything to do with the &lt;strong&gt;NewDensity &lt;/strong&gt;from the table stats. On the other hand, the value of 0.31 (combined with the 80 buckets of popular values) suggests that your query may be targeting a range that includes the value that Oracle thinks is the popular value (viz. a little less than one third of the rows ... 80/254 = 0.3149).

At this point I think I&#039;m going to guess that &lt;strong&gt;Col1 &lt;/strong&gt;is a character based column of more than 32 bytes and a large number of rows have a value that starts with the same 32 bytes. I say this because it&#039;s one way to reproduce your anomaly - the code to calculate the &lt;strong&gt;NDV &lt;/strong&gt;can count the actual distinct values, the code to generate the histogram limits itself (with some complications) to the first 32 bytes.

I&#039;ll try to write up a short note to demonstrate the NewDensity calculation in a couple of days&#039; time.]]></description>
		<content:encoded><![CDATA[<p>Amit,</p>
<p>I have corrected <em><strong><a href="http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/#comment-42351" rel="nofollow">the comment above</a></strong></em> that you are using as the source of the formula. When I wrote it, I was thinking about <strong>cardinality </strong>rather than <strong>selectivity </strong>(and the <strong>selectivity </strong>for <em>&#8216;column = constant&#8217;</em> is the value that the optimizer reports as the <strong>NewDensity</strong>).</p>
<p>But even when you know what the formula is really supposed to give you, you still get a strange answer &#8211; and that&#8217;s because there is something very wrong with your statistics. Note that the table has 20,297,719 rows and the column has 20,297,719 distinct values. Despite this, the histogram claims that there are 80 buckets out of 254 that all represent the same value &#8230; which means the NDV ought to be roughly 2/3 of the size that it is.</p>
<p>I note, then, that your index is called <strong>UI_Col1</strong> &#8211; which suggests you have a naming convention that prefixes unique indexes with the letters <strong>UI</strong>, and that means something even more bizarre is happening with the stats. Given the path is a <strong>RangeScan</strong>, of course, the <strong>ix_sel</strong> in the <em>Access path </em>calculation doesn&#8217;t have to have anything to do with the <strong>NewDensity </strong>from the table stats. On the other hand, the value of 0.31 (combined with the 80 buckets of popular values) suggests that your query may be targeting a range that includes the value that Oracle thinks is the popular value (viz. a little less than one third of the rows &#8230; 80/254 = 0.3149).</p>
<p>At this point I think I&#8217;m going to guess that <strong>Col1 </strong>is a character based column of more than 32 bytes and a large number of rows have a value that starts with the same 32 bytes. I say this because it&#8217;s one way to reproduce your anomaly &#8211; the code to calculate the <strong>NDV </strong>can count the actual distinct values, the code to generate the histogram limits itself (with some complications) to the first 32 bytes.</p>
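<p>A minimal Python sketch of how that kind of anomaly can arise. It uses hypothetical data and a deliberately simplified model (a plain cut at 32 bytes), and it is not Oracle code: counting whole values sees every row as distinct, while anything that looks only at the first 32 bytes sees a single value.</p>
<pre class="brush: python; title: ; notranslate">
# Hypothetical data: 1,000 strings that share the same 40-character prefix.
prefix = "X" * 40
values = [prefix + format(i, "04d") for i in range(1000)]

# NDV-style count: compares the complete values.
full_ndv = len(set(values))

# Histogram-style view (simplifying assumption): only the first 32 bytes count.
truncated_ndv = len(set(v.encode("ascii")[:32] for v in values))

print(full_ndv, truncated_ndv)
</pre>
<p>With this data every value is distinct in full, yet all of them collapse to one value after truncation, which is the shape of the mismatch between the NDV and the popular buckets described above.</p>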
<p>I&#8217;ll try to write up a short note to demonstrate the NewDensity calculation in a couple of days&#8217; time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Amit</title>
		<link>http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/#comment-43822</link>
		<dc:creator><![CDATA[Amit]]></dc:creator>
		<pubDate>Fri, 30 Dec 2011 18:53:24 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=3411#comment-43822</guid>
		<description><![CDATA[Hi Jonathan

I am using 11.2 and working from the formula (total number of rows - number of rows for popular values) / (total number of distinct values - number of popular values). I tried to calculate the selectivity for my query but am not getting the value shown in the 10053 trace. Below is my 10053 trace output:
[sourcecode]
Column (#2): 
    NewDensity:0.000000, OldDensity:0.000000 BktCnt:254, PopBktCnt:80, PopValCnt:1, NDV:20297719
  Column (#2): Col2 (
    AvgLen: 36 NDV: 20297719 Nulls: 0 Density: 0.000000
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 176
  Table: Tab1  Alias: UNIT1_
    Card: Original: 20297719.000000  Rounded: 6353026  Computed: 6353026.22  Non Adjusted: 6353026.22
[/sourcecode]
After putting the values into the formula, the selectivity comes out as (20297719 - (20297719/254)*80)/(20297719-1) = 0.6850394, though the selectivity shown in the trace is:
[sourcecode]
Access Path: index (RangeScan)
    Index: UI_Col1
    resc_io: 5766117.00  resc_cpu: 43557791448
    ix_sel: 0.312992  ix_sel_with_filters: 0.312992 
    Cost: 5772791.97  Resp: 5772791.97  Degree: 1
[/sourcecode]
Please tell me where I am wrong in my calculation.

Thanks
Amit]]></description>
		<content:encoded><![CDATA[<p>Hi Jonathan</p>
<p>I am using 11.2 and working from the formula (total number of rows &#8211; number of rows for popular values) / (total number of distinct values &#8211; number of popular values). I tried to calculate the selectivity for my query but am not getting the value shown in the 10053 trace. Below is my 10053 trace output:</p>
<pre class="brush: plain; title: ; notranslate">
Column (#2): 
    NewDensity:0.000000, OldDensity:0.000000 BktCnt:254, PopBktCnt:80, PopValCnt:1, NDV:20297719
  Column (#2): Col2 (
    AvgLen: 36 NDV: 20297719 Nulls: 0 Density: 0.000000
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 176
  Table: Tab1  Alias: UNIT1_
    Card: Original: 20297719.000000  Rounded: 6353026  Computed: 6353026.22  Non Adjusted: 6353026.22
</pre>
<p>After putting the values into the formula, the selectivity comes out as (20297719 - (20297719/254)*80)/(20297719-1) = 0.6850394, though the selectivity shown in the trace is:</p>
<pre class="brush: plain; title: ; notranslate">
Access Path: index (RangeScan)
    Index: UI_Col1
    resc_io: 5766117.00  resc_cpu: 43557791448
    ix_sel: 0.312992  ix_sel_with_filters: 0.312992 
    Cost: 5772791.97  Resp: 5772791.97  Degree: 1
</pre>
<p>Please tell me where I am wrong in my calculation.</p>
<p>Thanks<br />
Amit</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/#comment-42526</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Tue, 22 Nov 2011 08:21:43 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=3411#comment-42526</guid>
		<description><![CDATA[Jason,

I wouldn&#039;t be 100% sure of that. It&#039;s possible, for example, that there are cases where the optimizer code hasn&#039;t been made completely consistent. I would still prefer to set a density to &quot;the figure I want Oracle to believe&quot; even if I thought it should be ignored.]]></description>
		<content:encoded><![CDATA[<p>Jason,</p>
<p>I wouldn&#8217;t be 100% sure of that. It&#8217;s possible, for example, that there are cases where the optimizer code hasn&#8217;t been made completely consistent. I would still prefer to set a density to &#8220;the figure I want Oracle to believe&#8221; even if I thought it should be ignored.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jason Bucata</title>
		<link>http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/#comment-42374</link>
		<dc:creator><![CDATA[Jason Bucata]]></dc:creator>
		<pubDate>Mon, 14 Nov 2011 15:23:23 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=3411#comment-42374</guid>
		<description><![CDATA[So does that mean, then, that if I&#039;m faking up a histogram, either frequency or height-balanced, it&#039;s OK to leave density NULL?  That DBMS_STATS will either calculate a decent value, or even if it doesn&#039;t it won&#039;t matter since the CBO won&#039;t use it?]]></description>
		<content:encoded><![CDATA[<p>So does that mean, then, that if I&#8217;m faking up a histogram, either frequency or height-balanced, it&#8217;s OK to leave density NULL?  That DBMS_STATS will either calculate a decent value, or even if it doesn&#8217;t it won&#8217;t matter since the CBO won&#8217;t use it?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
