<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Small Tables</title>
	<atom:link href="http://jonathanlewis.wordpress.com/2011/03/24/small-tables/feed/" rel="self" type="application/rss+xml" />
	<link>http://jonathanlewis.wordpress.com/2011/03/24/small-tables/</link>
	<description>Just another Oracle weblog</description>
	<lastBuildDate>Wed, 19 Jun 2013 09:25:44 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Small Partitions &#124; Oracle Scratchpad</title>
		<link>http://jonathanlewis.wordpress.com/2011/03/24/small-tables/#comment-56108</link>
		<dc:creator><![CDATA[Small Partitions &#124; Oracle Scratchpad]]></dc:creator>
		<pubDate>Sun, 09 Jun 2013 12:10:58 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=5999#comment-56108</guid>
		<description><![CDATA[[&#8230;] - and I haven&#039;t been keeping up.] [Update again: Some notes I wrote a couple of years later about small tables and the variation in treatment depending on [&#8230;]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] &#8211; and I haven&#039;t been keeping up.] [Update again: Some notes I wrote a couple of years later about small tables and the variation in treatment depending on [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lonny (@sql_handle)</title>
		<link>http://jonathanlewis.wordpress.com/2011/03/24/small-tables/#comment-56024</link>
		<dc:creator><![CDATA[Lonny (@sql_handle)]]></dc:creator>
		<pubDate>Mon, 03 Jun 2013 16:36:51 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=5999#comment-56024</guid>
		<description><![CDATA[Excellent stuff!  Is the &quot;forget-bit&quot; still used in Oracle 11.2.0.3?  Can it be retrieved from x$bh along with tim for time and tch for touch count, or elsewhere?
The reason for my interest:
I&#039;m working with a DSS style workload - lots of reports/queries run in batch with a concurrency of 40 to 100 (and many of them parallelized) on a set of tables that is updated once nightly via ETL.  Reports/queries with similar SQL text, varying only by parameters, are queued sequentially.  This can lead to lots of near-concurrent repetitive full table scans and index fast full scans into PGA via direct path read.  That increases the total read volume for the workload, making the Oracle instance a sometimes-unkind neighbor on shared storage.  It can also lead to queue full (QFULL) conditions for the physical volumes/LUNs on the Oracle host, punishing query performance in response to the overloaded storage queues.
My first thought was to increase _small_table_threshold and possibly _large_object_threshold, decreasing the fts/ffs direct path reads and increasing SGA traffic.  Cache hits for similar concurrent or near-concurrent queries could go up, total IO for a given set of similar queries could go down, and the risk of QFULL conditions could go down together with the total IO (this is on systems with no filesystem buffer cache; otherwise the repetitive requests could be satisfied from the fs buffer cache).

But, if DPR fts/ffs are just replaced by fts/ffs into SGA database cache, it looks like I should watch the following pretty closely to see if I&#039;m really getting the benefit I want:
1. Midpoint vs near-endpoint buffer cache LRU insertion.  Midpoint insertion would allow a larger time window for near-concurrent queries to find the data still in cache.  Insertion of fts/ffs blocks near the LRU end of the buffer cache might not lower the total workload IO or the risk of QFULL much.
2. Change of touch count (x$bh.tch).  If the ffs/fts blocks don&#039;t have their touch count increased by subsequent touches after the three second timer goes by, they&#039;ll age out and again might not provide as much cache benefit as originally expected.
3. forget-bit.
4. Updates to last touch time (x$bh.tim)

In addition to watching total accumulated IO, short and long table scan counts, ffs counts... that&#039;s a lot to watch in comparative tests.  Makes me think in my case it may be better to work with optimizer_index_cost_adj, optimizer_index_caching, and/or table_cached_blocks to replace fts and ffs with more targeted index use with index blocks and index-selected table blocks just going to the normal database buffer cache LRU insertion point, with their touch count handled just like everything else. 

I&#039;d like to design a good system test against a full-scale test workload, comparing baseline, modifications to _small_table_threshold, and modifications to optimizer_index_cost_adj.  The comparisons would be batch execution time (of course), workload total IO (data volume), workload total IOPs, long table scans, short table scans, index ffs, DPR traffic, midpoint database buffer cache insertions, near-endpoint database buffer cache insertions, and total count database cache buffers with forget-bit set (if this is still used in 11.2.0.3).]]></description>
		<content:encoded><![CDATA[<p>Excellent stuff!  Is the &#8220;forget-bit&#8221; still used in Oracle 11.2.0.3?  Can it be retrieved from x$bh along with tim for time and tch for touch count, or elsewhere?<br />
The reason for my interest:<br />
I&#8217;m working with a DSS style workload &#8211; lots of reports/queries run in batch with a concurrency of 40 to 100 (and many of them parallelized) on a set of tables that is updated once nightly via ETL.  Reports/queries with similar SQL text, varying only by parameters, are queued sequentially.  This can lead to lots of near-concurrent repetitive full table scans and index fast full scans into PGA via direct path read.  That increases the total read volume for the workload, making the Oracle instance a sometimes-unkind neighbor on shared storage.  It can also lead to queue full (QFULL) conditions for the physical volumes/LUNs on the Oracle host, punishing query performance in response to the overloaded storage queues.<br />
My first thought was to increase _small_table_threshold and possibly _large_object_threshold, decreasing the fts/ffs direct path reads and increasing SGA traffic.  Cache hits for similar concurrent or near-concurrent queries could go up, total IO for a given set of similar queries could go down, and the risk of QFULL conditions could go down together with the total IO (this is on systems with no filesystem buffer cache; otherwise the repetitive requests could be satisfied from the fs buffer cache).</p>
<p>But, if DPR fts/ffs are just replaced by fts/ffs into SGA database cache, it looks like I should watch the following pretty closely to see if I&#8217;m really getting the benefit I want:<br />
1. Midpoint vs near-endpoint buffer cache LRU insertion.  Midpoint insertion would allow a larger time window for near-concurrent queries to find the data still in cache.  Insertion of fts/ffs blocks near the LRU end of the buffer cache might not lower the total workload IO or the risk of QFULL much.<br />
2. Change of touch count (x$bh.tch).  If the ffs/fts blocks don&#8217;t have their touch count increased by subsequent touches after the three second timer goes by, they&#8217;ll age out and again might not provide as much cache benefit as originally expected.<br />
3. forget-bit.<br />
4. Updates to last touch time (x$bh.tim)</p>
<p>In addition to watching total accumulated IO, short and long table scan counts, ffs counts&#8230; that&#8217;s a lot to watch in comparative tests.  Makes me think in my case it may be better to work with optimizer_index_cost_adj, optimizer_index_caching, and/or table_cached_blocks to replace fts and ffs with more targeted index use with index blocks and index-selected table blocks just going to the normal database buffer cache LRU insertion point, with their touch count handled just like everything else. </p>
<p>I&#8217;d like to design a good system test against a full-scale test workload, comparing baseline, modifications to _small_table_threshold, and modifications to optimizer_index_cost_adj.  The comparisons would be batch execution time (of course), workload total IO (data volume), workload total IOPs, long table scans, short table scans, index ffs, DPR traffic, midpoint database buffer cache insertions, near-endpoint database buffer cache insertions, and total count database cache buffers with forget-bit set (if this is still used in 11.2.0.3).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2011/03/24/small-tables/#comment-54037</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Sun, 10 Mar 2013 12:08:18 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=5999#comment-54037</guid>
		<description><![CDATA[Tanel,

Thanks for the comment and links.

I didn&#039;t think of it when I was writing the note, but the behaviour for tablescans (and index fast full scans) is probably linked directly to the 25% target that I describe for the auxiliary replacement list (REPL_AUX) in Oracle Core pp. 107-109. I haven&#039;t worked through it in detail, but I suspect - rough outline only - that:

&lt;ul&gt;
&lt;li&gt;any tablescan under the 2% limit goes to &lt;strong&gt;repl_main&lt;/strong&gt; (mid-point)&lt;/li&gt;
&lt;li&gt;any tablescan (index ffs) over the 2% figure goes directly to &lt;strong&gt;repl_aux&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;any tablescan (index ffs) less than the 10% figure gets its touch count incremented if the blocks are still in repl_aux when the scan is repeated (which means they can eventually be promoted to the hot end of &lt;strong&gt;repl_main&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;any tablescan (index ffs) that is larger than repl_aux automatically cycles itself out of repl_aux as the scan is repeated.&lt;/li&gt;&lt;/ul&gt;
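
The outline above can be sketched as a small Python lookup. This is only an illustration of the suspected boundaries (2%, 10% and the roughly 25% size of repl_aux); the function name, the outcome labels, the treatment of scans between 10% and 25% (implied by the list, not stated in it) and the handling of scans that land exactly on a boundary are all assumptions, not confirmed Oracle behaviour.

```python
from bisect import bisect_right

# Boundaries as fractions of the buffer cache, taken from the outline above.
# They are the suspected limits, not confirmed Oracle behaviour.
LIMITS = [0.02, 0.10, 0.25]
OUTCOMES = [
    "repl_main at mid-point",
    "repl_aux, touch count can rise on repeat scans",
    "repl_aux, no touch count increment",
    "repl_aux, scan cycles its own blocks out",
]

def classify_scan(object_blocks, cache_blocks):
    """Return the hypothesised treatment for a scan of object_blocks."""
    fraction = object_blocks / cache_blocks
    # bisect_right counts how many boundaries the fraction has reached
    return OUTCOMES[bisect_right(LIMITS, fraction)]

cache = 100_000  # hypothetical cache size in blocks
for blocks in (1_000, 5_000, 20_000, 40_000):
    print(blocks, classify_scan(blocks, cache))
```

Keeping the boundaries in one list makes it easy to try other figures, for example a different fraction for repl_aux.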


&lt;em&gt;&quot;going direct to repl_aux&quot;&lt;/em&gt; may actually turn out to mean blocks first go into &lt;strong&gt;repl_main&lt;/strong&gt; but are relinked almost immediately (on the next physical read request) to &lt;strong&gt;repl_aux&lt;/strong&gt; as a side effect of the free buffer scan.

Guessing (hypothesising) again - I don&#039;t think multiple working data sets and database writers would make any difference to the 25% magic number, because every working data set has its own repl_aux which hovers around 25% of the set, and the tablescan will (statistically) spread its data evenly across all the working data sets. (That&#039;s why the object queue table &lt;strong&gt;x$kcboqh&lt;/strong&gt; ends up with multiple entries per obj# - it&#039;s one row per object per working data set of the relevant buffer pool).]]></description>
		<content:encoded><![CDATA[<p>Tanel,</p>
<p>Thanks for the comment and links.</p>
<p>I didn&#8217;t think of it when I was writing the note, but the behaviour for tablescans (and index fast full scans) is probably linked directly to the 25% target that I describe for the auxiliary replacement list (REPL_AUX) in Oracle Core pp. 107-109. I haven&#8217;t worked through it in detail, but I suspect &#8211; rough outline only &#8211; that:</p>
<ul>
<li>any tablescan under the 2% limit goes to <strong>repl_main</strong> (mid-point)</li>
<li>any tablescan (index ffs) over the 2% figure goes directly to <strong>repl_aux</strong></li>
<li>any tablescan (index ffs) less than the 10% figure gets its touch count incremented if the blocks are still in repl_aux when the scan is repeated (which means they can eventually be promoted to the hot end of <strong>repl_main</strong>).</li>
<li>any tablescan (index ffs) that is larger than repl_aux automatically cycles itself out of repl_aux as the scan is repeated.</li></ul>
<p><em>&#8220;going direct to repl_aux&#8221;</em> may actually turn out to mean blocks first go into <strong>repl_main</strong> but are relinked almost immediately (on the next physical read request) to <strong>repl_aux</strong> as a side effect of the free buffer scan.</p>
<p>Guessing (hypothesising) again &#8211; I don&#8217;t think multiple working data sets and database writers would make any difference to the 25% magic number, because every working data set has its own repl_aux which hovers around 25% of the set, and the tablescan will (statistically) spread its data evenly across all the working data sets. (That&#8217;s why the object queue table <strong>x$kcboqh</strong> ends up with multiple entries per obj# &#8211; it&#8217;s one row per object per working data set of the relevant buffer pool).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tanel Poder (@TanelPoder)</title>
		<link>http://jonathanlewis.wordpress.com/2011/03/24/small-tables/#comment-53987</link>
		<dc:creator><![CDATA[Tanel Poder (@TanelPoder)]]></dc:creator>
		<pubDate>Thu, 07 Mar 2013 11:18:17 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=5999#comment-53987</guid>
		<description><![CDATA[I&#039;ll drop a couple of related links here too :)

- Alex Fatkulin has written a little script to test the effect of object caching too: http://afatkulin.blogspot.ca/2009/01/11g-adaptive-direct-path-reads-what-is.html
- Regarding buffer cache warmup - there&#039;s a nice statistic called &quot;physical reads prefetch warmup&quot; which should tell whether prefetching happened due to buffer cache being &quot;empty&quot;. There&#039;s an example of this happening here: http://blog.tanelpoder.com/2012/05/02/advanced-oracle-troubleshooting-guide-part-10-index-unique-scan-doing-multiblock-reads/

Also, regarding the 25% of buffer cache threshold, I think one more thing that may affect this is how many working sets you have in the buffer pool and how many DBWRs are servicing them... I haven&#039;t tested any of this but there&#039;s a chance that these factors affect the 25% threshold logic as well...]]></description>
		<content:encoded><![CDATA[<p>I&#8217;ll drop a couple of related links here too :)</p>
<p>- Alex Fatkulin has written a little script to test the effect of object caching too: <a href="http://afatkulin.blogspot.ca/2009/01/11g-adaptive-direct-path-reads-what-is.html" rel="nofollow">http://afatkulin.blogspot.ca/2009/01/11g-adaptive-direct-path-reads-what-is.html</a><br />
- Regarding buffer cache warmup &#8211; there&#8217;s a nice statistic called &#8220;physical reads prefetch warmup&#8221; which should tell whether prefetching happened due to buffer cache being &#8220;empty&#8221;. There&#8217;s an example of this happening here: <a href="http://blog.tanelpoder.com/2012/05/02/advanced-oracle-troubleshooting-guide-part-10-index-unique-scan-doing-multiblock-reads/" rel="nofollow">http://blog.tanelpoder.com/2012/05/02/advanced-oracle-troubleshooting-guide-part-10-index-unique-scan-doing-multiblock-reads/</a></p>
<p>Also, regarding the 25% of buffer cache threshold, I think one more thing that may affect this is how many working sets you have in the buffer pool and how many DBWRs are servicing them&#8230; I haven&#8217;t tested any of this but there&#8217;s a chance that these factors affect the 25% threshold logic as well&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2011/03/24/small-tables/#comment-44684</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Thu, 26 Jan 2012 21:37:25 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=5999#comment-44684</guid>
		<description><![CDATA[Mich,

I&#039;ve replied to this on the OTN thread, but I didn&#039;t say anything about Oracle choosing to do the direct path read just after startup. The public information about this is still under investigation but I recall seeing some notes somewhere about Oracle&#039;s decision being based in part on the number (or percentage) of blocks from the object that are already in the cache. When you start up there&#039;s nothing from the object in the cache, so it maximises the chance of Oracle picking the direct path option.

I made some comments, and showed some arithmetic, about the effect of an empty cache on the speed of repeating the tablescan using db file scattered, so I won&#039;t repeat that bit here.]]></description>
		<content:encoded><![CDATA[<p>Mich,</p>
<p>I&#8217;ve replied to this on the OTN thread, but I didn&#8217;t say anything about Oracle choosing to do the direct path read just after startup. The public information about this is still under investigation but I recall seeing some notes somewhere about Oracle&#8217;s decision being based in part on the number (or percentage) of blocks from the object that are already in the cache. When you start up there&#8217;s nothing from the object in the cache, so it maximises the chance of Oracle picking the direct path option.</p>
<p>I made some comments, and showed some arithmetic, about the effect of an empty cache on the speed of repeating the tablescan using db file scattered, so I won&#8217;t repeat that bit here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mich Talebzadeh</title>
		<link>http://jonathanlewis.wordpress.com/2011/03/24/small-tables/#comment-44377</link>
		<dc:creator><![CDATA[Mich Talebzadeh]]></dc:creator>
		<pubDate>Fri, 13 Jan 2012 16:25:10 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=5999#comment-44377</guid>
		<description><![CDATA[Hi Jonathan,

Very interesting reading. I am following up the thread that I opened on OTN under the title &quot;serial table scan with direct path read compared to db file scattered read&quot;, http://forums.oracle.com/forums/thread.jspa?messageID=10081807

I have a table of 1.7 million rows with 1,943,824 blocks (no index) and a default buffer cache of 8320MB. That is 8320*1024*1024/8192 = 1,064,960 available 8K blocks. At the 25% maximum, up to 1,064,960 * 0.25 = 266,240 blocks of this table can be in memory, or around 14% of the table. So the whole table can be scanned in roughly 7 cycles.
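
The arithmetic in the paragraph above can be reproduced with a short Python sketch. The variable names are illustrative only, and the 25% ceiling is the working assumption discussed in this thread rather than a documented Oracle limit.

```python
# Figures quoted in the paragraph above; names are illustrative only.
CACHE_MB = 8320            # default buffer cache size in MB
BLOCK_BYTES = 8192         # 8K block size
TABLE_BLOCKS = 1_943_824   # blocks in the table
AUX_FRACTION = 0.25        # assumed 25% ceiling for one scanned object

cache_blocks = CACHE_MB * 1024 * 1024 // BLOCK_BYTES   # 1064960
max_cached = int(cache_blocks * AUX_FRACTION)          # 266240
pct_of_table = 100 * max_cached / TABLE_BLOCKS         # about 13.7
# How many chunks of that size are needed to cover the table once
passes = TABLE_BLOCKS / max_cached                     # about 7.3

print(cache_blocks, max_cached, round(pct_of_table, 1), round(passes, 1))
```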

When the buffer cache is empty after a reboot, Oracle prefers to use &quot;direct path read&quot; (DPR) as opposed to a serial full table scan (FTS). So there must be some costing estimate that says DPR is cheaper than FTS. I suspect this may be because the underlying table is larger than the small table threshold (2% of the buffer cache, as shown below).

The full stats are as follows:
[sourcecode]
Parameter                      buffer cache size/MB
------------------------------ --------------------
buffer_cache                                  8,320

Small table threshold at 2% of buffer cache size/MB Small table block limit
--------------------------------------------------- -----------------------
                                                166                  21,299

My table details

TABLE_NAME                               rows block size/KB       blocks avg free space/KB Table size/MB
-------------------------------- ------------ ------------- ------------ ----------------- -------------
TDASH                               1,729,204             8    1,943,824               805        13,714

Table block size/threhold limit
-------------------------------
                             91
[/sourcecode]
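
The threshold figures in the listing above follow directly from the cache size. A minimal Python check, assuming the 2% rule of thumb used in the listing (the variable names are illustrative, not from the original queries):

```python
# Reproduce the threshold figures from the listing; names are illustrative.
CACHE_MB = 8320
BLOCK_BYTES = 8192
TABLE_BLOCKS = 1_943_824
SMALL_PCT = 2   # assumed: small table threshold is 2% of the buffer cache

cache_blocks = CACHE_MB * 1024 * 1024 // BLOCK_BYTES
threshold_mb = CACHE_MB * SMALL_PCT // 100           # 166
threshold_blocks = cache_blocks * SMALL_PCT // 100   # 21299
ratio = TABLE_BLOCKS // threshold_blocks             # 91

print(threshold_mb, threshold_blocks, ratio)
```

Integer arithmetic is used so that the truncated values match the figures reported in the listing.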
When I run a test load on this table as
[sourcecode]
ALTER SESSION SET TRACEFILE_IDENTIFIER = &#039;test_with_tdash_ssdtester_noindex&#039;;
DECLARE
        -- tdash is assumed to have the ALL_OBJECTS columns plus PADDING1 and PADDING2
        type array is table of tdash%ROWTYPE index by binary_integer;
        l_data array;
        l_rec tdash%rowtype;
BEGIN
        -- Load the candidate object_id values (and padding) into memory
        SELECT
                a.*
                ,RPAD(&#039;*&#039;,4000,&#039;*&#039;) AS PADDING1
                ,RPAD(&#039;*&#039;,4000,&#039;*&#039;) AS PADDING2
        BULK COLLECT INTO
        l_data
        FROM ALL_OBJECTS a;
 
        -- Trace wait events for the lookups that follow
        DBMS_MONITOR.SESSION_TRACE_ENABLE ( waits=&gt;true );
        -- With no index on tdash, each of the 100 lookups is a full tablescan
        FOR rs IN 1 .. 100
        LOOP
                BEGIN
                        SELECT * INTO l_rec FROM tdash WHERE object_id = l_data(rs).object_id;
                EXCEPTION
                  WHEN NO_DATA_FOUND THEN NULL;
                END;
        END LOOP;
END;
/
[/sourcecode]
It takes 6,520  seconds to finish.

When I force it not to use DPR, it chooses db file scattered read and it finishes in 4,299 seconds. Details are posted in the thread linked above.

So I am a bit confused.

cheers,

Mich
]]></description>
		<content:encoded><![CDATA[<p>Hi Jonathan,</p>
<p>Very interesting reading. I am following up the thread that I opened on OTN under the title &#8220;serial table scan with direct path read compared to db file scattered read&#8221;, <a href="http://forums.oracle.com/forums/thread.jspa?messageID=10081807" rel="nofollow">http://forums.oracle.com/forums/thread.jspa?messageID=10081807</a></p>
<p>I have a table of 1.7 million rows with 1,943,824 blocks (no index) and a default buffer cache of 8320MB. That is 8320*1024*1024/8192 = 1,064,960 available 8K blocks. At the 25% maximum, up to 1,064,960 * 0.25 = 266,240 blocks of this table can be in memory, or around 14% of the table. So the whole table can be scanned in roughly 7 cycles.</p>
<p>When the buffer cache is empty after a reboot, Oracle prefers to use &#8220;direct path read&#8221; (DPR) as opposed to a serial full table scan (FTS). So there must be some costing estimate that says DPR is cheaper than FTS. I suspect this may be because the underlying table is larger than the small table threshold (2% of the buffer cache, as shown below).</p>
<p>The full stats are as follows:</p>
<pre class="brush: plain; title: ; notranslate">
Parameter                      buffer cache size/MB
------------------------------ --------------------
buffer_cache                                  8,320

Small table threshold at 2% of buffer cache size/MB Small table block limit
--------------------------------------------------- -----------------------
                                                166                  21,299

My table details

TABLE_NAME                               rows block size/KB       blocks avg free space/KB Table size/MB
-------------------------------- ------------ ------------- ------------ ----------------- -------------
TDASH                               1,729,204             8    1,943,824               805        13,714

Table block size/threhold limit
-------------------------------
                             91
</pre>
<p>When I run a test load on this table as</p>
<pre class="brush: plain; title: ; notranslate">
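ALTER SESSION SET TRACEFILE_IDENTIFIER = 'test_with_tdash_ssdtester_noindex';
DECLARE
        -- tdash is assumed to have the ALL_OBJECTS columns plus PADDING1 and PADDING2
        type array is table of tdash%ROWTYPE index by binary_integer;
        l_data array;
        l_rec tdash%rowtype;
BEGIN
        -- Load the candidate object_id values (and padding) into memory
        SELECT
                a.*
                ,RPAD('*',4000,'*') AS PADDING1
                ,RPAD('*',4000,'*') AS PADDING2
        BULK COLLECT INTO
        l_data
        FROM ALL_OBJECTS a;
 
        -- Trace wait events for the lookups that follow
        DBMS_MONITOR.SESSION_TRACE_ENABLE ( waits=&gt;true );
        -- With no index on tdash, each of the 100 lookups is a full tablescan
        FOR rs IN 1 .. 100
        LOOP
                BEGIN
                        SELECT * INTO l_rec FROM tdash WHERE object_id = l_data(rs).object_id;
                EXCEPTION
                  WHEN NO_DATA_FOUND THEN NULL;
                END;
        END LOOP;
END;
/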
</pre>
<p>It takes 6,520  seconds to finish.</p>
<p>When I force it not to use DPR, it chooses db file scattered read and it finishes in 4,299 seconds. Details are posted in the thread linked above.</p>
<p>So I am a bit confused.</p>
<p>cheers,</p>
<p>Mich</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Challenge of the Month – May 2011 Challenge - Israel Database Portal</title>
		<link>http://jonathanlewis.wordpress.com/2011/03/24/small-tables/#comment-40593</link>
		<dc:creator><![CDATA[Challenge of the Month – May 2011 Challenge - Israel Database Portal]]></dc:creator>
		<pubDate>Wed, 01 Jun 2011 12:24:31 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=5999#comment-40593</guid>
		<description><![CDATA[[...] http://jonathanlewis.wordpress.com/2011/03/24/small-tables/ http://dioncho.wordpress.com/2009/04/22/strong-doubt-on-small-table_small_table_threshold/ http://www.dbacomp.com.br/blog/?p=73 [...]]]></description>
		<content:encoded><![CDATA[<p>[...] <a href="http://jonathanlewis.wordpress.com/2011/03/24/small-tables/" rel="nofollow">http://jonathanlewis.wordpress.com/2011/03/24/small-tables/</a> <a href="http://dioncho.wordpress.com/2009/04/22/strong-doubt-on-small-table_small_table_threshold/" rel="nofollow">http://dioncho.wordpress.com/2009/04/22/strong-doubt-on-small-table_small_table_threshold/</a> <a href="http://www.dbacomp.com.br/blog/?p=73" rel="nofollow">http://www.dbacomp.com.br/blog/?p=73</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kyle Hailey</title>
		<link>http://jonathanlewis.wordpress.com/2011/03/24/small-tables/#comment-40169</link>
		<dc:creator><![CDATA[Kyle Hailey]]></dc:creator>
		<pubDate>Mon, 28 Mar 2011 18:28:46 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=5999#comment-40169</guid>
		<description><![CDATA[&gt;&gt; A possible explanation for the 5x buffer cache anomaly might be that with a starting 
&gt;&gt; empty cache you fill the cache as the tablescan starts, and then keep recycling 
&gt;&gt; the bottom 25% of the cache from that point onwards.

That would make sense! I was wondering what kind of algorithm Oracle could use to efficiently cache a table that was larger than the buffer cache, but using this (accidental?) algorithm would be an explanation.

What I&#039;m most interested in is the parameters that control the caching mechanisms.
My main concern is that a customer&#039;s 10.2.0.4 system is not caching even with an empty buffer cache, while my in-house system is caching tables up to 80% of the buffer cache size. (I&#039;m on LINUX and the customer is on Solaris.)
In both cases _small_table_threshold is the default.

The use case I&#039;m looking to solve is a customer going from depending on the UNIX file system cache to depending on the SGA alone as they are switching to Direct File I/O.
On their system (Solaris 10.2.0.4 empty cache) the caching only happens well below 80%, more like 10-25% (I didn&#039;t have the liberty to run extensive tests and nail down the value). I&#039;m trying to figure out the options for making caching happen with bigger tables. The customer won&#039;t use the cache table option because they want the new configuration to run just like the old without any extra administration. They are willing to change init params and maybe _small_table_threshold is the only way, but on my LINUX 10.2.0.4 I have the default _small_table_threshold and I&#039;m seeing caching with tables up to 80% of buffer cache size (starting with an empty buffer cache) and I&#039;m wondering &quot;is there some other mechanism controlling caching besides _small_table_threshold?&quot;
Customer wants the same &quot;feel&quot; with direct I/O as without.  Without Direct I/O they could read a table once via FTS and then upon second read, the response time would be as if it was cached even though Oracle reports a high level of physical reads because the reads are coming from the UNIX file system cache.
Now, with Direct IO and even if I have them increase the size of their buffer even beyond the size of this table, my concern is they won&#039;t be caching the table. I can have them change _small_table_threshold but I&#039;m wondering if there is another mechanism which seems to be the case on my 10.2.0.4 on LINUX.
The other alternative is that something on the customer&#039;s database is preventing the caching of larger tables even after an &quot;alter system flush buffer_cache&quot;.

PS my test script came from:
http://dioncho.wordpress.com/2009/04/22/strong-doubt-on-small-table_small_table_threshold/
Charles Hooper pointed me to your entry here, from his blog at:
http://hoopercharles.wordpress.com/2010/06/17/_small_table_threshold-parameter-and-buffer-cache-what-is-wrong-with-this-quote/
Glenn Fawcett has a good blog post on the overhead of using the UNIX file system cache versus the Oracle buffer cache:
http://blogs.sun.com/glennf/entry/where_do_you_cache_oracle]]></description>
		<content:encoded><![CDATA[<p>&gt;&gt; A possible explanation for the 5x buffer cache anomaly might be that with a starting<br />
&gt;&gt; empty cache you fill the cache as the tablescan starts, and then keep recycling<br />
&gt;&gt; the bottom 25% of the cache from that point onwards.</p>
<p>That would make sense! I was wondering what kind of algorithm Oracle could use to efficiently cache a table that was larger than the buffer cache, but using this (accidental?) algorithm would be an explanation.</p>
<p>What I&#8217;m most interested in is the parameters that control the caching mechanisms.<br />
My main concern is that a customer&#8217;s 10.2.0.4 system is not caching even with an empty buffer cache, while my in-house system is caching tables up to 80% of the buffer cache size. (I&#8217;m on LINUX and the customer is on Solaris.)<br />
In both cases _small_table_threshold is the default.</p>
<p>The use case I&#8217;m looking to solve is a customer going from depending on the UNIX file system cache to depending on the SGA alone as they are switching to Direct File I/O.<br />
On their system (Solaris 10.2.0.4 empty cache) the caching only happens well below 80%, more like 10-25% (I didn&#8217;t have the liberty to run extensive tests and nail down the value). I&#8217;m trying to figure out the options for making caching happen with bigger tables. The customer won&#8217;t use the cache table option because they want the new configuration to run just like the old without any extra administration. They are willing to change init params and maybe _small_table_threshold is the only way, but on my LINUX 10.2.0.4 I have the default _small_table_threshold and I&#8217;m seeing caching with tables up to 80% of buffer cache size (starting with an empty buffer cache) and I&#8217;m wondering &#8220;is there some other mechanism controlling caching besides _small_table_threshold?&#8221;<br />
Customer wants the same &#8220;feel&#8221; with direct I/O as without.  Without Direct I/O they could read a table once via FTS and then upon second read, the response time would be as if it was cached even though Oracle reports a high level of physical reads because the reads are coming from the UNIX file system cache.<br />
Now, with Direct IO and even if I have them increase the size of their buffer even beyond the size of this table, my concern is they won&#8217;t be caching the table. I can have them change _small_table_threshold but I&#8217;m wondering if there is another mechanism which seems to be the case on my 10.2.0.4 on LINUX.<br />
The other alternative is that something on the customer&#8217;s database is preventing the caching of larger tables even after an &#8220;alter system flush buffer_cache&#8221;.</p>
<p>PS my test script came from:<br />
<a href="http://dioncho.wordpress.com/2009/04/22/strong-doubt-on-small-table_small_table_threshold/" rel="nofollow">http://dioncho.wordpress.com/2009/04/22/strong-doubt-on-small-table_small_table_threshold/</a><br />
Charles Hooper pointed me to your entry here, from his blog at:<br />
<a href="http://hoopercharles.wordpress.com/2010/06/17/_small_table_threshold-parameter-and-buffer-cache-what-is-wrong-with-this-quote/" rel="nofollow">http://hoopercharles.wordpress.com/2010/06/17/_small_table_threshold-parameter-and-buffer-cache-what-is-wrong-with-this-quote/</a><br />
Glenn Fawcett has a good blog post on the overhead of using the UNIX file system cache versus the Oracle buffer cache:<br />
<a href="http://blogs.sun.com/glennf/entry/where_do_you_cache_oracle" rel="nofollow">http://blogs.sun.com/glennf/entry/where_do_you_cache_oracle</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2011/03/24/small-tables/#comment-40154</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Sat, 26 Mar 2011 20:39:51 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=5999#comment-40154</guid>
		<description><![CDATA[Kyle,

One of the boundary conditions on the buffer cache behaviour appears when the entire cache is &quot;free&quot; - that&#039;s why I started my experiment with an index range scan to load a very large fraction of the cache before I started doing any tablescans. 

In your 10.2.0.4 case the whole cache was free, which means there are no useful data blocks to protect, so the code seems to have been allowed to use the whole cache on the tablescan.

Your first 11.2.0.1 experiment (table above/below 10%) suggests that Oracle Corp. has changed the boundary case so that it always uses the 10% that would apply to a &quot;full&quot; cache, even if the whole cache is free.

The second 11.2.0.1 experiment (_small_table_threshold set to 10% of cache_size) suggests that the algorithm I suggested relating to the 10% limit still holds at 10% - and NOT at 5 * _small_table_threshold.

With the very large _small_table_threshold, it would be useful to check how many blocks from the table were in the cache after the tablescan - it might be 10% of the cache. (You didn&#039;t say what the cache size was in this test, was it about 110,000, or was it back to the original 220,000).

A possible explanation for the 5x buffer cache anomaly might be that with a starting empty cache you fill the cache as the tablescan starts, and then keep recycling the bottom 25% of the cache from that point onwards.]]></description>
		<content:encoded><![CDATA[<p>Kyle,</p>
<p>One of the boundary conditions on the buffer cache behaviour appears when the entire cache is &#8220;free&#8221; &#8211; that&#8217;s why I started my experiment with an index range scan to load a very large fraction of the cache before I started doing any tablescans. </p>
<p>In your 10.2.0.4 case the whole cache was free, which means there are no useful data blocks to protect, so the code seems to have been allowed to use the whole cache on the tablescan.</p>
<p>Your first 11.2.0.1 experiment (table above/below 10%) suggests that Oracle Corp. has changed the boundary case so that it always uses the 10% that would apply to a &#8220;full&#8221; cache, even if the whole cache is free.</p>
<p>The second 11.2.0.1 experiment (_small_table_threshold set to 10% of cache_size) suggests that the algorithm I suggested relating to the 10% limit still holds at 10% &#8211; and NOT at 5 * _small_table_threshold.</p>
<p>With the very large _small_table_threshold, it would be useful to check how many blocks from the table were in the cache after the tablescan &#8211; it might be 10% of the cache. (You didn&#8217;t say what the cache size was in this test, was it about 110,000, or was it back to the original 220,000).</p>
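<p>A minimal sketch of that check, assuming the table is still called CACHER as in your script, that it is a simple (non-partitioned) table in the current schema, and that the session can query v$bh:</p>
<pre class="brush: plain; title: ; notranslate">
-- Sketch only: report how many buffers currently hold blocks of CACHER,
-- broken down by buffer status (xcur, cr, etc.).
-- Assumptions: table name CACHER, owned by the current user,
-- and select privilege on v$bh.
select
        bh.status,
        count(*)        buffers
from
        v$bh            bh
where
        bh.objd = (
                select  data_object_id
                from    user_objects
                where   object_name = 'CACHER'
                and     object_type = 'TABLE'
        )
and     bh.status != 'free'
group by
        bh.status
order by
        bh.status
;
</pre>
<p>Running it immediately after each tablescan, and comparing the total with _db_block_buffers, should show whether the retained fraction is near 10% of the cache.</p>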
<p>A possible explanation for the 5x buffer cache anomaly might be that with a starting empty cache you fill the cache as the tablescan starts, and then keep recycling the bottom 25% of the cache from that point onwards.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kyle Hailey</title>
		<link>http://jonathanlewis.wordpress.com/2011/03/24/small-tables/#comment-40135</link>
		<dc:creator><![CDATA[Kyle Hailey]]></dc:creator>
		<pubDate>Fri, 25 Mar 2011 20:07:09 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=5999#comment-40135</guid>
		<description><![CDATA[Nice analysis as always.
Currently I&#039;m a bit perplexed by the caching results on my 10.2.0.4 (compatibility 10.2.0.3) on Linux:
[sourcecode]
  drop table cacher;
  create table cacher(c1 char(2000), c2 char(2000), c3 char(2000)) nologging;
  insert /*+ append */ into cacher
    select &#039;x&#039;, &#039;x&#039;, &#039;x&#039;
    from dual
       connect by level &lt;= 220000     ;
  EXEC DBMS_STATS.gather_table_stats(NULL, &#039;CACHER&#039;);
  commit;
  alter system flush buffer_cache;
  set autot on stat
  select count(*) from cacher;
  select count(*) from cacher;
  select count(*) from cacher;
  set autot off;
[/sourcecode]
with parameters
  _db_block_buffers                   273102
  _small_table_threshold              5462

[sourcecode]
80% of buffer cache Statistics with stats gathered
----------------------------------------------------------
     220033  consistent gets
     220005  physical reads
----------------------------------------------------------
     220023  consistent gets
          0  physical reads
----------------------------------------------------------
     220023  consistent gets
          0  physical reads
[/sourcecode]
On my Windows box (11.2.0.1) the caching seems to start out at 10% of the buffer cache; then, if I change _small_table_threshold, that value seems to be what is used for the caching limit:

with parameters
    _db_block_buffers             51220   25% = 12805,10% = 5122
    _small_table_threshold        1024
[sourcecode]
 drop table cacher;
  create table cacher(c1 char(2000), c2 char(2000), c3 char(2000)) nologging;
  insert /*+ append */ into cacher
    select &#039;x&#039;, &#039;x&#039;, &#039;x&#039;
    from dual
    -- connect by level &lt;= 5120    -- caches
       connect by level &lt;= 5122;   -- doesn&#039;t cache
  commit;
  alter system flush buffer_cache;
  select count(*) from cacher;
  select count(*) from cacher;
  set autot on stat
  select count(*) from cacher;
  set autot off;
[/sourcecode]
I get these results
[sourcecode]
  3rd count(*) Statistics  10% of buffer cache size
  ----------------------------------------------------------
         5126  consistent gets
         5122  physical reads
[/sourcecode]
now if I increase small table threshold and bounce the database:
[sourcecode]
alter system set &quot;_small_table_threshold&quot;=10240 scope=spfile;
startup force
[/sourcecode]
I can now cache more on full table scan:
[sourcecode]
Statistics with size just below small table threshold
----------------------------------------------------------
      10091  consistent gets
      10004  physical reads
----------------------------------------------------------
      10011  consistent gets
          0  physical reads
----------------------------------------------------------
      10011  consistent gets
          0  physical reads
Statistics for FTS with size just above small table threshold
----------------------------------------------------------
      11101  consistent gets
      11288  physical reads
----------------------------------------------------------
      11004  consistent gets
      11000  physical reads
----------------------------------------------------------
      11004  consistent gets
      11000  physical reads
[/sourcecode]
Also if I set small table threshold greater than the size of the buffer cache I get no caching with a table bigger than the buffer cache:
[sourcecode]
alter system set &quot;_small_table_threshold&quot;=1100000 scope=spfile;
startup force;
Statistics
-----------------------------------------------------
     102545  consistent gets
     102522  physical reads
-----------------------------------------------------
     102460  consistent gets
     102425  physical reads
-----------------------------------------------------
     102460  consistent gets
     102425  physical reads
[/sourcecode]
whereas on Linux I get surprisingly helpful caching with a table 5x the size of the buffer cache:

  _db_block_buffers                   273102
  _small_table_threshold              5462

[sourcecode]
500% of buffer cache
----------------------------------------------------------
    1350131  consistent gets
    1350002  physical reads
----------------------------------------------------------
    1350037  consistent gets
    1177881  physical reads
----------------------------------------------------------
    1350037  consistent gets
    1177856  physical reads
[/sourcecode]
]]></description>
		<content:encoded><![CDATA[<p>Nice analysis as always.<br />
Currently I&#8217;m a bit perplexed by the caching results on my 10.2.0.4 (compatibility 10.2.0.3) on Linux:</p>
<pre class="brush: plain; title: ; notranslate">
  drop table cacher;
  create table cacher(c1 char(2000), c2 char(2000), c3 char(2000)) nologging;
  insert /*+ append */ into cacher
    select 'x', 'x', 'x'
    from dual
       connect by level &lt;= 220000     ;
  EXEC DBMS_STATS.gather_table_stats(NULL, &#039;CACHER&#039;);
  commit;
  alter system flush buffer_cache;
  set autot on stat
  select count(*) from cacher;
  select count(*) from cacher;
  select count(*) from cacher;
  set autot off;
</pre>
<p>with parameters<br />
  _db_block_buffers                   273102<br />
  _small_table_threshold              5462</p>
<pre class="brush: plain; title: ; notranslate">
80% of buffer cache Statistics with stats gathered
----------------------------------------------------------
     220033  consistent gets
     220005  physical reads
----------------------------------------------------------
     220023  consistent gets
          0  physical reads
----------------------------------------------------------
     220023  consistent gets
          0  physical reads
</pre>
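<p>For anyone repeating the test, the two parameter values quoted above can be listed with a query along the lines of this sketch, which assumes a connection as SYS because it reads the x$ksppi and x$ksppcv structures:</p>
<pre class="brush: plain; title: ; notranslate">
-- Sketch only: show the current values of two hidden parameters.
-- Assumption: connected as SYS (x$ structures are not visible otherwise).
select
        i.ksppinm       name,
        v.ksppstvl      value
from
        x$ksppi         i,
        x$ksppcv        v
where
        v.indx = i.indx
and     i.ksppinm in ('_db_block_buffers', '_small_table_threshold')
;
</pre>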
<p>On my Windows box (11.2.0.1) the caching seems to start out at 10% of the buffer cache; then, if I change _small_table_threshold, that value seems to be what is used for the caching limit:</p>
<p>with parameters<br />
    _db_block_buffers             51220   25% = 12805,10% = 5122<br />
    _small_table_threshold        1024</p>
<pre class="brush: plain; title: ; notranslate">
 drop table cacher;
  create table cacher(c1 char(2000), c2 char(2000), c3 char(2000)) nologging;
  insert /*+ append */ into cacher
    select 'x', 'x', 'x'
    from dual
    -- connect by level &lt;= 5120    -- caches
       connect by level &lt;= 5122;   -- doesn&#039;t cache
  commit;
  alter system flush buffer_cache;
  select count(*) from cacher;
  select count(*) from cacher;
  set autot on stat
  select count(*) from cacher;
  set autot off;
</pre>
<p>I get these results</p>
<pre class="brush: plain; title: ; notranslate">
  3rd count(*) Statistics  10% of buffer cache size
  ----------------------------------------------------------
         5126  consistent gets
         5122  physical reads
</pre>
<p>now if I increase small table threshold and bounce the database:</p>
<pre class="brush: plain; title: ; notranslate">
alter system set &quot;_small_table_threshold&quot;=10240 scope=spfile;
startup force
</pre>
<p>I can now cache more on full table scan:</p>
<pre class="brush: plain; title: ; notranslate">
Statistics with size just below small table threshold
----------------------------------------------------------
      10091  consistent gets
      10004  physical reads
----------------------------------------------------------
      10011  consistent gets
          0  physical reads
----------------------------------------------------------
      10011  consistent gets
          0  physical reads
Statistics for FTS with size just above small table threshold
----------------------------------------------------------
      11101  consistent gets
      11288  physical reads
----------------------------------------------------------
      11004  consistent gets
      11000  physical reads
----------------------------------------------------------
      11004  consistent gets
      11000  physical reads
</pre>
<p>Also if I set small table threshold greater than the size of the buffer cache I get no caching with a table bigger than the buffer cache:</p>
<pre class="brush: plain; title: ; notranslate">
alter system set &quot;_small_table_threshold&quot;=1100000 scope=spfile;
startup force;
Statistics
-----------------------------------------------------
     102545  consistent gets
     102522  physical reads
-----------------------------------------------------
     102460  consistent gets
     102425  physical reads
-----------------------------------------------------
     102460  consistent gets
     102425  physical reads
</pre>
<p>whereas on Linux I get surprisingly helpful caching with a table 5x the size of the buffer cache:</p>
<p>  _db_block_buffers                   273102<br />
  _small_table_threshold              5462</p>
<pre class="brush: plain; title: ; notranslate">
500% of buffer cache
----------------------------------------------------------
    1350131  consistent gets
    1350002  physical reads
----------------------------------------------------------
    1350037  consistent gets
    1177881  physical reads
----------------------------------------------------------
    1350037  consistent gets
    1177856  physical reads
</pre>
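<p>To put those figures in proportion, here is a rough sketch of the arithmetic. It takes the physical reads of the first scan (1350002) as an approximation of the table size in blocks, which is my assumption rather than a measured value:</p>
<pre class="brush: plain; title: ; notranslate">
-- Sketch only: relate the Linux 5x results to _db_block_buffers (273102).
-- Assumption: first-scan physical reads approximate the table size in blocks.
select
        round(100 * 1350002 / 273102, 1)                table_pct_of_cache,
        1350002 - 1177881                               reads_saved_2nd_scan,
        round(100 * (1350002 - 1177881) / 273102, 1)    saved_pct_of_cache
from
        dual
;
</pre>
<p>That works out at a table of roughly 494% of the buffer cache, with about 172,121 physical reads avoided on the second scan, or about 63% of the cache.</p>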
]]></content:encoded>
	</item>
</channel>
</rss>
