<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Joins &#8211; HJ</title>
	<atom:link href="http://jonathanlewis.wordpress.com/2010/08/10/joins-hj/feed/" rel="self" type="application/rss+xml" />
	<link>http://jonathanlewis.wordpress.com/2010/08/10/joins-hj/</link>
	<description>Just another Oracle weblog</description>
	<lastBuildDate>Fri, 24 May 2013 13:27:07 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Amir Riaz</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/10/joins-hj/#comment-37583</link>
		<dc:creator><![CDATA[Amir Riaz]]></dc:creator>
		<pubDate>Thu, 21 Oct 2010 18:22:34 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4244#comment-37583</guid>
		<description><![CDATA[great, 

I was not able to see that aspect. Truly impressed with your way of thinking.

Thanks]]></description>
		<content:encoded><![CDATA[<p>great, </p>
<p>I was not able to see that aspect. Truly impressed with your way of thinking.</p>
<p>Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/10/joins-hj/#comment-37581</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Wed, 20 Oct 2010 20:54:02 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4244#comment-37581</guid>
		<description><![CDATA[Amir,
You still haven&#039;t described your scenario with sufficient accuracy.

Consider: if your 10 &quot;unique&quot; rows in T1 are 32KB each and your 1,000 &quot;non-unique&quot; rows in T2 are 12 bytes each, then Oracle might choose to build the hash table from T1.

When you say that the rows in T1 are unique, do you mean that they are unique as far as the join column(s) are concerned - and are the 1,000 &quot;non-unique&quot; rows distributed so that there are roughly 100 rows from T2 to join to each row in T1 ?

As far as the appeal of your hypothesis is concerned, you seem to be thinking only of the speed with which the hash value can be calculated. Have you considered the extra cost (and mechanics) of creating a hash table that contains more data ? Have you considered what happens if the larger data set can&#039;t fit into memory ? Have you considered the cost of &quot;probing a hash bucket&quot; and finding that you have to follow a linked list to another 99 matching values in that bucket - if that 100 to 1 join is the thought behind your example.]]></description>
		<content:encoded><![CDATA[<p>Amir,<br />
You still haven&#8217;t described your scenario with sufficient accuracy.</p>
<p>Consider: if your 10 &#8220;unique&#8221; rows in T1 are 32KB each and your 1,000 &#8220;non-unique&#8221; rows in T2 are 12 bytes each, then Oracle might choose to build the hash table from T1.</p>
<p>When you say that the rows in T1 are unique, do you mean that they are unique as far as the join column(s) are concerned &#8211; and are the 1,000 &#8220;non-unique&#8221; rows distributed so that there are roughly 100 rows from T2 to join to each row in T1 ?</p>
<p>As far as the appeal of your hypothesis is concerned, you seem to be thinking only of the speed with which the hash value can be calculated. Have you considered the extra cost (and mechanics) of creating a hash table that contains more data ? Have you considered what happens if the larger data set can&#8217;t fit into memory ? Have you considered the cost of &#8220;probing a hash bucket&#8221; and finding that you have to follow a linked list to another 99 matching values in that bucket &#8211; if that 100 to 1 join is the thought behind your example.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Amir Riaz</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/10/joins-hj/#comment-37580</link>
		<dc:creator><![CDATA[Amir Riaz]]></dc:creator>
		<pubDate>Wed, 20 Oct 2010 20:04:44 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4244#comment-37580</guid>
		<description><![CDATA[So which table should fit completely in the PGA&#039;s hash_area_size memory? In the example above T1 is the parent table and T2 is the child, so T2 is bigger. Put simply, suppose Oracle creates the hash table from T1 (10 rows) and probes it with T2. The entire probe input is scanned or computed one row at a time, and for each probe row the hash key&#039;s value is computed, the corresponding hash bucket is scanned, and the matches are produced. Since T1 is the hash table in memory, the hash buckets hold rows from T1, and in this case each bucket will hold only one row.

Alternatively, suppose we create the hash table from T2 (100 non-unique rows), which then sit in the hash buckets, and probe it with T1: that is, take a row from T1, calculate its hash key, and then use the hash key to reach the hash bucket of the in-memory table T2. This theory looks more appealing to me.]]></description>
		<content:encoded><![CDATA[<p>So which table should fit completely in the PGA&#8217;s hash_area_size memory? In the example above T1 is the parent table and T2 is the child, so T2 is bigger. Put simply, suppose Oracle creates the hash table from T1 (10 rows) and probes it with T2. The entire probe input is scanned or computed one row at a time, and for each probe row the hash key&#8217;s value is computed, the corresponding hash bucket is scanned, and the matches are produced. Since T1 is the hash table in memory, the hash buckets hold rows from T1, and in this case each bucket will hold only one row.</p>
<p>Alternatively, suppose we create the hash table from T2 (100 non-unique rows), which then sit in the hash buckets, and probe it with T1: that is, take a row from T1, calculate its hash key, and then use the hash key to reach the hash bucket of the in-memory table T2. This theory looks more appealing to me.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/10/joins-hj/#comment-37564</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Tue, 19 Oct 2010 11:09:58 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4244#comment-37564</guid>
		<description><![CDATA[Amir Riaz,

I do not know the hashing function used by Oracle. However the in-memory hash table will have a number of buckets that is a power of 2 (in fact it might even be a power of 4) - and Oracle has a couple of hash functions based on generating a value between 1 and 2^N that could be used.

Based on your description there is no guarantee that Oracle would choose to do a hash join - but if it did, and if its predicted cardinalities were realistic assessments of the figures you have given me then it would probably build the hash table from the 10 rows from t1 and probe it with the 1,000 rows from t2.]]></description>
		<content:encoded><![CDATA[<p>Amir Riaz,</p>
<p>I do not know the hashing function used by Oracle. However the in-memory hash table will have a number of buckets that is a power of 2 (in fact it might even be a power of 4) &#8211; and Oracle has a couple of hash functions based on generating a value between 1 and 2^N that could be used.</p>
<p>Based on your description there is no guarantee that Oracle would choose to do a hash join &#8211; but if it did, and if its predicted cardinalities were realistic assessments of the figures you have given me then it would probably build the hash table from the 10 rows from t1 and probe it with the 1,000 rows from t2.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Amir Riaz</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/10/joins-hj/#comment-37555</link>
		<dc:creator><![CDATA[Amir Riaz]]></dc:creator>
		<pubDate>Sun, 17 Oct 2010 19:30:27 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4244#comment-37555</guid>
		<description><![CDATA[Jonathan,

I am a bit confused. For hashing you need a hash function; how does Oracle calculate it? Secondly, suppose I have two tables: T1 containing 10 unique rows and T2 containing 1000 non-unique rows. Will Oracle place T1 in PGA memory, or T2? Or will it create an in-memory hash table from T2, get T1 from the buffer cache and, for each row of T1, get the subset of rows from T2, nested-loop them and return them to us? Will you please elaborate?

regards
Amir Riaz]]></description>
		<content:encoded><![CDATA[<p>Jonathan,</p>
<p>I am a bit confused. For hashing you need a hash function; how does Oracle calculate it? Secondly, suppose I have two tables: T1 containing 10 unique rows and T2 containing 1000 non-unique rows. Will Oracle place T1 in PGA memory, or T2? Or will it create an in-memory hash table from T2, get T1 from the buffer cache and, for each row of T1, get the subset of rows from T2, nested-loop them and return them to us? Will you please elaborate?</p>
<p>regards<br />
Amir Riaz</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Xenofon Grigoriadis</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/10/joins-hj/#comment-37034</link>
		<dc:creator><![CDATA[Xenofon Grigoriadis]]></dc:creator>
		<pubDate>Mon, 23 Aug 2010 02:12:43 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4244#comment-37034</guid>
		<description><![CDATA[Yes, I have to say, your book &quot;Cost Based Oracle – Fundamentals&quot; is so good, that I say to customers wanting to migrate to 11g: Don&#039;t! For 10g we have JL&#039;s book on the cost based optimizer! For 11g we don&#039;t! (Please write a follow-up...)]]></description>
		<content:encoded><![CDATA[<p>Yes, I have to say, your book &#8220;Cost Based Oracle – Fundamentals&#8221; is so good, that I say to customers wanting to migrate to 11g: Don&#8217;t! For 10g we have JL&#8217;s book on the cost based optimizer! For 11g we don&#8217;t! (Please write a follow-up&#8230;)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/10/joins-hj/#comment-37016</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Thu, 19 Aug 2010 18:58:47 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4244#comment-37016</guid>
		<description><![CDATA[Radino,

I was just browsing some notes by Alex Fatkulin, and it crossed my mind that perhaps NLJ_BATCHING may be a mechanism that allows the &lt;a href=&quot;http://afatkulin.blogspot.com/2009/02/consistent-gets-from-cache-fastpath-2.html&quot; rel=&quot;nofollow&quot;&gt;&lt;em&gt;&lt;strong&gt;&quot;consistent gets (fastpath)&quot;&lt;/strong&gt;&lt;/em&gt;&lt;/a&gt; to appear more frequently.

(Just another thing to add to the list of ideas when I get around to looking at it properly.)
]]></description>
		<content:encoded><![CDATA[<p>Radino,</p>
<p>I was just browsing some notes by Alex Fatkulin, and it crossed my mind that perhaps NLJ_BATCHING may be a mechanism that allows the <a href="http://afatkulin.blogspot.com/2009/02/consistent-gets-from-cache-fastpath-2.html" rel="nofollow"><em><strong>&#8220;consistent gets (fastpath)&#8221;</strong></em></a> to appear more frequently.</p>
<p>(Just another thing to add to the list of ideas when I get around to looking at it properly.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: kartik</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/10/joins-hj/#comment-37011</link>
		<dc:creator><![CDATA[kartik]]></dc:creator>
		<pubDate>Thu, 19 Aug 2010 09:45:51 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4244#comment-37011</guid>
		<description><![CDATA[Hi Jonathan,

Thanks for the wonderful article, this was very helpful.

I have a few doubts though; maybe my Oracle knowledge isn&#039;t good enough :(

Could you please elaborate on this statement from the explanation above?

a hash join is a nested loop join where Oracle has extracted a subset of the data from the thing that was the “inner” table.

I just want to know how Oracle does this: is it that we create a hash table and then fire the select query? Kindly clarify this.

Thanks,
Kartik]]></description>
		<content:encoded><![CDATA[<p>Hi Jonathan,</p>
<p>Thanks for the wonderful article, this was very helpful.</p>
<p>I have a few doubts though; maybe my Oracle knowledge isn&#8217;t good enough :(</p>
<p>Could you please elaborate on this statement from the explanation above?</p>
<p>a hash join is a nested loop join where Oracle has extracted a subset of the data from the thing that was the “inner” table.</p>
<p>I just want to know how Oracle does this: is it that we create a hash table and then fire the select query? Kindly clarify this.</p>
<p>Thanks,<br />
Kartik</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/10/joins-hj/#comment-36993</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Tue, 17 Aug 2010 11:32:23 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4244#comment-36993</guid>
		<description><![CDATA[Sachin,

If you&#039;re expecting to join 10 rows from a small table to 10 rows from a large table, you don&#039;t have a &quot;small data set&quot; and a &quot;large data set&quot; - you have two small data sets and the topic is irrelevant.

If you have Vipin&#039;s data sets: (100 rows in table with PK, 10,000,000 rows in table with FK) then do you want to join them with:
&lt;blockquote&gt;
NLJ from small to big
NLJ from big to small
NLJ from big to small after (permanently) recreating small as single table hash cluster
HJ with small as build table 
MJ - but we&#039;ll ignore that one for now
&lt;/blockquote&gt;

Now you can probably see the sense of comparing NLJ with HJ.

In passing, I was talking about latch gets, not buffer visits - some buffer visits require two latch gets. Depending on version, infrastructure, data patterns, and environment I could probably make the actual number vary from a few tens of thousands to an arbitrarily large number. The 40M is a ball-park figure to indicate order of magnitude in a fairly bland boring system joining a couple of heap tables with indexes.]]></description>
		<content:encoded><![CDATA[<p>Sachin,</p>
<p>If you&#8217;re expecting to join 10 rows from a small table to 10 rows from a large table, you don&#8217;t have a &#8220;small data set&#8221; and a &#8220;large data set&#8221; &#8211; you have two small data sets and the topic is irrelevant.</p>
<p>If you have Vipin&#8217;s data sets: (100 rows in table with PK, 10,000,000 rows in table with FK) then do you want to join them with:</p>
<blockquote><p>
NLJ from small to big<br />
NLJ from big to small<br />
NLJ from big to small after (permanently) recreating small as single table hash cluster<br />
HJ with small as build table<br />
MJ &#8211; but we&#8217;ll ignore that one for now
</p></blockquote>
<p>Now you can probably see the sense of comparing NLJ with HJ.</p>
<p>In passing, I was talking about latch gets, not buffer visits &#8211; some buffer visits require two latch gets. Depending on version, infrastructure, data patterns, and environment I could probably make the actual number vary from a few tens of thousands to an arbitrarily large number. The 40M is a ball-park figure to indicate order of magnitude in a fairly bland boring system joining a couple of heap tables with indexes.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/10/joins-hj/#comment-36992</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Tue, 17 Aug 2010 11:15:52 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4244#comment-36992</guid>
		<description><![CDATA[Radino,
Thanks for that - it&#039;s a useful reference that gives us an important clue to where the &quot;batching&quot; appears.

I like his use of the &quot;repeatable test case&quot;.]]></description>
		<content:encoded><![CDATA[<p>Radino,<br />
Thanks for that &#8211; it&#8217;s a useful reference that gives us an important clue to where the &#8220;batching&#8221; appears.</p>
<p>I like his use of the &#8220;repeatable test case&#8221;.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
