<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Joins &#8211; MJ</title>
	<atom:link href="http://jonathanlewis.wordpress.com/2010/08/15/joins-mj/feed/" rel="self" type="application/rss+xml" />
	<link>http://jonathanlewis.wordpress.com/2010/08/15/joins-mj/</link>
	<description>Just another Oracle weblog</description>
	<lastBuildDate>Thu, 23 May 2013 12:47:17 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Join Surprise &#171; Oracle Scratchpad</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/15/joins-mj/#comment-38718</link>
		<dc:creator><![CDATA[Join Surprise &#171; Oracle Scratchpad]]></dc:creator>
		<pubDate>Wed, 15 Dec 2010 20:56:34 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4259#comment-38718</guid>
		<description><![CDATA[[...] We now run a &#8220;report&#8221; that generates data for a number-crunching tool that extracts all the data from the tables &#8211; using an outer join so that parent rows don&#8217;t get lost. For various reasons the tool wanted the data sorted in a certain order &#8211; so there&#8217;s also an order by clause in the query. I&#8217;m going to show you the original query &#8211; first unhinted, and then hinted to use a merge join: [...]]]></description>
		<content:encoded><![CDATA[<p>[...] We now run a &#8220;report&#8221; that generates data for a number-crunching tool that extracts all the data from the tables &#8211; using an outer join so that parent rows don&#8217;t get lost. For various reasons the tool wanted the data sorted in a certain order &#8211; so there&#8217;s also an order by clause in the query. I&#8217;m going to show you the original query &#8211; first unhinted, and then hinted to use a merge join: [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeremy Schneider</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/15/joins-mj/#comment-37427</link>
		<dc:creator><![CDATA[Jeremy Schneider]]></dc:creator>
		<pubDate>Wed, 29 Sep 2010 17:03:19 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4259#comment-37427</guid>
		<description><![CDATA[Aargh... ok I see it now.

Many-to-many join on non-unique datasets... I wasn&#039;t visualizing that quite right.  I think that merge sort and merge join algorithms got a little jumbled in my head.  It seemed &quot;artificial&quot; to me because I thought you unnecessarily invented the third pointer.  The third pointer is needed for a merge join (though not for a merge sort).

My apologies for accusing you of being artificial.  :)  Thanks for clearing up my misunderstanding.]]></description>
		<content:encoded><![CDATA[<p>Aargh&#8230; ok I see it now.</p>
<p>Many-to-many join on non-unique datasets&#8230; I wasn&#8217;t visualizing that quite right.  I think that merge sort and merge join algorithms got a little jumbled in my head.  It seemed &#8220;artificial&#8221; to me because I thought you unnecessarily invented the third pointer.  The third pointer is needed for a merge join (though not for a merge sort).</p>
<p>My apologies for accusing you of being artificial.  :)  Thanks for clearing up my misunderstanding.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/15/joins-mj/#comment-37418</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Wed, 29 Sep 2010 14:28:02 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4259#comment-37418</guid>
		<description><![CDATA[Jeremy,

In what way is the example artificial ?  I have rows which are non-unique in the first data set and some of those rows correspond to multiple rows in the second data set. The data is NOT artificial.

Your point about &quot;two pointers&quot; and the sequential nature of walking the data is valid - and it&#039;s the reason why we usually go to the expense of sorting and buffering the two sets of data.

But to do the merge join we need a third pointer that starts from the position dictated by the second pointer (when it is in position) and walks the data set until it comes to a row which no longer matches the current item in the first data set. At that point, the first pointer will move forward, the second pointer may, or may not, move forward, and the third pointer has to be set to the latest value of the second pointer.


Irrespective of any thoughts about optimising the efficiency of the mechanism, though, the merge join at this point is still following the pattern: for each row in the first rowsource locate the matches in the second source.]]></description>
		<content:encoded><![CDATA[<p>Jeremy,</p>
<p>In what way is the example artificial ?  I have rows which are non-unique in the first data set and some of those rows correspond to multiple rows in the second data set. The data is NOT artificial.</p>
<p>Your point about &#8220;two pointers&#8221; and the sequential nature of walking the data is valid &#8211; and it&#8217;s the reason why we usually go to the expense of sorting and buffering the two sets of data.</p>
<p>But to do the merge join we need a third pointer that starts from the position dictated by the second pointer (when it is in position) and walks the data set until it comes to a row which no longer matches the current item in the first data set. At that point, the first pointer will move forward, the second pointer may, or may not, move forward, and the third pointer has to be set to the latest value of the second pointer.</p>
<p>Irrespective of any thoughts about optimising the efficiency of the mechanism, though, the merge join at this point is still following the pattern: for each row in the first rowsource locate the matches in the second source.</p>
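<p>The three-pointer walk described above can be sketched roughly as follows. This is only an illustrative sketch, not Oracle's internal implementation: the function name merge_join and the representation of each row as a (key, payload) tuple are assumptions made for the example, and both inputs are assumed to be already sorted by key.</p>

```python
def merge_join(left, right):
    """Inner-join two lists of (key, payload) tuples, both pre-sorted by key.

    Hypothetical helper for illustration only.
    """
    out = []
    i = 0  # first pointer: current row in the first data set
    j = 0  # second pointer: start of the candidate group in the second data set
    while i < len(left) and j < len(right):
        lkey = left[i][0]
        # Move the second pointer past rows whose key is too small to match.
        while j < len(right) and right[j][0] < lkey:
            j += 1
        # Third pointer: start at the second pointer and walk forward
        # until a row no longer matches the current row of the first set.
        k = j
        while k < len(right) and right[k][0] == lkey:
            out.append((lkey, left[i][1], right[k][1]))
            k += 1
        # Advance the first pointer; the third pointer is reset to the
        # second pointer on the next pass, so duplicate keys in the first
        # set re-read the same group of rows in the second set.
        i += 1
    return out
```

<p>With duplicate keys in both inputs, the inner walk of the third pointer is repeated once per matching row of the first set, which is the "for each row in the first rowsource locate the matches in the second source" pattern.</p>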
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeremy Schneider</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/15/joins-mj/#comment-37412</link>
		<dc:creator><![CDATA[Jeremy Schneider]]></dc:creator>
		<pubDate>Tue, 28 Sep 2010 15:30:13 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4259#comment-37412</guid>
		<description><![CDATA[You claim here that a merge join is a form of nested loop join.  This is clear with the hash join, where the inner loop must check all rows that match a particular hash.  But I don&#039;t see it here.

Your example at the top of the post (scanning the second set) is artificial. A properly written merge join algorithm keeps a single pointer into each dataset and advances one or the other until either dataset is exhausted. Certainly no looping is needed on the second dataset! The whole algorithm is a single loop - there shouldn&#039;t be any nesting.  Certainly when I designed and wrote merge joins for algorithms classes there weren&#039;t nested loops.

The caveat of course would be that both datasets are presorted.  And of course SORTING requires some form of nesting and can&#039;t beat O(n lg n).  But if the input sets are truly presorted then the join itself should not require a nested loop and should execute in O(n) time, right?]]></description>
		<content:encoded><![CDATA[<p>You claim here that a merge join is a form of nested loop join.  This is clear with the hash join, where the inner loop must check all rows that match a particular hash.  But I don&#8217;t see it here.</p>
<p>Your example at the top of the post (scanning the second set) is artificial. A properly written merge join algorithm keeps a single pointer into each dataset and advances one or the other until either dataset is exhausted. Certainly no looping is needed on the second dataset! The whole algorithm is a single loop &#8211; there shouldn&#8217;t be any nesting.  Certainly when I designed and wrote merge joins for algorithms classes there weren&#8217;t nested loops.</p>
<p>The caveat of course would be that both datasets are presorted.  And of course SORTING requires some form of nesting and can&#8217;t beat O(n lg n).  But if the input sets are truly presorted then the join itself should not require a nested loop and should execute in O(n) time, right?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/15/joins-mj/#comment-37052</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Thu, 26 Aug 2010 21:44:58 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4259#comment-37052</guid>
		<description><![CDATA[Henish,

It&#039;s certainly correct to say that there are cases where, for the same volume of data and same amount of memory used, the merge join CAN outperform the hash join for exactly these reasons.]]></description>
		<content:encoded><![CDATA[<p>Henish,</p>
<p>It&#8217;s certainly correct to say that there are cases where, for the same volume of data and same amount of memory used, the merge join CAN outperform the hash join for exactly these reasons.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: henish</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/15/joins-mj/#comment-37050</link>
		<dc:creator><![CDATA[henish]]></dc:creator>
		<pubDate>Thu, 26 Aug 2010 20:26:46 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4259#comment-37050</guid>
		<description><![CDATA[Hello Sir,


Nice article..


&quot;If the sorted data sets for a merge join are too large to fit into memory Oracle dumps them to disc, but only reads through them (at most) once each&quot;

And in the last part of your article you show that if a suitable index exists on the join column(s) of the table(s) involved in the join,
the optimizer tends to avoid sorting, which saves memory and CPU cycles.


Is it correct to say that in such a scenario the merge join outperforms the hash join?


Thanks]]></description>
		<content:encoded><![CDATA[<p>Hello Sir,</p>
<p>Nice article..</p>
<p>&#8220;If the sorted data sets for a merge join are too large to fit into memory Oracle dumps them to disc, but only reads through them (at most) once each&#8221;</p>
<p>And in the last part of your article you show that if a suitable index exists on the join column(s) of the table(s) involved in the join,<br />
the optimizer tends to avoid sorting, which saves memory and CPU cycles.</p>
<p>Is it correct to say that in such a scenario the merge join outperforms the hash join?</p>
<p>Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Lewis</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/15/joins-mj/#comment-37017</link>
		<dc:creator><![CDATA[Jonathan Lewis]]></dc:creator>
		<pubDate>Thu, 19 Aug 2010 19:08:41 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4259#comment-37017</guid>
		<description><![CDATA[Kartik,

We (the designers/DBAs/programmers) do not have to create a hash table to get a hash join.  I introduced the real single table hash cluster in a nested loop as a way of showing the EFFECT of what Oracle does internally when it chooses to do a hash join.

The choice of hash join, merge join, or nested loop join is determined entirely by the optimizer based on its assessment of data volume, data scatter, and available machine resources.

I do not know if Cost Based Oracle Fundamentals is easily available in India.]]></description>
		<content:encoded><![CDATA[<p>Kartik,</p>
<p>We (the designers/DBAs/programmers) do not have to create a hash table to get a hash join.  I introduced the real single table hash cluster in a nested loop as a way of showing the EFFECT of what Oracle does internally when it chooses to do a hash join.</p>
<p>The choice of hash join, merge join, or nested loop join is determined entirely by the optimizer based on its assessment of data volume, data scatter, and available machine resources.</p>
<p>I do not know if Cost Based Oracle Fundamentals is easily available in India.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: kartik</title>
		<link>http://jonathanlewis.wordpress.com/2010/08/15/joins-mj/#comment-37013</link>
		<dc:creator><![CDATA[kartik]]></dc:creator>
		<pubDate>Thu, 19 Aug 2010 11:32:07 +0000</pubDate>
		<guid isPermaLink="false">http://jonathanlewis.wordpress.com/?p=4259#comment-37013</guid>
		<description><![CDATA[Hi Jonathan,

Thanks for this Article.

Just one doubt: does the execution plan go for a Merge Join depending upon the SQL we write? Unlike a Hash Join, we need not create separate Hash Tables, I guess. Please confirm whether this determination is entirely based on the select statement plus the objects accessed.

Secondly, is the book you have mentioned available in India? If yes, then kindly let me know the publisher details, as I am not finding it in book stores here.

Thanks,
Kartik]]></description>
		<content:encoded><![CDATA[<p>Hi Jonathan,</p>
<p>Thanks for this Article.</p>
<p>Just one doubt: does the execution plan go for a Merge Join depending upon the SQL we write? Unlike a Hash Join, we need not create separate Hash Tables, I guess. Please confirm whether this determination is entirely based on the select statement plus the objects accessed.</p>
<p>Secondly, is the book you have mentioned available in India? If yes, then kindly let me know the publisher details, as I am not finding it in book stores here.</p>
<p>Thanks,<br />
Kartik</p>
]]></content:encoded>
	</item>
</channel>
</rss>
