Oracle Scratchpad

October 23, 2008

Manual Optimisation 3

Filed under: Execution plans,Hints,Oracle,Performance,sorting,Tuning — Jonathan Lewis @ 6:38 pm GMT Oct 23,2008

[Back to Manual Optimisation part 2]

This little series started from a note I wrote about manual optimisation where I took advantage of a sort operation in a non-mergeable view to produce sorted data from a final nested loop join without including an “order by” that would have produced a large sort operation.

In fact, as I showed in a follow-up post, this was taking a convenient pagination mechanism to an extreme – and you might decide (with good reason, as Tom Kyte did) that it was an extreme that should not be used.

Interestingly, though, the optimizer already uses exactly this technology in sorted hash clusters – in fact, it’s this very specific trick that makes the sorted hash cluster particularly useful. I showed in another series of notes how a simple query against a single table in a sorted hash cluster minimised the volume of data sorted – in this note I’ll bring together the “Sorted Hash Cluster” series and the “Manual Optimisation” series to show you that the mechanism is much smarter than that.

Consider the following data set:

create cluster sorted_hash_cluster (
	hash_value	number,
	sort_value	number		sort
)
hashkeys 1000
hash is hash_value
size 2000
;

create table sorted_hash_table (
	hash_value	number		not null,
	sort_value	number		not null,
	v1		varchar2(10),
	padding		varchar2(75)
)
cluster sorted_hash_cluster (
	hash_value, sort_value
)
;

begin
	for i in 1..20 loop
		insert into sorted_hash_table values(
			1,
			trunc(dbms_random.value(1,100)),
			lpad(i,10),
			rpad('x',75)
		);
		commit;
	end loop;
end;
/

create table t1
as
select
	rownum		id,
	lpad(rownum,10)	v1,
	rpad('x',1000)	padding
from
	all_objects
where
	rownum <= 3000
;

create unique index t1_i1 on t1(id);

I’ve created a table with 20 rows for a single hash key in a sorted hash cluster. I now execute a query that joins the hashed table with a normal heap table, using a nested loop join with indexed access.

The query will return 20 rows, and the length of each row is about 1KB. I’ve include an “order by” clause in my query that matches the sort key of the sorted hash cluster.

Here’s the query, and its execution plan:

alter session set events '10032 trace name context forever';

select
	/*+
		leading(ht t1)
		use_nl(t1)
	*/
	ht.hash_value,
	ht.sort_value,
	ht.v1,
	t1.id,
	t1.v1,
	t1.padding
from
	sorted_hash_table	ht,
	t1
where
	ht.hash_value = 1
and	t1.id = ht.sort_value
order by
	ht.sort_value
;

------------------------------------------------------------------------------
| Id| Operation                    | Name              | Rows | Bytes | Cost |
------------------------------------------------------------------------------
|  0| SELECT STATEMENT             |                   |   20 | 20640 |   20 |
|  1|  NESTED LOOPS                |                   |   20 | 20640 |   20 |
|* 2|   TABLE ACCESS HASH          | SORTED_HASH_TABLE |   20 |   340 |      |
|  3|   TABLE ACCESS BY INDEX ROWID| T1                |    1 |  1015 |    1 |
|* 4|    INDEX UNIQUE SCAN         | T1_I1             |    1 |       |      |
------------------------------------------------------------------------------

The plan prediction of 20KB of data is correct, and the data comes out in the correct sorted order – even though there is no sort operation in the plan.

On the other hand, by enabling the 10032 trace I can see that, despite the absence of a sort operation, the session has sorted 20 rows – but the volume of data sorted does not match the 20KB that the query actually produced.

So how do we sort the data without sorting all the data ?

As I showed in “Sorted Hash Clusters 2″, a single-table query can access the rowids and sort values from a hash cluster, sort that data set, then use the rowids to revisit the hash cluster to pick up the remainder of each row. In fact, though I didn’t mention it in the previous article, when you create a sorted hash cluster, you create an “invisible” index on the cluster key that allows the optimizer to treat this particular path as an indexed access path.

The example above simply takes that process one step further. The optimizer knows that the data is coming out of the hash cluster in an order that matches the “order by” clause, so it can choose an execution plan that takes advantage of this ordering to avoid sorting the final result set.

In fact, this is nothing new: the optimizer has been able to use an index-driven path to avoid a sort for many years – and the sorted hash cluster path is just a special case of that “*** Recost for ORDER BY ***” functionality.

So, when you see the trick in “Manual Optimisation 2″, you’re just looking at something that Oracle already does internally – so is there any reason why we should not copy the optimizer ? Yes – the optimizer knows what’s going on because the optimizer code is self-consistent; but when we try to trick the optimizer, it doesn’t know eveything about what’s going on – and one day the optimizer code may be enhanced to introduce an internal optimisation that destroys our ordering.

If course, it would be nice if Oracle Corp. gave us a hint that allowed us to say: “trust me, this data is sorted” – possibly through some enhancement of the /*+ no_eliminate_oby */ hint that appeared in 10g. It could be very useful in non-mergeable views (or pipelined functions – which is what Kerry wanted in this comment of the original note.)

Until then – the manual optimisation I showed you for pagination is safe, because we can afford to include an “order by” clause on the small amount of data we return to the end user; but you have to remain cautious about the large-scale query which didn’t include an “order by”.

[Back to Manual Optimisation part 2]

2 Comments »

  1. […] Manual Optimisation – 2 Filed under: Performance, Tuning — Jonathan Lewis @ 1:13 pm UTC May 9,2008 [Forward to Manual Optimisation part 3] […]

    Pingback by Manual Optimisation - 2 « Oracle Scratchpad — October 27, 2008 @ 1:01 pm GMT Oct 27,2008 | Reply

  2. […] Sorted Hash Clusters – 2 Filed under: Infrastructure, Performance — Jonathan Lewis @ 7:16 am UTC Jul 22,2008 [Back to part 1][Forward to part 3] […]

    Pingback by Sorted Hash Clusters – 2 « Oracle Scratchpad — July 10, 2009 @ 1:57 pm GMT Jul 10,2009 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 4,556 other followers