Oracle Scratchpad

October 23, 2008

Manual Optimisation 3

Filed under: Execution plans,Hints,Oracle,Performance,sorting,Tuning — Jonathan Lewis @ 6:38 pm BST Oct 23,2008

[Back to Manual Optimisation part 2] [Back to Sorted Hash Clusters 2]

This little series started from a note I wrote about manual optimisation where I took advantage of a sort operation in a non-mergeable inline view to produce sorted data from a final nested loop join without having an “order by” clause in the main query block that would have required a large sort operation.

In fact, as I showed in a follow-up post, this was taking a convenient pagination mechanism to an extreme and you might decide (with good reason, as Tom Kyte did) that it was an extreme that should not be used.

Interestingly, though, the optimizer already uses exactly this technology with sorted hash clusters. In fact it’s this very specific trick that makes the sorted hash cluster particularly useful. I showed in another series of notes how a simple query against a single table in a sorted hash cluster minimised the volume of data sorted; in this note I’ll bring together the “Sorted Hash Cluster” series and the “Manual Optimisation” series to show you that the mechanism is much smarter than the simple case suggests.

Consider the following data set:

rem
rem     Script:         c_sorted_hash_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2008
rem

create cluster sorted_hash_cluster (
        hash_value      number,
        sort_value      number          sort
)
hashkeys 1000
hash is hash_value
size 2000
;

create table sorted_hash_table (
        hash_value      number          not null,
        sort_value      number          not null,
        v1              varchar2(10),
        padding         varchar2(75)
)
cluster sorted_hash_cluster (
        hash_value, sort_value
)
;

begin
        for i in 1..20 loop
                insert into sorted_hash_table values(
                        1,
                        trunc(dbms_random.value(1,100)),
                        lpad(i,10),
                        rpad('x',75)
                );
                commit;
        end loop;
end;
/

create table t1
as
select
        rownum          id,
        lpad(rownum,10) v1,
        rpad('x',1000)  padding
from
        all_objects
where
        rownum <= 3000
;

create unique index t1_i1 on t1(id);

I’ve created a table with 20 rows for a single hash key in a sorted hash cluster. I now execute a query that joins the hash clustered table with a normal heap table, using a nested loop join with indexed access.

The query will return 20 rows and the length of each row is about 1KB. I’ve included an “order by” clause in my query that matches the sort key of the sorted hash cluster.

Here’s the query, and its execution plan:

alter session set events '10032 trace name context forever';

select
        /*+
                leading(ht t1)
                use_nl(t1)
        */
        ht.hash_value,
        ht.sort_value,
        ht.v1,
        t1.id,
        t1.v1,
        t1.padding
from
        sorted_hash_table       ht,
        t1
where
        ht.hash_value = 1
and     t1.id = ht.sort_value
order by
        ht.sort_value
;


------------------------------------------------------------------------------
| Id| Operation                    | Name              | Rows | Bytes | Cost |
------------------------------------------------------------------------------
|  0| SELECT STATEMENT             |                   |   20 | 20640 |   20 |
|  1|  NESTED LOOPS                |                   |   20 | 20640 |   20 |
|* 2|   TABLE ACCESS HASH          | SORTED_HASH_TABLE |   20 |   340 |      |
|  3|   TABLE ACCESS BY INDEX ROWID| T1                |    1 |  1015 |    1 |
|* 4|    INDEX UNIQUE SCAN         | T1_I1             |    1 |       |      |
------------------------------------------------------------------------------

The plan prediction of 20KB of data is correct and the data comes out in the correct sorted order even though there is no sort operation in the execution plan.

On the other hand, by enabling the 10032 (sort) trace I can see that, despite the absence of a sort operation, the session has sorted 20 rows though the volume of data sorted does not match the 20KB that the query actually produced.

So how do we sort the data without sorting all the data ?

As I showed in “Sorted Hash Clusters 2”, a single-table query can access the rowids and sort-key values from a sorted hash cluster, sort the data set, then use the rowids to revisit the hash cluster to pick up the remainder of each row. In fact, though I didn’t mention it in the previous article, when you create a sorted hash cluster, you create (logically, but not physically) an “invisible” index on the cluster key that allows the optimizer to treat this particular path as an indexed access path.

The example above simply takes that process one step further. The optimizer knows that the data is coming out of the hash cluster in an order that matches the “order by” clause, so it can choose an execution plan that takes advantage of this ordering to avoid sorting the final result set.

In fact, this is nothing new: the optimizer has been able to use an index-driven path to avoid a sort for many years and the sorted hash cluster path is just a special case of that “*** Recost for ORDER BY ***” functionality.

When you see the trick in “Manual Optimisation 2”, you’re just looking at something that Oracle already does internally, so you could ask yourself whether there’s any reason why we should not copy the optimizer. The answer is yes. The optimizer knows what’s going on because the optimizer code is self-consistent, but when we try to trick the optimizer it doesn’t know eveything about what’s going on, and one day the optimizer code may be enhanced to introduce an internal optimisation that accidentally destroys our ordering.

Of course it would be nice if Oracle Corp. gave us a hint that allowed us to say: “trust me, this data is sorted” – possibly through some enhancement of the /*+ no_eliminate_oby */ hint that appeared in 10g. It could be very useful in non-mergeable views (or pipelined functions – which is what Kerry wanted in this comment of the original note.)

Until then – the manual optimisation I showed you for pagination is safe, because we can afford to include an “order by” clause on the small amount of data we return to the end user; but you have to remain cautious about the large-scale query which didn’t include an “order by”.

[Back to Manual Optimisation part 2] [Back to Sorted Hash Clusters 2]

 

4 Comments »

  1. […] Manual Optimisation – 2 Filed under: Performance, Tuning — Jonathan Lewis @ 1:13 pm UTC May 9,2008 [Forward to Manual Optimisation part 3] […]

    Pingback by Manual Optimisation - 2 « Oracle Scratchpad — October 27, 2008 @ 1:01 pm GMT Oct 27,2008 | Reply

  2. […] Sorted Hash Clusters – 2 Filed under: Infrastructure, Performance — Jonathan Lewis @ 7:16 am UTC Jul 22,2008 [Back to part 1][Forward to part 3] […]

    Pingback by Sorted Hash Clusters – 2 « Oracle Scratchpad — July 10, 2009 @ 1:57 pm BST Jul 10,2009 | Reply

  3. […] Unfortunately you shouldn’t use this particular example in a production system because it relies on the data appearing in the right order without having an “order by” clause. This type of thing makes me really keen to have a hint that says something like: /*+ qb_name(my_driver) assume_ordered(@my_driver) */ so that you could tell the optimizer that it can assume that the rowset from a given query block will appear in the order of the final “order by” clause as it does, for example, with sorted hash clusters. […]

    Pingback by Manual Optimisation | Oracle Scratchpad — January 3, 2015 @ 11:33 am GMT Jan 3,2015 | Reply

  4. […] Manual Optimisation – 3: […]

    Pingback by Hash Clusters – 2 | Oracle Scratchpad — July 6, 2020 @ 11:08 am BST Jul 6,2020 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.