Oracle Scratchpad

July 13, 2008

Sorted Hash Clusters

Filed under: Execution plans,Infrastructure,Oracle,trace files,Tuning — Jonathan Lewis @ 9:25 pm BST Jul 13,2008

[Forward to Part 2]

Towards the end of April, I published a note about manual optimisation,  and mentioned in one of the comments (#11) that as part of the discussion of the (slightly suspect) mechanism I had introduced I would eventually get around to talking about sorted hash clusters. So I’ve finally managed to make a start.

Before you read about sorted hash clusters, you may want to glance at some notes I posted on the OTN forum some time ago about single table hash clusters as this will give you the general flavour of the significance and point of hash clusters.

The description of sorted hash clusters in the 10g manuals is not very helpful – a comment from the Administrators’ Guide says: “Sorted hash clusters are new data structures that allow faster retrieval of data for applications where data is consumed in the order in which it was inserted.”

The first seven words are correct – the rest leave a lot to be desired. A section in the manual with the title “Creating a Sorted Hash Cluster” then says: “In a sorted hash cluster, the rows corresponding to each value of the hash function are sorted on a specified set of columns in ascending order, which can improve response time during subsequent operations on the clustered data.”

Again, this description leaves a lot to be desired – and is basically untrue. You would be forgiven for thinking that this meant the data you entered was somehow stored in sorted order – and there’s no clue about where the real strength of the sorted hash cluster lies.

So here’s a little script to create, load, and query a sorted hash cluster – with the query output at the end of the listing:

create cluster sorted_hash_cluster (
	hash_value	number,
	sort_value	varchar2(2)	sort
)
hashkeys 10
hash is hash_value
size 750
;

create table sorted_hash_table (
	hash_value	number,
	sort_value	varchar2(2),	--	sort,
	v1		varchar2(10),
	padding		varchar2(100)
)
cluster sorted_hash_cluster (
	hash_value, sort_value
)
;

-- collect statistics here

execute dbms_random.seed(0)

begin
	for i in 1..60 loop
		insert into sorted_hash_table values(
			trunc(dbms_random.value(0,10)),
			dbms_random.string('U',2),
			lpad(i,10),
			rpad('x',100)
		);
		commit;
	end loop;
end;
/

select
	hash_value, sort_value, rowid, v1
from
	sorted_hash_table
where
	hash_value = 1
;

HASH_VALUE SO ROWID              V1
---------- -- ------------------ ----------
         1 AJ AAATLoAAFAAAAMYAAA          9
           BM AAATLoAAFAAAAMYAAH         55
           LG AAATLoAAFAAAAMYAAD         36
           LX AAATLoAAFAAAAMYAAE         45
           NR AAATLoAAFAAAAMYAAC         17
           RI AAATLoAAFAAAAMYAAB         16
           UY AAATLoAAFAAAAMYAAF         50
           VU AAATLoAAFAAAAMYAAG         51

I’ve created the cluster object, specifying names for the clustering column (hash_value), and for a sort column (sort_value). I’ve also declared that the clustering column I reference will hold the value that should be used to tell Oracle where to store each row.

I’ve stated that there will be 10 distinct values for the hash key, and that the combined length of all the rows for a given hash key will be 750 bytes.  This allows Oracle to calculate, and pre-format,  the space needed for the cluster.

I’ve then defined a table, assigning it to the cluster. I’ve used the same names for the clustering and sorting columns in the table definition as I did in the cluster definition, but this isn’t necessary. You’ll also notice that I’ve put in the sort keyword as a comment by the sort_value column of the table definition – this keyword is optional.

The pl/sql loop then generates some randomised data – using 10 different values for the hash_value column and a random two-character string for the sort_value column. Despite suggestions from the manuals, the data doesn’t have to be inserted in sorted order, and it won’t be stored in sorted order.

The query requests all the data for a given hash_value and the output shows that it appears in sorted order with respect to sort_value – despite the absence of an order by clause! Here’s the execution plan for the query:

Execution Plan
----------------------------------------------------------
Plan hash value: 1981143083

----------------------------------------------------------------------------
| Id  | Operation         | Name              | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                   |     6 |   174 |     0   (0)|
|*  1 |  TABLE ACCESS HASH| SORTED_HASH_TABLE |     6 |   174 |            |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("HASH_VALUE"=1)

As you can see, there is no sort operation in the plan. Yet the data has clearly appeared in sorted order. So what’s going on ? (Note – you get the same execution plan if you include the order by clause, even – contrary to the indications in the manual – if you make that an ‘order by sort_value desc’).

First, take a look at the view user_segments – you’ll find an odd index segment with a name like SYS_IQ0000078561$$. (The number embedded in the name is the object_id – or possibly the data_object_id – of the cluster). Ordinary hash clusters don’t have cluster-related indexes – but a sorted hash cluster has an index which lists the hash key values, although it doesn’t contain any block identifiers to show where that key is stored.

Now take a block dump of the cluster – and you’ll find that the data hasn’t honoured the sizing parameters the way an ordinary hash cluster would. Instead this cluster spreads across 20 blocks – with alternate blocks empty, and one hash key per block. (There are various ways that sorted hash clusters manage to waste, or lose, a lot of space).

Finally, check your session stats as you re-run the query – and you’ll find that Oracle records a sort operation that just happens to sort the number of rows that you’ve just queried.

In effect, this query has simply accessed the correct block to find all the data – which it has then sorted without telling you that it’s doing a sort. So, for this type of query, the saving over a simple hash cluster is non-existent, and the overheads (which may be due to bugs, rather than deliberate design decisions) are significant.

But, looking on the bright side, there is something very interesting (but undocumented) that a sorted hash cluster can do that might make the overheads worthwhile – and I’ll tell you about the real poiwer of the sorted hash cluster in the next installment of this series.

[Forward to Part 2]

11 Comments »

  1. What?

    We must wait the next one?

    But I want to know it now! ^_^

    Great article as always.

    A very little error: on the first create cluster, the create cluster sorted_hash_cluster one, you miss a ‘–‘ before the word “sort”

    Comment by lascoltodelvenerdi — July 17, 2008 @ 12:06 pm BST Jul 17,2008 | Reply

  2. lascoltodelvenerdi,

    Sorry, finding the time to write this stuff can be a little difficult. But I’m going to be spending about 12 hours on trains with Wifi next week – so the wait might not be too long.

    The sort is required for the cluster but is optional for the table – that’s why I’ve commented it only for the table.

    Comment by Jonathan Lewis — July 17, 2008 @ 1:27 pm BST Jul 17,2008 | Reply

  3. @Jonathan

    Don’t worry, I was joking.

    I also run a little blog and I know that life+work+study+blog can be equal to less spare time… :(

    You are doing a very interesting work!
    Thumbs up! :lol:

    Comment by lascoltodelvenerdi — July 18, 2008 @ 9:31 am BST Jul 18,2008 | Reply

  4. If hash+sort columns is primary key, a typical unique index on those columns does not appear to exist to enforce this pk…

    Comment by vlad — July 19, 2008 @ 5:01 pm BST Jul 19,2008 | Reply

  5. Vlad,

    I’m not sure I understand your comment. In my example (hash_value, sort_value) is not intended to be the primary key. If I add a constraint to make it the primary key I would expect to see (and did in a quick test on 10.2.0.1) a unique index created on those two columns.

    Comment by Jonathan Lewis — July 19, 2008 @ 9:45 pm BST Jul 19,2008 | Reply

  6. Jonathan,

    My comment was related to your conclusion that there was something very specific about sorted hash clusters.
    That reminded me of something I discovered a long time ago.
    But was I wrong?
    Do you have an index when you create the primary key constraint at the same time as the table?
    I do not see any index created with the following script:

    drop table sorted_hash_table
    ;
    select count(*) from dba_indexes
    ;
    create table sorted_hash_table (
    	hash_value	number,
    	sort_value	varchar2(2),	--	sort,
    	v1		varchar2(10),
    	padding		varchar2(100),
    constraint sht_pk primary key(hash_value,sort_value)
    )
    cluster sorted_hash_cluster (
    	hash_value, sort_value
    )
    ;
    select count(*) from dba_indexes
    ;
    

    The user_constraints view shows an index named “_NEXT_OBJECT” which must be a weakness of the view definition.

    Comment by vlad — July 20, 2008 @ 1:35 am BST Jul 20,2008 | Reply

    • Vlad,

      I missed this posting first time around – sorry about that.

      I’ve just run a script like yours on 10.2.0.3 and 11.1.0.6 and Oracle doesn’t create a new index to protect the primary key.

      It looks like it’s using the system-generated index on the hash key to support the primary key – possibly on the assumption that it should be quick enough to check for a duplicate by using the basic mechanisms of handling sorted hash clusters.

      Comment by Jonathan Lewis — July 10, 2009 @ 12:50 pm BST Jul 10,2009 | Reply

  7. [...] Sorted Hash Clusters – 2 Filed under: Uncategorized — Jonathan Lewis @ 7:16 am UTC Jul 22,2008 [Back to Part 1] [...]

    Pingback by Sorted Hash Clusters - 2 « Oracle Scratchpad — July 22, 2008 @ 7:17 am BST Jul 22,2008 | Reply

  8. [...] Hash Clusters – 1: also references a link on OTN where I’ve made a couple of comments Sorted Hash Clusters: An introduction to a relatively new, interesting, but (at the time) buggy feature Sorted Hash [...]

    Pingback by Hash Clusters – 2 « Oracle Scratchpad — October 5, 2009 @ 9:17 pm BST Oct 5,2009 | Reply

  9. [...] tool that he likes to use for security audits. Next, Jonathan Lewis gives a hard-hitting expose on sorted hash clusters, and it doesn’t look good for [...]

    Pingback by Log Buffer #106: A Carnival of the Vanities for DBAs — February 13, 2013 @ 4:11 pm BST Feb 13,2013 | Reply

  10. […] Hash Clusters have been around for several years, but I’ve not yet seen them being used, or even investigated in detail. This is a bit of a […]

    Pingback by Sorted Hash Clusters RIP | Oracle Scratchpad — August 28, 2013 @ 8:22 am BST Aug 28,2013 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,266 other followers