A few weeks ago I wrote a note demonstrating the way in which Oracle’s strategy for hash partitioning is engineered to give an even data distribution when the number of partitions is a power of two. In one of the follow-up comments, Christo Kutrovsky pre-empted my planned follow-up by mentioning the hashing function ora_hash() that appeared in 10g.
This function takes up to three parameters: the input value (which can be of any base type), the maximum hash bucket value (the minimum value is implicitly zero) and a seed value (default zero) for the hash function. So, for example, ora_hash(‘fred’,1023) will return an integer value between zero and 1023.
The ora_hash() function is similar to the packaged function dbms_utility.get_hash_value({expression},{min},{max}) that appeared in 8i, although the older function only handles character expressions (using an implicit conversion for dates and numbers), requires you to specify the minimum and maximum bucket values, and doesn’t allow you to introduce a seed value.
In fact, for character inputs, the ora_hash() and get_hash_value() functions can be mapped to each other very easily by the relationship:
dbms_utility.get_hash_value(string,1,N) = 1 + ora_hash(string,N,0)
So, for example, we can run a little query like:
select
1 + ora_hash('fred',1023,0) ora_hash_plus_one,
dbms_utility.get_hash_value('fred',1,1024) get_hash_value
from
dual
/
ORA_HASH_PLUS_ONE GET_HASH_VALUE
----------------- --------------
904 904
One important feature of the ora_hash() function is that it seems to be the function used internally – with a zero seed – to determine which partition a row belongs to in a hash partitioned table. Of course, as I pointed out in the previous article, you have to tailor the use of the function to the number of partitions you have declared – rounding up to the nearest power of two (and subtracting one) for the “max bucket” input, then adding one to the result, then dropping the highest bit of the result if the number of partitions is not a power of two.
Here’s a little demonstration, based on a table with six partitions (i.e. poor choice of partition count), running under 11.1.0.6:
create table t1 ( id, v1 ) partition by hash(v1) partitions 6 as select rownum, object_name from all_objects where rownum <= 32768 ; execute dbms_stats.gather_table_stats(user,'t1') select partition_position, partition_name, num_rows from user_tab_partitions where table_name = 'T1' order by partition_position ; column partition_name new_value m_pt select partition_name from user_tab_partitions where table_name = 'T1' and partition_position = 3 ; select ora_hash(v1, 7) ora_hash, dbms_utility.get_hash_value(v1,1,8) get_hash_value, dbms_mview.pmarker(rowid) pmarker, count(*) from t1 partition( &m_pt ) group by ora_hash(v1, 7), dbms_utility.get_hash_value(v1,1,8), dbms_mview.pmarker(rowid) ;
You’ll see that I’ve collected stats on the table then reported each partition’s position, name, and number of rows . This is the result I got from that query:
PARTITION_POSITION PARTITION_NAME NUM_ROWS
------------------ -------------------- ----------
1 SYS_P608 3989
2 SYS_P609 4055
3 SYS_P610 8356
4 SYS_P611 8320
5 SYS_P612 4083
6 SYS_P613 3965
As you might expect from the previous article, two of my partitions (numbers 3 and 4) are twice the size of the rest. If I were to issue a ‘split partition’ call then partition 3 would split (roughly 50/50) into partitions 3 and 7, and on a second ‘split partition’ call partition 4 would split (again 50/50) into partitions 4 and 8.
To demonstrate the effect of the ora_hash() function and how it relates to hash partitioning I’ve then run a query that references partition 3 by name (using an earlier query to convert a partition position into a partition name).
You’ll notice that my call to ora_hash() uses the value seven for the maximum hash bucket – this is because (a) my ideal partition count is eight (the smallest power of two that is not less than six), and (b) although I want values between 1 and 8 the function will be generating values between zero and 7. Given the relationship between ora_hash() and dbms_utility.get_hash_value() – and given that I’m applying the functions to a character string – I’ve also called the get_hash_value() function to show that its behaviour really is consistent with ora_hash() across the entire data set. Here are the results from the final query:
ORA_HASH GET_HASH_VALUE PMARKER COUNT(*)
---------- -------------- ---------- ----------
2 3 86154 4194
6 7 86154 4162
You can see that both functions have split my set of values into two distinct sets in exactly the same way – values that would go into a new partition three on a “split partition”, and values that would go into partition seven. And the split is roughly 50/50 (to within 1%).
As a little bonus, I’ve also included a call to dbms_mview.pmarker() in the query. This was a function that Oracle introduced in 9i (I think) to allow more refresh options for materialized views on partitioned objects when you used partition maintenance operations on them. The function returns an identifier for the partition of a partitioned table that the rowed belongs to. (In fact all it is effectively doing is returning the data_object_id from dba_objects for the containing partition).
Once you’re aware of how ora_hash() works you may be able to find ways to take advantage of it. Occasionally I’ve been asked if it’s possible to work out which partition some data is going to go into – now you know that the answer is yes.
To be investigated: Does the ora_hash() function play any part in the distribution of data across hash clusters ? (Probably not – the manuals make some comments about rounding up the supplied number of hashkeys to the next larger prime number, which suggests that a different hashing algorithm is used.
To be confirmed: Does the ora_hash() function plan any part in parallel query distribution by hash ? One quick test suggests not – but the method of use in this case may be more subtle than the “obvious” guess that I used in my test.