Here’s a little-known feature that has been around since at least Oracle 10, though I don’t think I had ever seen it in the wild until today when someone reported on the ODC (OTN) database forum that they had a problem getting repeatable results. It’s always possible, of course, that failure to get repeatable results is the natural consequence of running queries against a multi-user system, but if we assume that this was not the cause in this case we have to ask why a special hashing function that Oracle supplies to allow you to check that a set of data hasn’t changed gives you different results when “the data hasn’t changed”.
I’m talking about the function dbms_sqlhash.gethash() – a packaged function that exists in the SYS schema and isn’t usually exposed to other users. The function takes as its inputs the text of query, a selected hashing function, and a “chunk” size. It will run the query, and use the hashing function to return a single, 16 – 64 byte, hash value representing the entire result set. Here’s an example of usage:
begin dbms_output.put_line( dbms_sqlhash.gethash( sqltext => 'select n1, d1 from t1 where id > 0', digest_type => dbms_crypto.hash_md5 -- chunk_size => 128*1048576 -- default 128MB ) ); end; / 6496D2438FECA960B1E916BF8C4BADCA
I haven’t specified a chunk size – the default is 128MB – and Oracle will hash this much of the result set in a single pass. If the result set is larger than this Oracle will hash each chunk in turn then generate a hash of the hash values. (This means, by the way, that changing the chunk size can change the hash value for large data sets).
There are 6 possible digest types in 12.1.0.2 (listed in the $ORACLE_HOME/rdbms/admin/dbmsobtk.sql script that creates the dbms_crypto package – so you will need the execute privilege on both dbms_sqlhash and dbms_crypto to use the function if you want to code with symbolic constants):
rem HASH_MD4 CONSTANT PLS_INTEGER := 1; rem HASH_MD5 CONSTANT PLS_INTEGER := 2; rem HASH_SH1 CONSTANT PLS_INTEGER := 3; rem HASH_SH256 CONSTANT PLS_INTEGER := 4; rem HASH_SH384 CONSTANT PLS_INTEGER := 5; rem HASH_SH512 CONSTANT PLS_INTEGER := 6;
Let’s put the whole thing into a demonstration that will allow us to see an important point – you have to be careful with your query:
rem rem Script: gethash.sql rem Author: Jonathan Lewis rem Dated: Feb 2016 rem execute dbms_random.seed(0) create table t1 nologging as select 1e4 - rownum id, trunc(dbms_random.value(0,100)) n1, trunc(sysdate) d1, lpad('x',100,'x') padding from dual connect by level <= 1e4 -- > comment to avoid WordPress format issue ; alter table t1 add constraint t1_pk primary key (id); begin dbms_stats.gather_table_stats( ownname => user, tabname => 'T1', method_opt => 'for all columns size 1' ); end; / set feedback off alter system flush shared_pool; alter session set optimizer_mode = first_rows_1; begin dbms_output.put_line( dbms_sqlhash.gethash( sqltext => 'select n1, d1 from t1 where id > 0', digest_type => dbms_crypto.hash_md5 -- chunk_size => 128*1048576 -- default 128MB ) ); end; / alter system flush shared_pool; alter session set optimizer_mode = all_rows; begin dbms_output.put_line( dbms_sqlhash.gethash( sqltext => 'select n1, d1 from t1 where id > 0', digest_type => dbms_crypto.hash_md5 -- chunk_size => 128*1048576 -- default 128MB ) ); end; / alter system flush shared_pool; alter session set nls_date_format='dd-mon-yyyy hh24:mi:ss'; begin dbms_output.put_line( dbms_sqlhash.gethash( sqltext => 'select n1, d1 from t1 where id > 0', digest_type => dbms_crypto.hash_md5 -- chunk_size => 128*1048576 -- default 128MB ) ); end; / alter session set nls_date_format='DD-MON-RR';
I’ve created a data set, added a primary key, and gathered stats, then I’ve called the same hashing function on the same sql statement three times in a row. However, I’ve changed the session environment for each call – in the first case I’ve set the optimizer to “first rows(1)” optimization, then I’ve set the optimizer back to all_rows, then I’ve changed the nls_date_format from its default of “DD-MON-RR” (and that’s significant because I’ve got a date column in my query). Here are the output from running the script:
Table created. Table altered. PL/SQL procedure successfully completed. 6496D2438FECA960B1E916BF8C4BADCA D41D4A2945D0B89A6C5DEB5060189A54 ECC3D2B66CB61821397CD9BD983FD5F4
The query has to return the same data content in all three cases – but the hash value is different in the three cases. The change in the optimizer mode has affected the order in which the data was returned (with first_rows(1) Oracle did a full scan of the primary key index, with all_rows it did a tablescan and sort); the change in the nls_XXX parameter meant the internal representation of the data changed.
You have to be very careful with dbms_sqlhash every time you use it if you want the same data set to produce the same result. First, to be safe, you need to ensure that you always use the same NLS parameters when using the function; then you need to have an “order by” clause in the query, and the columns used in the order by clause need to a possible candidate key (i.e. unique, not null) for the data otherwise a change in the optimizer parameters, or the object stats, could result in a change in execution plan with an ensuing change in the actual order of the data and a different hash value.
Comments and related questions are welcome.