Oracle Scratchpad

November 29, 2006

low_value / high_value

Filed under: CBO,Statistics,Troubleshooting — Jonathan Lewis @ 12:01 pm BST Nov 29,2006

If you take a look at view user_tab_columns (or equivalent) you will find that its definition includes columns low_value and high_value, which give the currently known low and high value for the data. For example:

 
select 
	column_name, low_value, high_value 
from 
	user_tab_columns 
where 
	table_name  = 'T1' 
and	column_name = 'SKEW' 
;         

COLUMN_NAME          LOW_VALUE       HIGH_VALUE 
-------------------- --------------- --------------- 
SKEW                 C102            C151       

This is fine, of course, if you are good at translating internal forms of data into human-readable forms.

For those of us less gifted, the dbms_stats package gives us some help with a set of procedures with names like convert_raw_value(). There are three overloaded functions to handle varchar2, date, and number types (with binary_float and binary_double appearing as new options in 10g), and two special cases: convert_raw_value_nvarchar() and convert_raw_value_rowid(). The procedures all take two parameters – the first in an input which is the raw value to be converted, the second is an output which has to be a variable of the correct type, so, for example:

 
SQL> declare 
  2     m_n     number; 
  3  begin 
  4     dbms_stats.convert_raw_value('C102',m_n); 
  5     dbms_output.put_line('Result: ' || m_n); 
  6  end; 
  7  . 
SQL> / 
Result: 1        

For convenience, you can wrap calls to the procedures inside functions, which then makes them available for use in SQL.

The following example is just to get you started (the column called skew is one I know to be a number type) – you would probably want to create a package to hold several different functions, and declare the functions as deterministic.

create or replace function raw_to_num(i_raw raw) 
return number 
as 
	m_n number; 
begin 
	dbms_stats.convert_raw_value(i_raw,m_n); 
	return m_n; 
end; 
/     

select 
	column_name, 
	raw_to_num(low_value)		low_val, 
	raw_to_num(high_value)		high_val 
from 
	user_tab_columns 
where 
	table_name  = 'T1' 
and	column_name = 'SKEW' 
;     

COLUMN_NAME             LOW_VAL   HIGH_VAL 
-------------------- ---------- ---------- 
SKEW                          1         80     

To finish off, you would probably want to query the user_tab_columns (or related) views with a decode() on the data_type to decide which of the functions to use, and then perform an explicit to_char() on the result of the decode(). But that task is left as an exercise for the reader.

Addendum: If you read the comments you will see that some of them have taken up the exercise and supplied some answers.

21 Comments »

  1. create or replace function raw_to_num(i_raw raw)
    return number
    as
    	m_n number;
    begin
    	dbms_stats.convert_raw_value(i_raw,m_n);
    	return m_n;
    end;
    /  
    
    create or replace function raw_to_date(i_raw raw)
    return date
    as
    	m_n date;
    begin
    	dbms_stats.convert_raw_value(i_raw,m_n);
    	return m_n;
    end;
    /  
    
    create or replace function raw_to_varchar2(i_raw raw)
    return varchar2
    as
    	m_n varchar2(20);
    begin
    	dbms_stats.convert_raw_value(i_raw,m_n);
    	return m_n;
    end;
    / 
    
    select 
            column_name,
            decode(data_type,
                    'VARCHAR2',to_char(raw_to_varchar2(low_value)),
                    'DATE',to_char(raw_to_date(low_value)),
                    'NUMBER',to_char(raw_to_num(low_value))
            ) low_value,
            decode(data_type,
                    'VARCHAR2',to_char(raw_to_varchar2(high_value)),
                    'DATE',to_char(raw_to_date(high_value)),
                    'NUMBER',to_char(raw_to_num(high_value))
            ) high_value
    from user_tab_columns 
    where table_name='';
    

    Exercise done.- reader

    Comment by anonymous — November 29, 2006 @ 8:02 pm BST Nov 29,2006 | Reply

  2. Seems funny that Oracle should put these procedures in the DBMS_STATS package when the UTL_RAW package has a bunch of casting functions for RAW values (CAST_TO_BINARY_DOUBLE, CAST_TO_BINARY_FLOAT, CAST_TO_BINARY_INTEGER, CAST_TO_NUMBER, CAST_TO_NVARCHAR2, CAST_TO_VARCHAR2). Note the absence of a CAST_TO_DATE function.

    Cheers

    Tim…

    Comment by Tim Hall — November 30, 2006 @ 8:01 am BST Nov 30,2006 | Reply

  3. We have found out dependence of LOW_VALUE, HIGH_VALUE from parameters NLS_COMP, NLS_SORT. However DBMS_STATS documentation is not present the description of such behaviour…

    Comment by Pavel Kozlov — June 7, 2007 @ 8:10 am BST Jun 7,2007 | Reply

  4. I am attempting to change the statistics value of high_value with the help of dbms_stats.set_column_stats. For some reason I struggle to find a function to convert date to raw. Am I looking in the wrong place or am I missing something?

    Comment by Jan Rod — August 24, 2009 @ 7:13 pm BST Aug 24,2009 | Reply

    • Jan,

      you don’t have to convert DATE to RAW when setting low/high value. Just let the dbms_stats.prepare_column_stats do the dirty work for you. I hope there are no bug in my EXAMPLE :)

      create or replace function raw_to_date(i_raw raw)
      return date
      as
      m_n date;
      begin
      dbms_stats.convert_raw_value(i_raw,m_n);
      return m_n;
      end;
      / 
      
      ALTER SESSION SET NLS_DATE_FORMAT='YYYY.MM.DD';
      
      CREATE TABLE t_d1 (d_date DATE);
      
      INSERT INTO t_d1 VALUES(TO_DATE('2009.08.01', 'YYYY.MM.DD'));
      INSERT INTO t_d1 VALUES(TO_DATE('2009.08.31', 'YYYY.MM.DD'));
      COMMIT;
      
      --gather_stats
      BEGIN
         DBMS_STATS.gather_table_stats (user,
                                        'T_D1',
                                        estimate_percent => NULL,
                                        method_opt => 'FOR ALL COLUMNS SIZE 1',
                                        CASCADE => TRUE,
                                        FORCE => TRUE
                                       );
      END;
      /
      
      --show low/high val
      SELECT low_value, raw_to_date(low_value), high_value, raw_to_date(high_value)
      FROM user_tab_col_statistics
      WHERE table_name = 'T_D1' AND column_name = 'D_DATE'
      /
      
      /*  !!!RESULT!!!
      
      LOW_VALUE            RAW_TO_DAT HIGH_VALUE           RAW_TO_DAT
      -------------------- ---------- -------------------- ----------
      786D0801010101       2009.08.01 786D081F010101       2009.08.31
      */
      
      
      --increase high value by ONE MONTH
      
      DECLARE
         n_distcnt          NUMBER;
         n_density          NUMBER;
         n_nullcnt          NUMBER;
         rec_srec           DBMS_STATS.statrec;
         datevals           DBMS_STATS.DATEARRAY;
         n_avgclen          NUMBER;
         d_low              DATE;
         d_high             DATE;
      BEGIN
         DBMS_STATS.get_column_stats (USER,
                                      'T_D1',
                                      'D_DATE',
                                      distcnt      => n_distcnt,
                                      density      => n_density,
                                      nullcnt      => n_nullcnt,
                                      srec         => rec_srec,
                                      avgclen      => n_avgclen
                                     );
         
         --handles just the case when HISTOGRAMS are not set -> rec_srec.epc = 2
         IF rec_srec.epc = 2 THEN
           SELECT raw_to_date(low_value), raw_to_date(high_value) INTO d_low, d_high
              FROM user_tab_col_statistics
            WHERE table_name = 'T_D1' AND column_name = 'D_DATE';
           
           d_high := ADD_MONTHS(d_high, 1);
           datevals := DBMS_STATS.DATEARRAY(d_low, d_high);
           rec_srec.minval:=NULL;
           rec_srec.maxval:=NULL;
           rec_srec.bkvals:=NULL;
           rec_srec.novals:=NULL;
           
           --this procedure will set epc.minval, epc.maxval etc in INTERNAL FORMAT
           DBMS_STATS.PREPARE_COLUMN_VALUES(rec_srec, datevals);
           
           --and then just set statistics
           DBMS_STATS.set_column_stats (USER,
                                        'T_D1',
                                        'D_DATE',
                                        distcnt      => n_distcnt,
                                        density      => n_density,
                                        nullcnt      => n_nullcnt,
                                        srec         => rec_srec,
                                        avgclen      => n_avgclen
                                        );
         END IF;    
      END;
      /
      
      --show low/high val , HIGH_VAL IS increased
      SELECT low_value, raw_to_date(low_value), high_value, raw_to_date(high_value)
      FROM user_tab_col_statistics
      WHERE table_name = 'T_D1' AND column_name = 'D_DATE'
      /
      
      
      /*  !!!RESULT!!!
      LOW_VALUE            RAW_TO_DAT HIGH_VALUE           RAW_TO_DAT
      -------------------- ---------- -------------------- ----------
      786D0801010101       2009.08.01 786D091E010101       2009.09.30
      */
      

      Comment by Pavol Babel — August 24, 2009 @ 8:42 pm BST Aug 24,2009 | Reply

      • Can you please tel me what &gt is used for and how ? what do i enter when i get prompt ? thanks

        Comment by Jim — January 20, 2012 @ 8:47 pm BST Jan 20,2012 | Reply

        • This is a side effect of wordpress and it’s handling of the “greater than” sign. In some cases it comes out as “>”. I’ve fixed it in the example above.

          Comment by Jonathan Lewis — January 21, 2012 @ 5:04 pm BST Jan 21,2012

        • thanx jonathan for repairing my example. As far I can remember it had been written before wordpress started to support sourcecode tag (so formating was poor). You have obviously fixed it layter, but &gt remained :) Now it seems to be 100% correct.

          Comment by Pavol Babel — January 21, 2012 @ 10:57 pm BST Jan 21,2012

    • BTW, you could set the RAW value directly into srec.maxval . But you have to write your own RAW TO DATE function.

      RAW representation of DATE in user_tab_columns follows the OCI 7BYTE representation of DATE. You can read the OCI Docmuentation to understand it.

      select d_date,  dump(d_date) dump_date FROM t_d1
      /
      
      D_DATE     DUMP_DATE
      ---------- ------------------------------------
      2009.08.01 Typ=12 Len=7: 120,109,8,1,1,1,1
      2009.08.31 Typ=12 Len=7: 120,109,8,31,1,1,1
      

      If you convert each number to HEXA and concat them, you’ll get RAW from low/high_val from user_tab_columns/user_tab_col_statistics

      Comment by Pavol Babel — August 24, 2009 @ 8:56 pm BST Aug 24,2009 | Reply

  5. I am having problem setting the high value. Does it have to do with histograms already being set ? How can i set it then, i f that is the case. My rec_srec.epic value is 253
    Thanks

    Comment by Jim — June 10, 2014 @ 9:46 pm BST Jun 10,2014 | Reply

    • Jim,

      If you’re using the sample in an earlier comment to set the high value then the pre-existence of a histogram probably is your problem. Since your rec_srec.epc = 253 you can modify the code by:
      a) remove the if .. end if check on rec_srec.epc
      b) add a line after the “datevals :=” line setting rec_srec.epc:= 2;

      The epc (end point count) is the number of elements in the array, if the value doesn’t match the actual array you get an error whose number I forget.

      For further examples on using set_column_stats see: http://jonathanlewis.wordpress.com/2010/03/23/fake-histograms/

      Comment by Jonathan Lewis — June 11, 2014 @ 10:14 am BST Jun 11,2014 | Reply

  6. Hi Jonathan

    thanks very much for getting back to me

    i implemented the change but here is my question and what I am trying to accomplish
    That worked great by the way

    I generaly gather column stats on the “CreateDate” column of my “DTree” table to get the high value for CreateDate

    begin
    dbms_stats.gather_table_stats(
    ownname=> ‘LLPROD’,
    tabname=> ‘DTREE’ ,
    estimate_percent=> DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade=> DBMS_STATS.AUTO_CASCADE,
    degree=> null,
    no_invalidate=> DBMS_STATS.AUTO_INVALIDATE,
    granularity=> ‘AUTO’,
    method_opt=> ‘FOR COLUMNS CREATEDATE SIZE 254′);
    end;
    /

    Checking, I get 254 buckets from user_tab_col_statistics

    But what I find is that I do not usually get the Highest value for CreateDate. probably because of sampling. I am guessing

    Using your script with the mods you just sent me
    I am setting the High Value for CreateDate to SYSDATE for the table

    I now see the buckets are getting reset to 1
    and the CreateDate does have the SYSDATE value

    My question is do I have to be concerned that the Buckets are now 1 ?
    What happens when the stats are collected by Oracle Enterprise Manager (10g) ?

    Does the High Value get re-valued ?

    Appreciate Your Help

    Jim

    Comment by Jim — June 16, 2014 @ 8:08 pm BST Jun 16,2014 | Reply

    • Jim,

      So what you’re after is code to replace the current highest value in the histogram with a value you select while leaving the histogram intact. (I’d be inclined to run my own code to generate the whole histogram, but that might be duplicating too much run-time resource usage if can’t change the code that’s already there to gather the histogram).

      I don’t think I’ve tried this before – so subject to testing and I may blog it some time over the next couple of weeks – but I think you could include a loop in the pl/sql to populate the date array and the bucket array – then putting in your high value on the last date array entry. You probably need to consider variations for frequency and height-balanced histograms, and the query for the bucket array might require you to use an analytic function to find differences between endpoint values.

      Something like:

      
      	d_array		:= dbms_stats.datearray();
      	srec.bkvals	:= dbms_stats.numarray();
      
      	for r in (
      		select  ...
      		from    user_tab_histograms
      		where   table_name = ... and column_name = ...
      		order by endpoint_value
      	) loop
      
      		d_array.extend;
      		srec.bkvals.extend;
      
      		ct := ct+1;
      		d_array(ct)	:= r.endpoint_value;
      		srec.bkvals(ct)	:= r.{bucket_count_value}
      
      	end loop;
      
      	d_array(ct) := sysdate;
      
      

      Comment by Jonathan Lewis — June 17, 2014 @ 7:48 am BST Jun 17,2014 | Reply

  7. Hi Jonathan

    I may now be getting way over my head with this new code

    I am just trying to ensure that the highest value of the CREATEDATE column for the table DTREE is todays date
    after i collect stats on the column:

    execute dbms_stats.gather_table_stats(ownname=>’LLPROD’,tabname=>’DTREE’,estimate_percent=>NULL,METHOD_OPT=>’FOR COLUMNS CREATEDATE SIZE 254′);

    i check this in user_tab_col_statistics and you can see the date is not the sysdate yet

    col high_value format a16
    col low_value format a16

    select
    low_value,
    raw_to_date(low_value) low_date,
    high_value,
    raw_to_date(high_value) high_date
    from
    user_tab_col_statistics
    where
    table_name=’DTREE’
    and
    column_name=’CREATEDATE';

    LOW_VALUE LOW_DATE HIGH_VALUE HIGH_DATE
    —————- —————- —————- —————
    74650101010101 01-JAN-01 7872060D0B1029 13-JUN-14

    TABLE_NAME COLUMN_NAME LAST_ANALYZED NUM_BUCKETS NUM_DISTINCT
    ———— ———— ———————— ———– ————
    DTREE CREATEDATE 17-JUN-2014 09:57:29 254 2870018

    If I run the previous code I get the sysdate value but I only now have 1 bucket instead of 254

    LOW_VALUE LOW_DATE HIGH_VALUE HIGH_DATE
    —————- —————- —————- —————-
    74650101010101 01-JAN-01 787206110B3A0E 17-JUN-14

    TABLE_NAME COLUMN_NAME LAST_ANALYZED NUM_BUCKETS NUM_DISTINCT
    ———— ———— ———————— ———– ————
    DTREE CREATEDATE 17-JUN-2014 10:57:13 1 2870018

    Trying to implement your code is a bit overwhelming for me
    as I am not a pl/sql expert so I am getting errors tring to incorporate

    I am not sure what you meant by thee lines

    select …
    srec.bkvals(ct) := r.{bucket_count_value}

    Here is what i coded so far and the errors

    I know I am missing the boat

    DECLARE
    ct NUMBER;
    srec DBMS_STATS.statrec;
    d_array DBMS_STATS.DATEARRAY;

    BEGIN
    d_array := dbms_stats.datearray();
    srec.bkvals := dbms_stats.numarray();
    for r in (
    select
    endpoint_value,
    endpoint_number
    from
    user_tab_histograms
    where
    table_name = ‘DTREE’
    and
    column_name = ‘CREATEDATE’
    order by
    endpoint_value
    ) loop

    d_array.extend;
    srec.bkvals.extend;

    ct := ct+1;
    d_array(ct) := r.endpoint_value;
    srec.bkvals(ct) := r.endpoint_number;

    end loop;

    d_array(ct) := sysdate;
    END;
    /

    d_array(ct) := r.endpoint_value;
    *
    ERROR at line 27:
    ORA-06550: line 27, column 20:
    PLS-00382: expression is of wrong type
    ORA-06550: line 27, column 3:
    PL/SQL: Statement ignored

    Any assitance is appreciated
    Jim

    Comment by Jim — June 17, 2014 @ 7:03 pm BST Jun 17,2014 | Reply

    • Jim,

      Sorry, I’d assumed you were proficient in PL/SQL.

      I’ll see if I can find an odd half hour some time later this week to write up an example as a new blog item. It may have to wait until I’m on the plane to Seattle for Kaleidoscope, though.

      Comment by Jonathan Lewis — June 18, 2014 @ 5:56 am BST Jun 18,2014 | Reply

      • JL,
        I wish I was .. I am self taught and I learn as I go. Appreciate when folks like you help out

        I wil keep tring to figure it out though

        Jim

        Comment by Jim — June 18, 2014 @ 10:52 pm BST Jun 18,2014 | Reply

        • Hi Jonathan

          I found some code (not mine) I implemented to get past the error I was getting and I am pretty confident that I am getting the results I need, but I cannot seem to get figure out the next step which is updating that last record in the histogram table.

          Here is what I came up with thanks to your input and Mr. Rajeshwaran at http://rajeshwaranbtech.blogspot.com/2011/01/histograms-on-date-varchar-and-numeric.html

          tested the code in a select statment::

          select 
             endpoint_value,
             TO_CHAR (endpoint_value) endpoint_value_char,
             TO_DATE (FLOOR (endpoint_value) || '.' || TO_CHAR (86400 * MOD (endpoint_value,1), 'FM999999999'),'J,sssss') endpoint_value_date,
             endpoint_number
          from 
             user_tab_histograms
          where 
             table_name = 'DTREE' 
          and
            column_name = 'CREATEDATE'
          order by 
             endpoint_value;
          
          
          
          tried the pl/sql code out
          
          DECLARE
             ct                       NUMBER :=0;
             srec                     DBMS_STATS.statrec;
             d_array                  DBMS_STATS.DATEARRAY;
          
          BEGIN
          	d_array		:= dbms_stats.datearray();
          	srec.bkvals	:= dbms_stats.numarray();
          	for r in (
                           select 
                              endpoint_value,
                              endpoint_number
                           from 
                              user_tab_histograms
                           where 
                              table_name = 'DTREE' 
                           and
                              column_name = 'CREATEDATE'
          		 order by 
                              endpoint_value
          	         ) 
                 loop
          
          		d_array.extend;
          		srec.bkvals.extend;
          
          		ct := ct+1;
                          d_array(ct)	:= TO_DATE (FLOOR (r.endpoint_value) || '.' || TO_CHAR (86400 * MOD (r.endpoint_value,1), 'FM999999999'),'J,sssss');
          		srec.bkvals(ct)	:= r.endpoint_number;
          
                          DBMS_OUTPUT.put_line ('ct(' || ct || ')   ' || 'srec.bkvals='  || srec.bkvals(ct) || '   ' || 
                                                'r.endpoint_value=' || r.endpoint_value || '   ' || 'd_array(' || ct   ||')' || d_array(ct) 
                          );
          
          	end loop;    
          
                  DBMS_OUTPUT.put_line ('Setting Last value to SYSDATE');
          	d_array(ct) := sysdate;
                  DBMS_OUTPUT.put_line ('d_array(' || ct  || ')' || d_array(ct) );
          END;
          /
          
          Sample Output:
          ct(1)   srec.bkvals=0   r.endpoint_value=2305814                    d_array(1)01-JAN-01
          ct(2)   srec.bkvals=1   r.endpoint_value=2449118.58960648   d_array(2)10-MAY-93
          ct(3)   srec.bkvals=2   r.endpoint_value=2449798.59928241   d_array(3)21-MAR-95
          ct(4)   srec.bkvals=3   r.endpoint_value=2450102.125             d_array(4)19-JAN-96
          ct(5)   srec.bkvals=4   r.endpoint_value=2450303.64722222   d_array(5)07-AUG-96
          .
          .
          .
          .
          ct(250)   srec.bkvals=251   r.endpoint_value=2455020.63541667   d_array(250)07-JUL-09
          ct(251)   srec.bkvals=252   r.endpoint_value=2455029.45696759   d_array(251)16-JUL-09
          ct(252)   srec.bkvals=253   r.endpoint_value=2455035.32701389   d_array(252)22-JUL-09
          ct(253)   srec.bkvals=254   r.endpoint_value=2456836.575625       d_array(253)27-JUN-14
          Setting Last value to SYSDATE
          d_array(253)02-JUL-14
          
          
          Your advice and thoughts are appreciated
          I will continue to see if I can find some more info on the internet to complete this
          

          Thanks
          jim

          Comment by Jim — July 2, 2014 @ 8:18 pm BST Jul 2,2014

        • Jim,

          I wrote up an example of how to do this a few days ago, but forgot to publish it. There are a couple of little traps in the method I’ve used (which is just a quick and dirty, so there may be a smarter method). I’ll copy the whole demo script to the blog in a couple of minutes.

          Comment by Jonathan Lewis — July 4, 2014 @ 8:21 pm BST Jul 4,2014

  8. […] is a quick response to a question on an old blog post asking how you can adjust the high value if you’ve already got a height-balanced histogram […]

    Pingback by Adjusting Histograms | Oracle Scratchpad — July 4, 2014 @ 8:32 pm BST Jul 4,2014 | Reply

  9. Hi Jonathan

    will take a look at the code you wrote
    really appreciate your time in this matter

    Thanks
    Jim

    Comment by Jim — July 7, 2014 @ 4:17 pm BST Jul 7,2014 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,905 other followers