Oracle Scratchpad

May 5, 2014


Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 2:58 pm GMT May 5,2014

I wrote a note a few years ago about translating the endpoint_value for histograms on character columns, and in that note I left the casual comment that you would “fiddle with this bit of code to handle multibyte character sets”. For anyone who has never found time to do that fiddling, here’s a solution. In fact, it’s also a better solution for the original character problem than the one I originally published – I have no idea why I didn’t use the utl_raw package to simplify the code years ago.

First, a brief explanation of the problem. To store a character value in the numeric endpoint_value column of a histogram, Oracle:

  • Takes the first 15 bytes of the string (after padding the string with zeros (for varchar2) or spaces (for char))
  • Treats the result as a 15 byte (30 digit) hexadecimal number, and converts to decimal
  • Rounds to 15 significant digits (why?!) and stores the result as the endpoint_value
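As a rough check of the three steps above, here is a minimal Python sketch of the arithmetic, run outside the database. The function name char_to_endpoint is hypothetical, and the round-half-up rule is an assumption of the sketch; Oracle's internal rounding may differ in edge cases.

```python
def char_to_endpoint(value: bytes, pad: bytes = b"\x00") -> int:
    # Model only: pad is a zero byte for varchar2, a space for char.
    first15 = value[:15].ljust(15, pad)        # first 15 bytes, padded on the right
    as_int = int.from_bytes(first15, "big")    # read as a 30-hex-digit number
    excess = len(str(as_int)) - 15             # decimal digits beyond the 15th
    if excess <= 0:
        return as_int
    scale = 10 ** excess
    # Round to 15 significant digits (round-half-up is assumed here)
    return (as_int + scale // 2) // scale * scale


# 'CLUSTER' in a single-byte character set (cp1252 stands in for WE8MSWIN1252)
ep = char_to_endpoint("CLUSTER".encode("cp1252"))
print(f"{ep:,}")
print(format(ep, "030x"))
```

For 'CLUSTER' this produces 349,432,112,834,658 followed by 21 zeros, which agrees with the endpoint_value reported for that row in the output later in this note. The hex form shows how the rounding corrupts everything from the seventh byte onwards.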

There are two problems with my old code when trying to convert back to a (multi-byte) character format. The first is that, after converting back to a hexadecimal format, it doesn’t know how many consecutive bytes should be used to make up each character of the string – this problem can be circumvented by using the utl_raw.cast_to_[n]varchar2() functions. (Note: the dbms_stats package has a number of convert_raw_value() procedures which could also be used, but to take advantage of them I’d have to create a wrapper function for each procedure.)

The second problem is one that’s easy to overlook (and possibly survive with) for some time – some multi-byte characters start with a zero byte – or perhaps more than a single zero byte. If a string that’s going into the histogram data starts with such a character then conversion to a number loses the leading zeros, which means you have to find a way of recovering the right number of zeros before doing the conversion from raw to character. The solution to this one is also fairly straightforward: we know that the endpoint_value when converted to a hexadecimal number should be 15 bytes, which means in its hex string presentation it will be 30 characters: if it’s not 30 characters then pad it on the left with zeros up to 30 characters, then call utl_raw.cast_to_[n]varchar2() on the result. Here’s a dataset to demonstrate the point (the results shown came from a database with varchar2() using WE8MSWIN1252 and nvarchar2() using AL16UTF16):

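create table t1
as
select
	object_type				v_type,
	cast(object_type as nvarchar2(19))	n_type
from	all_objects	-- assumed source; any view with an object_type column would do
where	rownum <= 10000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 254'
	);
end;
/

select
	column_name, endpoint_number,
	lpad(to_char(endpoint_value,'fmxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'),30,'0')
from	user_tab_histograms
where
	table_name = 'T1'
order by
	column_name desc,
	endpoint_number
;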
Column                 EP no LPAD(TO_CHAR(ENDPOINT_VALUE,'F
-------------------- ------- ------------------------------
V_TYPE                    10 434c5553544566083af75039400000
                          12 434f4e5445585b0be13e9077800000
                          13 45444954494f54acac5b71e6c00000
                          65 46554e4354494bfd6e91eac9000000
                       1,218 494e444557ffec1b8cf7b386200000
                       1,284 494e44455820524115d2f593c00000
                       1,287 4f5045524154487b7a4542a0400000
                       1,459 5041434b41476016d731ed1a200000
                       1,473 50524f43454439c0c7fd90d5c00000
                       1,475 53455155454e581a4a938544400000
                       4,576 53594e4f4e59319bbd81e9d6a00000
                       5,613 5441424c45001de17dde9b57c00000
                       5,685 5441424c45204dd13d0c1786c00000
                       5,717 5441424c45204dd13d0c1786c00000
                       6,425 54595044ffffeda95d6a0f02200000
                      10,000 56494556fffffb4cec1ab3e3600000

N_TYPE                    10 0043004c00550091d5132d65900000
                          12 0043004f004e00699afbb463980000
                          13 0045004400490039662489da280000
                          65 00460055004e0032efa0cbcb200000
                       1,218 0049004e0044000ee521e524f00000
                       1,284 0049004e0044000ee521e524f00000
                       1,287 004f00500045007d7a2958e3400000
                       1,459 0050004100430082c9d92def100000
                       1,473 00500052004f003864f4c314680000
                       1,475 005300450051001bb6610d60c00000
                       4,576 00530059004e00250bbf32dc380000
                       5,613 0054004100420046e721e46a700000
                       5,685 0054004100420046e721e46a700000
                       5,717 0054004100420046e721e46a700000
                       6,425 0054005900500056f579391d680000
                      10,000 0056004900450054a090433f680000

32 rows selected.

As you can see, we’ve got a frequency histogram on both columns. If you’re good with ASCII codes you’ll be able to translate the first row for the v_type data: 43 = C, 4c = L, 55 = U, 53 = S, 54 = T, 45 = E, 66 = f, and then we run into funny characters; the ‘f’ and everything after it is part of the rounding error due to the algorithm Oracle is using. You can also check the first row for the n_type: it’s coming from a fixed width character set, 2 bytes per character: 0043 = C, 004c = L, 0055 = U, 0091 = {rounding error} – the rounding means we get much less information from the nvarchar2() histogram.
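The same manual translation can be cross-checked with a short Python sketch. This is a model of the arithmetic rather than Oracle code: the helper name endpoint_to_text is hypothetical, and Python's cp1252 and utf-16-be codecs are assumed to stand in for WE8MSWIN1252 and AL16UTF16. The two endpoint values are the ones Oracle stored for the CLUSTER rows. Note the left-padding to 30 hex digits, without which the n_type value would lose its leading zero byte.

```python
def endpoint_to_text(endpoint_value: int, encoding: str, nbytes: int) -> str:
    # Decode the leading nbytes bytes of a histogram endpoint_value
    hex_value = format(endpoint_value, "x").zfill(30)   # restore lost leading zeros
    raw = bytes.fromhex(hex_value)                       # 15 bytes
    return raw[:nbytes].decode(encoding, errors="replace")


# Stored endpoint values for CLUSTER (15 significant digits, then zeros)
v_cluster = 349_432_112_834_658 * 10**21
n_cluster = 135_894_496_470_682 * 10**19

print(format(n_cluster, "x"))                        # only 28 digits before padding
print(endpoint_to_text(v_cluster, "cp1252", 6))      # CLUSTE
print(endpoint_to_text(n_cluster, "utf-16-be", 6))   # CLU
```

Only the first 6 bytes are decoded here because the bytes after that are rounding noise.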

Rather than depending on knowing our code pages, though, we can get Oracle to do the translation work:

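select
	column_name, endpoint_number,
	substr(lpad(to_char(endpoint_value,'fmxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'),30,'0'),1,30)  hex_value,
	decode(column_name,	-- assumed form; only the closing line of this expression survives
		'V_TYPE', utl_raw.cast_to_varchar2(substr(lpad(to_char(endpoint_value,'fmxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'),30,'0'),1,14)),
		'N_TYPE', to_char(utl_raw.cast_to_nvarchar2(substr(lpad(to_char(endpoint_value,'fmxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'),30,'0'),1,16)))
	)	char_value,
	endpoint_value, endpoint_actual_value
from	user_tab_histograms
where
	table_name = 'T1'
order by
	column_name desc,
	endpoint_number
;

--	Clear graphics mode

execute dbms_output.put_line(chr(15))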

Column       EP no HEX_VALUE                      CHAR_VALUE                                   ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
---------- ------- ------------------------------ ---------- ------------------------------------------------ --------------------------------
V_TYPE          10 434c5553544566083af75039400000 CLUSTEf     349,432,112,834,658,000,000,000,000,000,000,000 CLUSTER
                12 434f4e5445585b0be13e9077800000 CONTEX[     349,492,405,757,772,000,000,000,000,000,000,000 CONTEXT
                13 45444954494f54acac5b71e6c00000 EDITIOT     359,653,496,833,182,000,000,000,000,000,000,000 EDITION
                65 46554e4354494bfd6e91eac9000000 FUNCTIK     365,190,985,547,816,000,000,000,000,000,000,000 FUNCTION
             1,218 494e444557ffec1b8cf7b386200000 INDEWyi     380,625,107,598,029,000,000,000,000,000,000,000 INDEX
             1,284 494e44455820524115d2f593c00000 INDEX R     380,625,107,598,182,000,000,000,000,000,000,000 INDEX PARTITION
             1,287 4f5045524154487b7a4542a0400000 OPERATH     411,819,536,792,506,000,000,000,000,000,000,000 OPERATOR
             1,459 5041434b41476016d731ed1a200000 PACKAG`     416,707,436,884,205,000,000,000,000,000,000,000 PACKAGE
             1,473 50524f43454439c0c7fd90d5c00000 PROCED9     417,053,186,114,358,000,000,000,000,000,000,000 PROCEDURE
             1,475 53455155454e581a4a938544400000 SEQUENX     432,366,569,392,218,000,000,000,000,000,000,000 SEQUENCE
             4,576 53594e4f4e59319bbd81e9d6a00000 SYNONY1     432,771,978,053,825,000,000,000,000,000,000,000 SYNONYM
             5,613 5441424c45001de17dde9b57c00000 TABLE      437,476,545,404,166,000,000,000,000,000,000,000 TABLE
             5,685 5441424c45204dd13d0c1786c00000 TABLE M     437,476,545,404,318,000,000,000,000,000,000,000 TABLE PARTITION
             5,717 5441424c45204dd13d0c1786c00000 TABLE M     437,476,545,404,318,000,000,000,000,000,000,000 TABLE SUBPARTITION
             6,425 54595044ffffeda95d6a0f02200000 TYPDyyi     437,964,430,179,117,000,000,000,000,000,000,000 TYPE
            10,000 56494556fffffb4cec1ab3e3600000 VIEVyyu     448,023,639,403,471,000,000,000,000,000,000,000 VIEW

N_TYPE          10 0043004c00550091d5132d65900000 CLU?          1,358,944,964,706,820,000,000,000,000,000,000  C L U S T E R
                12 0043004f004e00699afbb463980000 CONi          1,358,945,893,128,790,000,000,000,000,000,000  C O N T E X T
                13 0045004400490039662489da280000 EDI9          1,399,507,307,977,370,000,000,000,000,000,000  E D I T I O N
                65 00460055004e0032efa0cbcb200000 FUN2          1,419,794,978,849,800,000,000,000,000,000,000  F U N C T I O N
             1,218 0049004e0044000ee521e524f00000 IND          1,480,640,041,218,460,000,000,000,000,000,000  I N D E X
             1,284 0049004e0044000ee521e524f00000 IND          1,480,640,041,218,460,000,000,000,000,000,000  I N D E X   P A R T I T I O N
             1,287 004f00500045007d7a2958e3400000 OPE}          1,602,335,117,815,120,000,000,000,000,000,000  O P E R A T O R
             1,459 0050004100430082c9d92def100000 PAC?          1,622,612,885,134,180,000,000,000,000,000,000  P A C K A G E
             1,473 00500052004f003864f4c314680000 PRO8          1,622,618,146,436,010,000,000,000,000,000,000  P R O C E D U R E
             1,475 005300450051001bb6610d60c00000 SEQ          1,683,461,351,951,280,000,000,000,000,000,000  S E Q U E N C E
             4,576 00530059004e00250bbf32dc380000 SYN%          1,683,467,541,637,310,000,000,000,000,000,000  S Y N O N Y M
             5,613 0054004100420046e721e46a700000 TABF          1,703,742,523,544,060,000,000,000,000,000,000  T A B L E
             5,685 0054004100420046e721e46a700000 TABF          1,703,742,523,544,060,000,000,000,000,000,000  T A B L E   P A R T I T I O N
             5,717 0054004100420046e721e46a700000 TABF          1,703,742,523,544,060,000,000,000,000,000,000  T A B L E   S U B P A R T I T I
             6,425 0054005900500056f579391d680000 TYPV          1,703,749,951,250,410,000,000,000,000,000,000  T Y P E
            10,000 0056004900450054a090433f680000 VIET          1,744,309,818,645,610,000,000,000,000,000,000  V I E W

32 rows selected.

I’ve limited the raw conversion to substr(,1,14) for the v_type and substr(,1,16) for the n_type because this translates into 7 and 4 characters respectively – and basically you’re doing well if even the first 6 (or 3) characters always come out with the right values. I can’t show you exactly what my output was like because there were various hidden characters (line feeds, backspaces, “shift” and so on) that made a bit of a mess of the results; that’s why the dbms_output.put_line(chr(15)) is there at the end of the script – I needed it to switch my terminal out of graphic mode.
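A back-of-envelope calculation suggests why roughly 6 bytes is the most that can be trusted: 15 significant decimal digits carry about 15 × log2(10) ≈ 49.8 bits, a little over 6 bytes. A minimal Python sketch of that arithmetic (the variable names are illustrative only):

```python
import math

bits = 15 * math.log2(10)      # information in 15 significant decimal digits
whole_bytes = int(bits // 8)   # whole bytes that fit inside that precision

print(round(bits, 1))          # 49.8
print(whole_bytes)             # 6, i.e. about 6 single-byte characters
print(whole_bytes // 2)        # 3, i.e. about 3 characters at 2 bytes each
```

This is an estimate rather than a guarantee: the rounding can carry into the sixth byte, so even the 6 / 3 figure will not hold for every value.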

I’ve included endpoint_actual_value in my output. Because “TABLE PARTITION” and “TABLE SUBPARTITION” came to the same endpoint_value Oracle captured the first 32 bytes of the actual values it had found. (Note (a) it is bytes not characters, (b) this goes up to 64 in 12c, and (c) it’s being stored in a varchar2() column, which is why the values for column n_type appear to have alternating spaces – that’s the impact of all the zeros.)
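The alternating spaces and the truncated final n_type row can be reproduced with a small Python sketch. It assumes, as described above, that the first 32 bytes of the AL16UTF16 value are kept and then displayed as if they were single-byte characters; the utf-16-be and cp1252 codecs are assumed stand-ins for the two Oracle character sets.

```python
value = "TABLE SUBPARTITION"

encoded = value.encode("utf-16-be")   # 2 bytes per character
raw = encoded[:32]                    # only the first 32 bytes are kept

print(len(encoded))                   # 36
print(raw.decode("utf-16-be"))        # TABLE SUBPARTITI

# Read the same bytes as single-byte characters: every other byte is 0x00,
# replaced with a space here to mimic how the report displays it
as_single_byte = raw.decode("cp1252").replace("\x00", " ")
print(as_single_byte)
```

Sixteen characters survive, which matches the last n_type row for the TABLE values in the output.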


