Oracle Scratchpad

May 14, 2014

Feature Bypass

Filed under: CBO,Oracle,Troubleshooting — Jonathan Lewis @ 1:23 pm BST May 14,2014

Here’s a little tip that might be helpful occasionally when you’re trying to work out why the optimizer transformation you were expecting isn’t appearing

If you’ve ever checked the 10053 trace (and who wants to do that for a complex query) you may have noticed lines like:

SU: SU bypassed: Remote table referenced.

So now you know that SU – Subquery Unnesting – has limitations in distributed queries.

When I first saw a line like this, it crossed my mind that it would be useful to keep a reference list of features that could be reported as bypassed, which I do through a simple unix line:

strings -a oracle | grep -i bypassed > bypassed.txt

If you need a reference for the various short codes for transformations you can find it near the top of the 10053 trace, looking like this:

[sourecode gutter="false"]
CBQT – cost-based query transformation
JPPD – join predicate push-down
OJPPD – old-style (non-cost-based) JPPD
FPD – filter push-down
PM – predicate move-around
CVM – complex view merging
SPJ – select-project-join
SJC – set join conversion
SU – subquery unnesting
OBYE – order by elimination
OST – old style star transformation
ST – new (cbqt) star transformation
CNT – count(col) to count(*) transformation
JE – Join Elimination
JF – join factorization
SLP – select list pruning
DP – distinct placement
… big gap here
AP – adaptive plans
[/sourcecode]

May 12, 2014

Compression Units – 6

Filed under: Exadata,HCC,Oracle — Jonathan Lewis @ 1:34 pm BST May 12,2014

I received an email recently asking me if I knew how Oracle found specific rows and columns in a compression unit. This is a topic that I’ve spoken about a couple of times, and I’ve published several notes on the blog about it, including an image of a critical slide from one of my presentations, and I was expecting to find some notes somewhere about Oracle catalogues all the bits and pieces. Part 4 of this series lists some of the detail but I was slightly surprised to discover that it made the comment: “and somewhere in the CU there has to be a set of pointers to the first byte for each compressed column” – and this was the bit that my interrogator wanted to know. This posting is the answer – and I’ll start with a dump of the first block of a CU:


block_row_dump:
tab 0, row 0, @0x30
tl: 8016 fb: --H-F--N lb: 0x0  cc: 1
nrid:  0x01c06484.0
col  0: [8004]
Compression level: 03 (Archive Low)
 Length of CU row: 8004
kdzhrh: ------PC CBLK: 11 Start Slot: 00
 NUMP: 11
 PNUM: 00 POFF: 7884 PRID: 0x01c06484.0
 PNUM: 01 POFF: 15900 PRID: 0x01c06485.0
 PNUM: 02 POFF: 23916 PRID: 0x01c06486.0
 PNUM: 03 POFF: 31932 PRID: 0x01c06487.0
 PNUM: 04 POFF: 39948 PRID: 0x01c06488.0
 PNUM: 05 POFF: 47964 PRID: 0x01c06489.0
 PNUM: 06 POFF: 55980 PRID: 0x01c0648a.0
 PNUM: 07 POFF: 63996 PRID: 0x01c0648b.0
 PNUM: 08 POFF: 72012 PRID: 0x01c0648c.0
 PNUM: 09 POFF: 80028 PRID: 0x01c0648d.0
 PNUM: 10 POFF: 88044 PRID: 0x01c0648e.0
CU header:
CU version: 0   CU magic number: 0x4b445a30
CU checksum: 0x5c173142
CU total length: 93755
CU flags: NC-U-CRD-OP
ncols: 7
nrows: 4361
algo: 0
CU decomp length: 93159   len/value length: 309310
row pieces per row: 1
num deleted rows: 0
START_CU:
 00 00 1f 44 1f 0b 00 00 00 0b 00 00 1e cc 01 c0 64 84 00 00 00 00 3e 1c 01
 c0 64 85 00 00 00 00 5d 6c 01 c0 64 86 00 00 00 00 7c bc 01 c0 64 87 00 00
 00 00 9c 0c 01 c0 64 88 00 00 00 00 bb 5c 01 c0 64 89 00 00 00 00 da ac 01
 c0 64 8a 00 00 00 00 f9 fc 01 c0 64 8b 00 00 00 01 19 4c 01 c0 64 8c 00 00
 00 01 38 9c 01 c0 64 8d 00 00 00 01 57 ec 01 c0 64 8e 00 00 00 4b 44 5a 30
 42 31 17 5c 00 01 6e 3b eb 06 00 07 11 09 00 04 b8 3e 01 00 00 00 00 00 00
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
...  20 repetitions of the previous line of zeros deleted
...  This is mostly the bitmap identifying deleted rows.
...  nrows=4361, so this will be about 4261, so this will be
...  about 4361/8 = 545 bytes of zeros.
...
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00 02 54 00 00 30
 bc 00 00 4a 82 00 00 64 37 00 00 a6 b8 00 00 e9 39 00 01 2b ba e6 c2 eb 8b
 00 00 2e 68 01 40 01 02 00 00 53 f0 78 9c a5 dc 07 57 23 5b 96 25 e0 ee 24

If you look at the symbolic dump there are lots of numbers that you could try to locate in the binary dump labelled START_CU:, for example the CU magic number 4b3445a30 is very obvious at the end of line 37 above (with the bytes in the right order), and the CU checksum of 5c173142 is visible at the start of line 38 (with the bytes in reverse order – no, don’t ask why me there’s a difference).

Bearing in mind how Oracle frequently uses “self-describing” strategies for data we might search for the value that gives the length of the CU as this could well be the start of the compressed data. But there are two CU lengths – the CU total length (93755) and the CU decomp length (93159). To my mind the latter is the “amount of CU requiring decompression” but let’s check for both: 93,755 (dec) = 0x16e3b, and 93,159 (dec) = 0x16be7 .

Take another look at line 38 – the second 4 bytes are: 00 01 6e 3b, the CU total length. Unfortunately we can’t see the CU decomp length anywhere, so let’s change strategy slightly – the different between the two lengths is 93755 – 93159 =  596 = 0x254, and by a convenient coincidence if you look at line 46, five bytes from the end, you find 02 54. Is this really a coincidence ? Let’s count 596 bytes from the CU total length and see where we get to – the answer is : the last 4 bytes of line 46 (e6 c2 eb 8b) This doesn’t seem to help at first sight, but take a look at where we are and what precedes that 4 bytes. There are 7 groups of 4 bytes that look like increasing values – and since I know more about the data than you do (it’s a table with 7 columns, and the last 4 columns are identical across all rows), I’ve extracted those 28 bytes and reformatted them:


                    	Offset		Delta
                    	------		-----
        00 00 02 54 	  596
        00 00 30 bc 	12476		11880
        00 00 4a 82 	19074		 6598
        00 00 64 37 	25655		 6581
        00 00 a6 b8 	42680		17025
        00 00 e9 39 	59705		17025
        00 01 2b ba 	76730		17025
                    	93755	<----


The “Offset” column is what you get by converting from Hex to decimal, and the “Delta” column is the difference between one value and the next. Notice how the last three deltas are the same, and how, if you add the same delta to the last Offset you get a number which is the CU total length. The “02 54″ I found as the difference between the CU total length and the CU decomp length is actually the starting point for the list of pointers to each of the compressed columns, and the repeated 17,025 is the length of my duplicated final four columns. The e6 c2 eb 8b is the first four bytes of the first (compressed) column.

Having worked out that it’s possible to find the list of pointers quite easily from the dump, the next step (left as an exercise to the interested reader) is to work out the algorithm that Oracle uses to locate the list – since the CU decomp length is clearly derived and not actually stored. As a little clue, you might start by taking another look at line 38: bytes 11/12 are 00 07 – the number of columns in the table, and bytes 13/14 are 11 09 – the number of rows in the CU (and that’s the most significant number at this point because it tells you how long the bitmap for deleted rows will be).

May 7, 2014

Quiz Night

Filed under: Indexing,Oracle — Jonathan Lewis @ 2:18 pm BST May 7,2014

Okay – so it’s not night time in my home time-zone, but I’m in Singapore at the moment so it’s night time.

A very simple little quiz – so I’ve disabled comments for the moment and will re-enable them tomorrow morning to allow more people to have a chance to see the question without the solution.

Explain the anomaly displayed in the following “cut-n-paste” from a session running SQL*Plus on 11.2.0.4:

SQL> create unique index t1_i1 on t1(v1 desc);
create unique index t1_i1 on t1(v1 desc)
                                *
ERROR at line 1:
ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found

SQL> create unique index t1_i1 on t1(v1);

Index created.

Answer

Well it didn’t take long for an answer and several bits of related infomration to show up – as Martin pointed out, all I have to do is insert NULL into the table twice.

To create an entry in a descending index, Oracle takes the 1’s-complement of each column and appends an 0xFF byte to each column – except in the case of a null column where the null is replaced with a 0x00. (And, as Sayan points out, funny things happen if you have a varchar2() column which has already reached the 4,000 byte limit)

The point of the 1’s-complement is that if you walk through the stored values in ascending order you’re walking through the original values in descending – provided you have the 0xFF on the end of each non-null entry.

 

May 5, 2014

endpoint_value

Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 2:58 pm BST May 5,2014

I wrote a note a few years ago about translating the endpoint_value for histograms on character columns, and in that note I left the casual comment that you would: fiddle with this bit of code to handle multibyte character sets”. For anyone who has never found time to do that fiddling, here’s a solution.  In fact, it’s also a better solution for the original character problem than the one I originally published – I have no idea why I didn’t use the utl_raw package to simplify the code years ago.

First, a brief explanation of the problem. To store a character value in the numeric endpoint_value column of a histogram, Oracle:

  • Takes the first 15 bytes of the string (after padding the string with zeros (for varchar2) or spaces (for char))
  • Treats the result as a 15 byte (30 digit) hexadecimal number, and converts to decimal
  • Rounds to 15 significant digits (why?!) and stores the result as the endpoint_value

There are two problems with my old code when trying to convert back to a (multi-byte) character format; after converting back to a hexadecimal format it doesn’t know how many consecutive bytes should be used to make up each character of the string – this problem can be circumvented by using the utl_raw.cast_to_[n]varchar2() function. (Note: the dbms_stats package has a number of convert_raw_value() procedures which could also be used, but to take advantage of them I’d have to create a wrapper function for each procedure.)

The second problem is one that’s easy to overlook (and possibly survive with) for some time – some multi-byte characters start with a zero byte – or perhaps more than a single zero byte.  If the first character of a string that’s going into the histogram data starts with such a character then conversion to a number loses the leading zeros, which means you have to find a way of recovering the right number of zeros before doing the conversion from raw to character. The solution to this one is also fairly straightforward: we know that the endpoint_value when converted to a hexadecimal number should be 15 bytes, which means in its hex string presentation it will be 30 characters: if it’s not 30 characters than pad it on the left with zeros up to 30 characters, then call utl_raw.cast_to_[n]varchar2() on the result. Here’s a dataset to demonstrate the point (the results show came from 11.2.0.4 with the varchar2() using WE8MSWIN1252 and nvarchar2() using AL16UTF16):

create table t1
as
select
	object_type				v_type,
	cast(object_type as nvarchar2(19))	n_type
from
	all_objects
where
	rownum <= 10000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 254'
	);
end;
/

select 
	column_name,
	endpoint_number,
	lpad(to_char(endpoint_value,'fmxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'),30,'0')
from 
	user_tab_histograms
where 
	table_name = 'T1' 
order by
	column_name desc,
	endpoint_number
;

Column                 EP no LPAD(TO_CHAR(ENDPOINT_VALUE,'F
-------------------- ------- ------------------------------
V_TYPE                    10 434c5553544566083af75039400000
                          12 434f4e5445585b0be13e9077800000
                          13 45444954494f54acac5b71e6c00000
                          65 46554e4354494bfd6e91eac9000000
                       1,218 494e444557ffec1b8cf7b386200000
                       1,284 494e44455820524115d2f593c00000
                       1,287 4f5045524154487b7a4542a0400000
                       1,459 5041434b41476016d731ed1a200000
                       1,473 50524f43454439c0c7fd90d5c00000
                       1,475 53455155454e581a4a938544400000
                       4,576 53594e4f4e59319bbd81e9d6a00000
                       5,613 5441424c45001de17dde9b57c00000
                       5,685 5441424c45204dd13d0c1786c00000
                       5,717 5441424c45204dd13d0c1786c00000
                       6,425 54595044ffffeda95d6a0f02200000
                      10,000 56494556fffffb4cec1ab3e3600000

N_TYPE                    10 0043004c00550091d5132d65900000
                          12 0043004f004e00699afbb463980000
                          13 0045004400490039662489da280000
                          65 00460055004e0032efa0cbcb200000
                       1,218 0049004e0044000ee521e524f00000
                       1,284 0049004e0044000ee521e524f00000
                       1,287 004f00500045007d7a2958e3400000
                       1,459 0050004100430082c9d92def100000
                       1,473 00500052004f003864f4c314680000
                       1,475 005300450051001bb6610d60c00000
                       4,576 00530059004e00250bbf32dc380000
                       5,613 0054004100420046e721e46a700000
                       5,685 0054004100420046e721e46a700000
                       5,717 0054004100420046e721e46a700000
                       6,425 0054005900500056f579391d680000
                      10,000 0056004900450054a090433f680000


32 rows selected.

As you can see, we’ve got a frequency histogram on both columns. If you’re good with ASCII codes you’ll be able to translate the first row for the v_type data: 43 = C, 4c = L, 55 = U, 53 = S, 54 = T, 45 = E, 66 = B, and then we run into funny characters; the ‘B’ and everything after it is part of the rounding error due to the algorithm Oracle is using. You can also check the first row for the n_type: it’s coming from a fixed width character set, 2 bytes per character: 0043 = C, 004c = L, 0055 = U, 0091 = {rounding error} – the rounding means we get much less information from the nvarchar2() histogram.

Rather than depending on knowing our code pages, though, we can get Oracle to do the translation work:


select
	column_name,
	endpoint_number,
	substr(lpad(to_char(endpoint_value,'fmxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'),30,'0'),1,30)  hex_value,
	decode(
		column_name,
			'V_TYPE',
				utl_raw.cast_to_varchar2(
					substr(lpad(to_char(endpoint_value,'fmxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'),30,'0'),1,14) 
				),
			'N_TYPE',
				utl_raw.cast_to_nvarchar2(
					substr(lpad(to_char(endpoint_value,'fmxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'),30,'0'),1,16) 
				)
	)	char_value,
	endpoint_value,
	endpoint_actual_value
from
	user_tab_histograms
where
	table_name = 'T1'
order by
	column_name desc,
	endpoint_number
;

--
--	Clear graphics mode
--

execute dbms_output.put_line(chr(15))

Column       EP no HEX_VALUE                      CHAR_VALUE                                   ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
---------- ------- ------------------------------ ---------- ------------------------------------------------ --------------------------------
V_TYPE          10 434c5553544566083af75039400000 CLUSTEf     349,432,112,834,658,000,000,000,000,000,000,000 CLUSTER
                12 434f4e5445585b0be13e9077800000 CONTEX[     349,492,405,757,772,000,000,000,000,000,000,000 CONTEXT
                13 45444954494f54acac5b71e6c00000 EDITIOT     359,653,496,833,182,000,000,000,000,000,000,000 EDITION
                65 46554e4354494bfd6e91eac9000000 FUNCTIK     365,190,985,547,816,000,000,000,000,000,000,000 FUNCTION
             1,218 494e444557ffec1b8cf7b386200000 INDEWyi     380,625,107,598,029,000,000,000,000,000,000,000 INDEX
             1,284 494e44455820524115d2f593c00000 INDEX R     380,625,107,598,182,000,000,000,000,000,000,000 INDEX PARTITION
             1,287 4f5045524154487b7a4542a0400000 OPERATH     411,819,536,792,506,000,000,000,000,000,000,000 OPERATOR
             1,459 5041434b41476016d731ed1a200000 PACKAG`     416,707,436,884,205,000,000,000,000,000,000,000 PACKAGE
             1,473 50524f43454439c0c7fd90d5c00000 PROCED9     417,053,186,114,358,000,000,000,000,000,000,000 PROCEDURE
             1,475 53455155454e581a4a938544400000 SEQUENX     432,366,569,392,218,000,000,000,000,000,000,000 SEQUENCE
             4,576 53594e4f4e59319bbd81e9d6a00000 SYNONY1     432,771,978,053,825,000,000,000,000,000,000,000 SYNONYM
             5,613 5441424c45001de17dde9b57c00000 TABLE      437,476,545,404,166,000,000,000,000,000,000,000 TABLE
             5,685 5441424c45204dd13d0c1786c00000 TABLE M     437,476,545,404,318,000,000,000,000,000,000,000 TABLE PARTITION
             5,717 5441424c45204dd13d0c1786c00000 TABLE M     437,476,545,404,318,000,000,000,000,000,000,000 TABLE SUBPARTITION
             6,425 54595044ffffeda95d6a0f02200000 TYPDyyi     437,964,430,179,117,000,000,000,000,000,000,000 TYPE
            10,000 56494556fffffb4cec1ab3e3600000 VIEVyyu     448,023,639,403,471,000,000,000,000,000,000,000 VIEW

N_TYPE          10 0043004c00550091d5132d65900000 CLU?          1,358,944,964,706,820,000,000,000,000,000,000  C L U S T E R
                12 0043004f004e00699afbb463980000 CONi          1,358,945,893,128,790,000,000,000,000,000,000  C O N T E X T
                13 0045004400490039662489da280000 EDI9          1,399,507,307,977,370,000,000,000,000,000,000  E D I T I O N
                65 00460055004e0032efa0cbcb200000 FUN2          1,419,794,978,849,800,000,000,000,000,000,000  F U N C T I O N
             1,218 0049004e0044000ee521e524f00000 IND          1,480,640,041,218,460,000,000,000,000,000,000  I N D E X
             1,284 0049004e0044000ee521e524f00000 IND          1,480,640,041,218,460,000,000,000,000,000,000  I N D E X   P A R T I T I O N
             1,287 004f00500045007d7a2958e3400000 OPE}          1,602,335,117,815,120,000,000,000,000,000,000  O P E R A T O R
             1,459 0050004100430082c9d92def100000 PAC?          1,622,612,885,134,180,000,000,000,000,000,000  P A C K A G E
             1,473 00500052004f003864f4c314680000 PRO8          1,622,618,146,436,010,000,000,000,000,000,000  P R O C E D U R E
             1,475 005300450051001bb6610d60c00000 SEQ          1,683,461,351,951,280,000,000,000,000,000,000  S E Q U E N C E
             4,576 00530059004e00250bbf32dc380000 SYN%          1,683,467,541,637,310,000,000,000,000,000,000  S Y N O N Y M
             5,613 0054004100420046e721e46a700000 TABF          1,703,742,523,544,060,000,000,000,000,000,000  T A B L E
             5,685 0054004100420046e721e46a700000 TABF          1,703,742,523,544,060,000,000,000,000,000,000  T A B L E   P A R T I T I O N
             5,717 0054004100420046e721e46a700000 TABF          1,703,742,523,544,060,000,000,000,000,000,000  T A B L E   S U B P A R T I T I
             6,425 0054005900500056f579391d680000 TYPV          1,703,749,951,250,410,000,000,000,000,000,000  T Y P E
            10,000 0056004900450054a090433f680000 VIET          1,744,309,818,645,610,000,000,000,000,000,000  V I E W

32 rows selected.

I’ve limited the raw conversion to substr(,1,14) for the v_code and substr(,1,16) for the n_code because this translates into 7 and 4 characters respectively – and basically you’re lucky if you always get 6 / 3 characters coming to the right values. I can’t show you exactly what my output was like because there were various hidden characters (line feed, backspaces, “shift” and so on) that made a bit of a mess of the results; that’s why the dbms_output.put_line(chr(15)) is there at the end of the script – I needed it to switch my terminal out of graphic mode.

I’ve included endpoint_actual_value in my output. Because “TABLE PARTITION” and “TABLE SUBPARTITION” came to the same endpoint_value Oracle captured the first 32 bytes of the actual values it had found. (Note (a) it is bytes not characters, (b) this goes up to 64 in 12c, and (c) it’s being stored in a varchar2() column, which is why the values for column n_type appear to have alternating spaces – that’s the impact of all the zeros.

 

May 4, 2014

Extended stats

Filed under: 12c,Histograms,Oracle,Statistics — Jonathan Lewis @ 1:24 pm BST May 4,2014

Like the recent article on deleting histograms this is another draft that I rediscovered while searching for some notes I had written on a different topic – so I’ve finally finished it off and published it.

Here’s a quirky little detail of extended stats that came up in an OTN thread earlier on this week [ed: actually 8th Jan 2014]. When you create column group stats, Oracle uses an undocumented function sys_op_combined_hash() to create a hash value, and if you gather simple stats on the column (i.e. no histogram) you can get some idea of the range of values that Oracle generates through the hash function. For example:


create table t1 as
select  1 n1, 2 n2
from dual
connect by level<=5000
union all
select  2, 1
from dual
connect by level<=5000
;

select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/

break on column_name skip 1

column column_name format a32
column endpoint_number heading "EP No."
column endpoint_value  heading "Value" format 999,999,999,999,999,999,999

select
	column_name, endpoint_number, endpoint_value
from
	user_tab_histograms
where
	table_name = 'T1'
order by
	column_name, endpoint_value
;

COLUMN_NAME                          EP No.                        Value
-------------------------------- ---------- ----------------------------
N1                                        0                            1
                                          1                            2

N2                                        0                            1
                                          1                            2

SYS_STUBZH0IHA7K$KEBJVXO5LOHAS            0      298,332,787,864,733,000
                                          1    8,095,781,421,167,520,000

I could have selected low_value and high_value from user_tab_cols, using utl_raw.cast_to_number() to display them in numeric format, but the view user_tab_histograms display the low and high as a two-bucket histogram if there is no actual histogram data for the column in the histogram (histgrm$) table.

We probably don’t need to worry about what the low and high values might be because taking hash values destroys any meaning that a range might have (the optimizer can’t use column group stats in range-based predicates, only in equality predicates). However, we might collect a frequency histogram (or Top-N histogram in 12c) on the column group because there might be some data skew in the sets of values that we need to tell the optimizer about – so let’s gather a histogram with 2 buckets on our sample data set and see what we get:


begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 2'
	);
end;
/

select
	column_name, endpoint_number, endpoint_value
from
	user_tab_histograms
where
	table_name = 'T1'
order by
	column_name, endpoint_value
;

COLUMN_NAME                          EP No.                        Value
-------------------------------- ---------- ----------------------------
N1                                     5000                            1
                                      10000                            2

N2                                     5000                            1
                                      10000                            2

SYS_STUBZH0IHA7K$KEBJVXO5LOHAS         5000                1,977,102,303
                                      10000                7,894,566,276

The histogram values have changed! As one of the posters on the OTN thread points out, what Oracle has actually stored in this case is mod(sys_op_combined_hash() ,9999999999).

So far I haven’t done any exhaustive testing to work out whether the change in the strategy for storing numbers makes any difference to the typical optimizer arithmetic – but I do have at least one case (relating to “missing values” behaviour where the presence or absence of a column group histogram does make a difference to the estimated cardinality in a way that seems inconsistent with other patterns of behaviour: I doubt if it’s actually due to the change in what’s stored, and one day I may come across a client where I actually need to work out what’s going on and how best to work with the anomaly.

Bonus thought:

In 12c Oracle collects column stats automatically as it loads data into an empty table; but not if it’s got extended stats defined on it.  (This is bug no 18425876, fixed in 12.2). Here’s some code modelling a client scenario where we truncate and reload a table every day. In the first part of the demonstration I’ve loaded the table twice to show that after truncating and reloading I get new stats on the table – the first load is 10,000 rows, the second is 20,000 rows and the stats reflect this automatically. In the second part of the code, after adding a set of column group stats, truncating and loading 5,000 rows, the stats from the previous cycle are still in place. (The code is only relevant to 12c, of course)

create table t1 (n1 number, n2 number);

insert	/*+ append */
into	t1
select	object_id, data_object_id
from	all_objects
where
	rownum <= 10000
;
commit;

-- stats have appeared without a call to dbms_stats to gather them.

select blocks, num_rows from user_tables where table_name = 'T1';
select column_name, num_distinct, num_nulls, density, low_value, high_value from user_tab_cols where table_name = 'T1';

truncate table t1;

insert	/*+ append */
into	t1
select	object_id, data_object_id
from	all_objects
where
	rownum <= 20000
;
commit;

-- Stats now show the latest data 

select blocks, num_rows from user_tables where table_name = 'T1';
select column_name, num_distinct, num_nulls, density, low_value, high_value from user_tab_cols where table_name = 'T1';

-- Add a column group to the stats 

select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual;

truncate table t1;

insert	/*+ append */
into	t1
select	object_id, data_object_id
from	all_objects
where
	rownum <= 5000
;
commit;

-- The stats have not been updated to reflect the new data, and the column group stats are empty

select blocks, num_rows from user_tables where table_name = 'T1';
select column_name, num_distinct, num_nulls, density, low_value, high_value from user_tab_cols where table_name = 'T1';

--------------------------------------------------------
-- Here are the three consecutive sets of results
--------------------------------------------------------

    BLOCKS   NUM_ROWS
---------- ----------
        16      10000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      10000          0      .0001 C103                       C3020C60
N2                                       2534       7429 .000394633 C103                       C30B2929

    BLOCKS   NUM_ROWS
---------- ----------
        32      20000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      20000          0     .00005 C103                       C30A4553
N2                                       3115      16848 .000321027 C103                       C30B2929

    BLOCKS   NUM_ROWS
---------- ----------
        32      20000

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------------------- ------------ ---------- ---------- -------------------------- --------------------------
N1                                      20000          0     .00005 C103                       C30A4553
N2                                       3115      16848 .000321027 C103                       C30B2929
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS

The workaround given in the bug is “add the extended stats after loading the table” – but if you’re constantly truncating and reloading that means you have to drop and add the extended stats and do a tablescan to gather the column group stats every time you reload.

Note: the limitation applies whether you create a column group, “ordinary” extended stats, a virtual column, or an implicit virtual column underlying a function-based index.

Just as a little aside – when I first wrote the demo script I forgot to put in the commit; after the insert/append – which meant I was trying to create column group stats on a table which should have given me Oracle error: “ORA-12838: cannot read/modify an object after modifying it in parallel”; instead this has been trapped by the dbms_stats package and shows up as a slightly confusing:


select dbms_stats.create_extended_stats(user,'t1','(n1, n2)') name from dual
       *
ERROR at line 1:
ORA-20001: Error when processing extension -  resource busy and acquire with NOWAIT specified or timeout expired
ORA-06512: at "SYS.DBMS_STATS", line 12977
ORA-06512: at "SYS.DBMS_STATS", line 44967
ORA-06512: at "SYS.DBMS_STATS", line 44986

May 2, 2014

Costing Bug

Filed under: Bugs,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 8:53 am BST May 2,2014

It’s amazing how you can find little bugs (or anomalies) as soon as you start to look closely at how things work in Oracle. I started to write an article for All Things Oracle last night about execution plans with subqueries, so wrote a little script to generate some sample data, set up the first sample query, checked the execution plan, and stopped because the final cost didn’t make sense. Before going on I should point out that this probably doesn’t matter and probably wouldn’t cause a change in the execution plan if the calculation were corrected – but it is just an interesting indication of the odd things that can happen when sections of modular code are combined in an open-ended way. Here’s the query (running on 11.2.0.4) with execution plan:


update t1 set 
	n1 = (
		select	max(mod100)
		from	t2
		where	t2.id = t1.id
	),
	n2 = (
		select	max(trunc100)
		from	t3
		where	t3.id = t1.id
	)
where
	id between 101 and 200
;

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT              |       |   101 |  1212 |   812  (25)| 00:00:05 |
|   1 |  UPDATE                       | T1    |       |       |            |          |
|*  2 |   INDEX RANGE SCAN            | T1_I1 |   101 |  1212 |     2   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE              |       |     1 |     7 |            |          |
|   4 |    FIRST ROW                  |       |     1 |     7 |     3   (0)| 00:00:01 |
|*  5 |     INDEX RANGE SCAN (MIN/MAX)| T2_I1 |     1 |     7 |     3   (0)| 00:00:01 |
|   6 |   SORT AGGREGATE              |       |     1 |     7 |            |          |
|   7 |    FIRST ROW                  |       |     1 |     7 |     3   (0)| 00:00:01 |
|*  8 |     INDEX RANGE SCAN (MIN/MAX)| T3_I1 |     1 |     7 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=101 AND "ID"<=200)
   5 - access("T2"."ID"=:B1)
   8 - access("T3"."ID"=:B1)

So the cost of running each of the subqueries is 3 – there are two of them, and we expect to run each of the 101 times: for a total cost of 606. So how do we get to 812 as the total cost of the query ?

Further testing:

  • the cost of the plan for updating the two columns with constants is just 4.
  • rebuild the indexes with different values for pctfree to see how the cost changes
  • vary the number of columns updated by subquery
  • check the 10053 trace – for issues or presentation vs. rounding, particularly

Ultimately I decided that for each column updated by subquery the optimizer added 1 to the cost of accessing the table for each row; or, to view it another way, the optimizer used “sum(subquery costs + 1) * number of rows to be updated” so (4 + 4) * 101 + a little bit for the driving table access =  812. This doesn’t seem entirely reasonable – given that a cost is essentially equivalent to assuming that a single block visit is a disk read when we know that when we update multiple columns of the same row we need only read the block into memory at most once. As I said at the start, though this anomaly in costing probably doesn’t matter – there are no further steps to be taken after the update so there’s nothing the optimizer might do differently if the cost of the update had been calculated as 612 rather then 812.

Footnote:

If you want to play about with this query, here’s the code to create the tables – with one proviso, the plan above happens to be one I produced after rebuilding the indexes on t2 and t3 with pctfree 99


create table t1
as
with generator as (
	select  --+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum				id,	
	mod(rownum-1,100)		mod100,
	trunc((rownum - 1)/100)		trunc100,
	rownum				n1,
	rownum				n2,
	lpad(rownum,6,'0')		vc1,
	rpad('x',100)			padding
from
	generator
where
	rownum <= 10000
;

create table t2 as select * from t1;
create table t3 as select * from t1;

create index t1_i1 on t1(id);
create index t2_i1 on t2(id,mod100);
create index t3_i1 on t3(id,trunc100);

begin
	dbms_stats.gather_table_stats(user,'t1');
	dbms_stats.gather_table_stats(user,'t2');
	dbms_stats.gather_table_stats(user,'t3');
end;
/

May 1, 2014

Delete Histogram

Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 1:27 pm BST May 1,2014

Here’s a note which I drafted in November 201then 0, and didn’t publish. I found it earlier on this morning while looking for another note I’d written about histograms so, even though it may not be something that people need so much these days, I thought: better late than never.

I’ve pointed out in the past that I’m not keen on seeing lots of histograms on a system and tend to delete them if I think they are not needed. Here’s an example of the type of code I use to delete a histogram.

declare

	srec			dbms_stats.statrec;
	m_distcnt		number;
	m_density		number;
	m_nullcnt		number;
	m_avgclen		number;

	n_array                 dbms_stats.numarray;

begin

	dbms_stats.get_column_stats(
		ownname		=> user,
		tabname		=> 't1',
		colname		=> 'n1',
		distcnt		=> m_distcnt,
		density		=> m_density,
		nullcnt		=> m_nullcnt,
		srec		=> srec,
		avgclen		=> m_avgclen
	); 

	srec.bkvals := null;
	srec.novals :=	dbms_stats.numarray(
				utl_raw.cast_to_number(srec.minval),
				utl_raw.cast_to_number(srec.maxval)
			);
	srec.epc := 2;
	dbms_stats.prepare_column_values(srec, srec.novals);

	m_density := 1/m_distcnt;

	dbms_stats.set_column_stats(
		ownname		=> user,
		tabname		=> 't1',
		colname		=> 'n1',
		distcnt		=> m_distcnt,
		density		=> m_density,
		nullcnt		=> m_nullcnt,
		srec		=> srec,
		avgclen		=> m_avgclen
	); 

exception
	when others then
		raise;		-- should handle div/0

end;
/

The code basically reads the column stats, resets the histogram figures to just the low and high values for the column, setting the endpoint-count to two, then adjusts the density to the standard for a column with no histogram. This specific example is for a numeric column.

Footnote: my preferred method of collecting statistics is to use method_opt => ‘for all columns size 1′ (i.e. no histograms) and then run scripts to create the histograms I want. This means that after any stats collection I need to run code that checks to see which tables have new stats, and then re-run any histogram code that I’ve written for that table.

To move from Oracle’s default histogram collection to this strategy, you could start by switching to method_opt => ‘for all columns size repeat’ (i.e. recreate existing histograms, don’t create new ones), then simply delete histograms as you find that you don’t need them, and introduce scripts to recreate the histograms that you do need. When you’ve finally got to the point where every histogram is scripted you can then switch to method_opt => ‘for all columns size 1′.

 Footnote 2: Since 2010 when I drafted this note Oracle 12c has launched, and the changes it has introduced for frequency and Top-N histograms means that I’m far less stringent in my demand that if a histogram is worth having it’s better to write code to create it. There’s a series of three articles about 12c histograms in particular at this link.

April 29, 2014

Bigfiles

Filed under: Bugs,Infrastructure,Oracle — Jonathan Lewis @ 2:12 pm BST Apr 29,2014

It’s always the combinations that catch you out.

Bigfile tablespaces have their uses – especially in big systems

Materialized views have their users – especially in big systems

There’s absolutely no reason why the two technologies should interfere with each other … until you find a bug !

Running an example, stripped to the bare minimum, and doing a couple of things that I personally don’t like doing, on 11.2.0.4:

drop materialized view t1_mv;
drop table t1;

create table t1 (
	id	number constraint t1_pk primary key,
	n1	number,
	n2	number
)
tablespace bigfile_ts
;

create materialized view log on t1
with
	rowid		-- ugh!
;

create materialized view t1_mv
refresh fast on demand
with rowid		-- ugh!
enable query rewrite
as
select
	id, n1
from
	t1
where
	n2 is not null
;

-- Another ugh coming up !
insert into t1 values(1,null,null);
update t1 set n1 = 2, n2 = 1 where id = 1;
commit;

begin
        dbms_mview.refresh(
                list           => 'T1_MV',
                method         => 'F',
                atomic_refresh  => true
        );
end;
/

Things I don’t like:

  • Doing anything that is strongly dependent on rowid – I’d rather do my materialized view stuff by primary key … but, as Mick Jagger once told us: “You can’t always get what you want”.
  • Processes that insert an “empty” row and then update it – it’s very inefficient, generates excessive undo and redo, and often leads to row migration

Most significantly I don’t like operations that have worked for ages suddenly crashing when someone adds a new piece of code to the system.

If you run the fragment about, the refresh will fail with the follow string of errors:


ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-00001: unique constraint (TEST_USER.I_SNAP$_T1_MV) violated
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2809
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 3025
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2994
ORA-06512: at line 2

When you try to find out what the unique constraint is, it’s based on a hidden column (M_ROW$$) of type rowid that has been added to the materialized view to emulate a primary key; and, for some reason, if your base table is in a bigfile tablespace and you insert a row and then update it before you’ve run a refresh then you can’t do a fast refresh again until you clean up the mess (e.g. with a complete refresh).

Note – if you complete all your inserts, then refresh, you can update a row as many times as you like without causing subsequent refresh problems – it looks like it’s just “insert it, update it, refresh” (or “insert it, delete it, refresh”) that has a problem because the only way to hit the issue is to get a correct rowid (from an update or delete) and an incorrect rowid (from an insert) for the same row in the log at the same time.

Workaround: if you move the base table to a smallfile tablespace you might just find you get what you need.

This shows up on MoS as bug 17603987, fixed in 12.2.

If you want to dig in a little bit more, you can start by comparing the contents of t1 and mlog$_t1 (the materialized view log table). Critically:

SQL> select rowid from t1;

ROWID
------------------
AAAXN2AAAAAAAJEAAA

1 row selected.

SQL> select m_row$$, dmltype$$ from mlog$_t1;

M_ROW$$                          D
-------------------------------- -
AAAXN2AQAAAAAJEAAA               I
AAAXN2AAAAAAAJEAAA               U

2 rows selected.

SQL> select rowidtochar(m_row$$) m_row$$, dmltype$$ from mlog$_t1;

M_ROW$$                          D
-------------------------------- -
AAAXN2AAAAAAAJEAAA               I
AAAXN2AAAAAAAJEAAA               U

2 rows selected.

If you compare the rowid from t1 with the m_row$$ column (which should hold the rowids of rows from t1) from mlog$_t1 you’ll notice that the insert and the update have been given different values for the source row. Somehow the logging code for the insert has generated the wrong rowid value; however, when you apply a rowidtochar() to the wrong value the return value is the right value – and the merge statement that transfers modified data from the source table into the materialized view uses the rowidtochar() function to join the materialized view log back to the source table, with the result that the same rowid can be inserted twice – except the attempt results in a “duplicate key” error.

 

April 25, 2014

Modify PK – 2

Filed under: Indexing,Oracle — Jonathan Lewis @ 5:48 pm BST Apr 25,2014

In an earlier posting I described how we can play games with primary key indexes in 12c because you can create multiple indexes on a table for the same (ordered) column list provided they have some differences in attributes and only one of them is visible at a time. But how, if you’re not on 12c, can you change a primary key index from unique to non-unique (or vice versa, as this question on OTN wants) without any down-time ?

Of course you can’t “change” the uniqueness of an index – that attribute embedded in the way that Oracle stores rowids for the index – but you can create a new index for the constraint; and you can’t avoid the little bit of “not quite down”-time that it takes to start and finish an online rebuild. But how do you get around the limitation of Oracle error: ORA-01408: such column list already indexed

Easy – though it requires you to do some work you would prefer to avoid – just work through an intermediate step using an index with an extra column. Here’s an example where I start with a non-unique index supporting the PK and end up with a unique index.


create table t1 as select * from all_objects where rownum <= 10000;

create index t1_pk on t1(object_id);

alter table t1 add constraint t1_pk primary key(object_id);

create index t1_i1 on t1(object_id, 0) online;

alter table t1 modify primary key using index t1_i1;

drop index t1_pk;

create unique index t1_pk on t1(object_id) online;

alter table t1 modify primary key using index t1_pk;

drop index t1_i1;

April 23, 2014

NL History

Filed under: Execution plans,Oracle — Jonathan Lewis @ 6:43 pm BST Apr 23,2014

Even the simplest things change – here’s a brief history of nested loop joins, starting from 8i, based on the following query (with some hints):

select
	t2.n1, t1.n2
from
	t2,t1
where
	t2.n2 = 45
and	t2.n1 = t1.n1
;

There’s an index to support the join from t2 to t1, and I’ve forced an (unsuitable) index scan for the predicate on t2.

Basic plan for 8i (8.1.7.4)

As reported by $ORACLE_HOME/rdbms/admin/utlxpls.sql.
Note the absence of a Predicate Information section.

Plan Table
--------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |   225 |    3K|   3038 |       |       |
|  NESTED LOOPS             |          |   225 |    3K|   3038 |       |       |
|   TABLE ACCESS BY INDEX RO|T2        |    15 |  120 |   3008 |       |       |
|    INDEX FULL SCAN        |T2_I1     |    15 |      |      8 |       |       |
|   TABLE ACCESS BY INDEX RO|T1        |     3K|   23K|      2 |       |       |
|    INDEX RANGE SCAN       |T1_I1     |     3K|      |      1 |       |       |
--------------------------------------------------------------------------------

Basic plan for 9i (9.2.0.8)

As reported by a call to a home-grown version of dbms_xplan.display_cursor() with statistics_level set to all.

Note the “prefetch” shape of the body of the plan but the inconsistency in the numbers reported for Rows, Bytes, and Cost seem to be reporting the “traditional” 8i values transposed to match the new arrangement of the operations. There’s also a little oddity in the A-rows column in line 2 which looks as if it is the sum of its children plus 1 when the size of the rowsource is (presumably) the 225 rowids used to access the table.

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                     |  Name       | Rows  | Bytes | Cost  | Starts  | A-Rows  | Buffers |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |   225 |  3600 |  3038 |         |         |         |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T1          |    15 |   120 |     2 |     1   |    225  |   3061  |
|   2 |   NESTED LOOPS                |             |   225 |  3600 |  3038 |     1   |    241  |   3051  |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2          |    15 |   120 |  3008 |     1   |     15  |   3017  |
|   4 |     INDEX FULL SCAN           | T2_I1       |  3000 |       |     8 |     1   |   3000  |     17  |
|*  5 |    INDEX RANGE SCAN           | T1_I1       |    15 |       |     1 |    15   |    225  |     34  |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

Basic plan for 10g (10.2.0.5)

As reported by a call to dbms_xplan.display_cursor() with statistics_level set to all.

No change from 9i.

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T1    |      1 |     15 |    225 |00:00:00.03 |    3061 |
|   2 |   NESTED LOOPS                |       |      1 |    225 |    241 |00:00:00.03 |    3051 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     15 |     15 |00:00:00.03 |    3017 |
|   4 |     INDEX FULL SCAN           | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      17 |
|*  5 |    INDEX RANGE SCAN           | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      34 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

Basic plan for 11g (11.2.0.4)

As reported by a call to dbms_xplan.display_cursor() with statisics_level set to all

Note how the nested loop has now turned into two NESTED LOOP operations – potentially opening the way for a complete decoupling of index access and table access. This has an interesting effect on the number of starts of the table access by rowid for t1, of course. The number of buffer gets for this operation looks surprisingly low (given that it started 225 times) but can be explained by the pattern of the data distribution – and cross-checked by looking at the “buffer is pinned count” statistic which accounts for most of the table visits.


-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    225 |00:00:00.01 |    3048 |
|   1 |  NESTED LOOPS                 |       |      1 |    225 |    225 |00:00:00.01 |    3048 |
|   2 |   NESTED LOOPS                |       |      1 |    225 |    225 |00:00:00.01 |    3038 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     15 |     15 |00:00:00.01 |    3013 |
|   4 |     INDEX FULL SCAN           | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      13 |
|*  5 |    INDEX RANGE SCAN           | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T1    |    225 |     15 |    225 |00:00:00.01 |      10 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

There is, however, a second possible plan for 11g. The one above is the “NLJ Batching” plan, but I could have hinted the “NLJ prefetch” strategy, which takes us back to the 9i execution plan (with a very small variation in buffer visits).

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T1    |      1 |     15 |    225 |00:00:00.01 |    3052 |
|   2 |   NESTED LOOPS                |       |      1 |    225 |    241 |00:00:00.01 |    3042 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     15 |     15 |00:00:00.01 |    3017 |
|   4 |     INDEX FULL SCAN           | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      17 |
|*  5 |    INDEX RANGE SCAN           | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

Base plan for 12c (12.1.0.1)

As reported by a call to dbms_xplan.display_cursor() with statistics_level set to all.
Note that the table access to t2 in line 3 is described as “batched” (a feature that can be disabled by the /*+ no_batch_table_access_by_rowid(alias) */  hint) otherwise the plan matches the 11g plan.


---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |      1 |        |    225 |00:00:00.01 |    3052 |
|   1 |  NESTED LOOPS                         |       |      1 |        |    225 |00:00:00.01 |    3052 |
|   2 |   NESTED LOOPS                        |       |      1 |    225 |    225 |00:00:00.01 |    3042 |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |     15 |     15 |00:00:00.01 |    3017 |
|   4 |     INDEX FULL SCAN                   | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      17 |
|*  5 |    INDEX RANGE SCAN                   | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |
|   6 |   TABLE ACCESS BY INDEX ROWID         | T1    |    225 |     15 |    225 |00:00:00.01 |      10 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

Of course 12c also has the “prefetch” version of the plan available; and again “batched” access appears – for both tables in this case – and again the feature can be disabled individually by hints addressed at the tables:


---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED  | T1    |      1 |     15 |    225 |00:00:00.01 |    3052 |
|   2 |   NESTED LOOPS                        |       |      1 |    225 |    225 |00:00:00.01 |    3042 |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |     15 |     15 |00:00:00.01 |    3017 |
|   4 |     INDEX FULL SCAN                   | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      17 |
|*  5 |    INDEX RANGE SCAN                   | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

In these examples the difference in work done by the different variations and versions is negligible, but there may be cases where the pattern of data distribution may change the pattern of logical I/Os and buffer pins – which may affect the physical I/O. In this light it’s interesting to note the hint /*+ cluster_by_rowid(alias) */ that was introduced in 11.2.0.4 but disappeared by 12c [Ed: wrong, it wasn't listed in v$sql_hints in the beta, but is in the production version] changing the 11g plan as follows:


----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      0 |        |      0 |00:00:00.01 |       0 |       |       |          |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T1    |      1 |     15 |    225 |00:00:00.01 |     134 |       |       |          |
|   2 |   NESTED LOOPS                |       |      1 |    225 |    241 |00:00:00.01 |     124 |       |       |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     15 |     15 |00:00:00.01 |      99 |       |       |          |
|   4 |     SORT CLUSTER BY ROWID     |       |      1 |   3000 |   3000 |00:00:00.01 |       8 |   142K|   142K|  126K (0)|
|   5 |      INDEX FULL SCAN          | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |       8 |       |       |          |
|*  6 |    INDEX RANGE SCAN           | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   6 - access("T2"."N1"="T1"."N1")

Note the effect appearing at line 4 – and the extraordinary effect this has on the buffer visits (so significant that I did a follow-up check on v$mystat to see if the figures were consistent). This type of rowid sorting is, of course, an important fix for an Exadata issue I described some time ago, and I had assumed that the “batched” concept in the 12c plan was in some way enabling it – although the 12c rowsource execution stats don’t seem to bear that idea out.

Footnote:

You may also want to read the following note by Timur Akhmadeev of Pythian on the 12c batched rowid.

April 19, 2014

Coincindences

Filed under: Bugs,Oracle — Jonathan Lewis @ 9:22 am BST Apr 19,2014

I had another of those odd timing events today that make me think that Larry Ellison has access to a time machine. I found (yet another example of a) bug that had been reported on MoS just a few days before it appeared on an instance I was running. How is it possible that someone keeps doing things that I’m doing, but just a few days before I do them !

For no good reason I happened to browse through a load of background trace files on an 11.2.0.4 instance and found the following in an “m000″ file:

*** SERVICE NAME:(SYS$BACKGROUND) 2014-04-19 08:55:20.617
*** MODULE NAME:(MMON_SLAVE) 2014-04-19 08:55:20.617
*** ACTION NAME:(Auto-Purge Slave Action) 2014-04-19 08:55:20.617

*** KEWROCISTMTEXEC - encountered error: (ORA-06525: Length Mismatch for CHAR or RAW data
ORA-06512: at "SYS.DBMS_STATS", line 29022
ORA-06512: at line 1
)
  *** SQLSTR: total-len=93, dump-len=93,
      STR={begin dbms_stats.copy_table_stats('SYS', :bind1, :bind2, :bind3, flags=>1, force=>TRUE); end;}

Before trying to track down what had gone wrong I did a quick check on MoS, searching for “copy_table_stats” and “29022” and found bug 17079301 – fixed in 12.2, and 12.1.0.2, with a patch for 12.1.0.1 (and some back-ports for 11.2.0.4). The description of the bug in the note was basically: “it happens”.

I may get around to looking more closely at what’s gone wrong but as an initial though I’m guessing that, even though the action name is “auto-purge slave action”, this may be something to do with adding a partition to some of the AWR history tables and rolling stats forward – so at some point I’ll probably start by checking for partitions with missing stats in the SYS schema.

The bug note, by the way, was published (last updated, on second thoughts) on 14th April 2014 – just 5 days before I first happened to spot the occurrence of the bug.

April 18, 2014

Bitmap loading

Filed under: bitmaps,Indexing,Oracle — Jonathan Lewis @ 12:43 pm BST Apr 18,2014

Everyone “knows” that bitmap indexes are a disaster (compared to B-tree indexes) when it comes to DML. But at an event I spoke at recently someone made the point that they had observed that their data loading operations were faster when the table being loaded had bitmap indexes on it than when it had the equivalent B-tree indexes in place.

There’s a good reason why this can be the case.  No prizes for working out what it is – and I’ll supply an answer in a couple of days time.  (Hint – it may also be the reason why Oracle doesn’t use bitmap indexes to avoid the “foreign key locking” problem).

Answer

As Martin (comment 3) points out, there’s a lot of interesting information in the statistics once you start doing the experiment. So here’s some demonstration code, first we create a table with one of two possible indexes:


create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	mod(rownum,1000)	btree_col,
	mod(rownum,1000)	bitmap_col,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create        index t1_btree on t1(btree_col) nologging;
-- create bitmap index t1_bitmap on t1(bitmap_col) nologging;

You’ll note that the two columns I’m going to build indexes on hold the same data in the same order – and it’s an order with maximum scatter because of the mod() function I’ve used to create it. It’s also very repetitive data, having 1000 distinct values over 1,000,0000 rows. With the data and (one of) the indexes in place I’m going to insert another 10,000 rows:

execute snap_my_stats.start_snap

insert /* append */ into t1
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	1e6 + rownum		id,
	mod(rownum,1000)	btree_col,
	mod(rownum,1000)	bitmap_col,
	rpad('x',100)		padding
from
	generator
;

execute snap_my_stats.end_snap

You’ll note that I’ve got an incomplete append hint in the code – I’ve tested the mechanism about eight different ways, and left the append in as a convenience, but the results I want to talk about (first) are with the hint disabled so that the insert is a standard insert. The snap_my_stats calls are my standard mechanism to capture deltas of my session statistics (v$mystat) – one day I’ll probably get around to using Tanel’s snapper routine everywhere – and here are some of the key results produced in the two tests:


11.2.0.4 with btree
===================
Name                                                                     Value
----                                                                     -----
session logical reads                                                   31,403
DB time                                                                     64
db block gets                                                           31,195
consistent gets                                                            208
db block changes                                                        21,511
redo entries                                                            10,873
redo size                                                            3,591,820
undo change vector size                                                897,608
sorts (memory)                                                               2
sorts (rows)                                                                 1

11.2.0.4 with bitmap
====================
Name                                                                     Value
----                                                                     -----
session logical reads                                                   13,204
DB time                                                                     42
db block gets                                                            8,001
consistent gets                                                          5,203
db block changes                                                         5,911
redo entries                                                             2,880
redo size                                                            4,955,896
undo change vector size                                              3,269,932
sorts (memory)                                                               3
sorts (rows)                                                            10,001

As Martin has pointed out, there are a number of statistics that show large differences between the B-tree and bitmap approaches, but the one he didn’t mention was the key: sorts (rows). What is this telling us, and why could it matter so much ? If the B-tree index exists when the insert takes place Oracle locates the correct place for the new index entry as each row is inserted which is why you end up with so many redo entries, block gets and block changes; if the bitmap index exists, Oracle postpones index maintenance until the table insert is complete, but accumulates the keys and rowids as it goes then sorts them to optimize the rowid to bitmap conversion and walks the index in order updating each modified key just once.

The performance consequences of the two different strategies depends on the number of indexes affected, the number of rows modified, the typical number of rows per key value, and the ordering of the new data as it arrives; but it’s possible that the most significant impact could come from ordering.  As each row arrives, the relevant B-tree indexes are modified – but if you’re unlucky, or have too many indexes on the table, then each index maintenance operation could result in a random disk I/O to read the necessary block (how many times have you seen complaints like: “we’re only inserting 2M rows but it’s taking 45 minutes and we’re always waiting on db file sequential reads”). If Oracle sorts the index entries before doing the updates it minimises the random I/O because it need only update each index leaf block once and doesn’t run the risk of re-reading many leaf blocks many times for a big insert.

Further Observations

The delayed maintenance for bitmap indexes (probably) explains why they aren’t used to avoid the foreign key locking problem.  On a large insert, the table data will be arriving, the b-tree indexes will be maintained in real time, but a new child row of some parent won’t appear in the bitmap index until the entire insert is complete – so another session could delete the parent of a row that exists, is not yet committed, but is not yet visible. Try working out a generic strategy to deal with that type of problem.

It’s worth noting, of course, that when you add the /*+ append */ hint to the insert then Oracle uses exactly the same optimization strategy for B-trees as it does for bitmaps – i.e. postpone the index maintenance, remember all the keys and rowids, then sort and bulk insert them.  And when you’ve remembered that, you may also remember that the hint is (has to be) ignored if there are any enabled foreign key constraints on the table. The argument for why the hint has to be ignored and why bitmap indexes don’t avoid the locking problem is (probably) the same argument.

You may also recall, by the way, that when you have B-tree indexes on a table you can choose the optimal update or delete strategy by selecting a tablescan or index range scan as the execution path.  If you update or delete through an index range scan the same “delayed maintenance” trick is used to optimize the index updates … except for any indexes being used to support foreign key constraints, and they are maintained row by row.

In passing, while checking the results for this note I re-ran some tests that I had originally done in 2006 and added one more test that I hadn’t considered at the time; as a result I can also point out that index will see delayed maintenance if you drive the update or delete with an index() hint, but not if you drive it with an index_desc() hint.

 

April 4, 2014

NVL() change

Filed under: CBO,Oracle,Troubleshooting,Upgrades — Jonathan Lewis @ 6:10 pm BST Apr 4,2014

One of the problems of functions is that the optimizer generally doesn’t have any idea on how a predicate based on function(col) might affect the cardinality. However,  the optimizer group are constantly refining the algorithms to cover an increasing number of special cases more accurately. This is a good thing, of course – but it does mean that you might be unlucky on an upgrade where a better cardinality estimate leads to a less efficient execution plan. Consider for example the simple query (where d1 is column of type date):

select	*
from	t1
where	nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate

Now, there are many cases in many versions of Oracle, where the optimizer will appear to calculate the cardinality of

nvl(columnX,{constant}) operator {constant}

as if it were:

columnX is null or columnX operator {constant}

Unfortunately this doesn’t seem to be one of them – until you get to 11.2.something. Here’s a little demonstration code:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	case
		when rownum > 100 then null else sysdate - rownum
	end	d1
from
	generator	v1,
	generator	v2
where
	rownum <= 50000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

set autotrace traceonly explain

prompt	query with NVL

select	*
from	t1
where	nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate
;

prompt	query with OR clause

select	*
from	t1
where	d1 is null or d1 < sysdate
;

If you run this code in 11.1.0.7 you get the following – with numeric variations for cost (which I’m interested not in at the moment):


query with NVL
==============
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  2500 |  2500 |    18  (39)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  2500 |  2500 |    18  (39)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))<SYSDATE@!)

query with OR clause
====================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50000 | 50000 |    13  (16)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    13  (16)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1" IS NULL OR "D1"<SYSDATE@!)

Take note particularly of the difference in the estimated cardinality for the tablescans.

When you upgrade to 11.2.0.4 (possibly earlier – though there are some nvl() related patches that appeared only in 11.2.0.4), you get this:


query with NVL
==============
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50000 | 50000 |    18  (39)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    18  (39)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))<SYSDATE@!)

query with OR clause
====================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 49900 | 49900 |    13  (16)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 49900 | 49900 |    13  (16)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1" IS NULL OR "D1"<SYSDATE@!)

As you can see the estimate for the “NVL()” example is now correct – which means vastly different from the estimate in 11.1.0.7 which seemed to be using the standard “5% for range-based predicate on function(col)”.

It’s interesting to note that a (relatively) small error has crept in to the “OR” example – interestingly the size of the error is exactly the number of rows where d1 is not null (which looks like enough of a coincidence to be a bug – but maybe there’s a good rationale for it)

Conclusion

Yet again, a simple upgrade has the capacity to make a dramatic change to a cardinality estimate – which could mean a significant change to an execution plan and major change in performance. If you’ve read this note, though, you may recognise the pattern that is the driving feature behind the problem.

Footnote:

If you have access to versions 11.2.0.1 through 11.2.0.3 and find that the test data produces different cardinalities please publish the results in the comments – it would be nice to know exactly when this change appears.  (You need only show the body of the execution plans  with labels, not the entire output of the script).

 

April 3, 2014

Cache anomaly

Filed under: Bugs,Oracle,Performance — Jonathan Lewis @ 1:27 pm BST Apr 3,2014

Just a quick heads-up for anyone who likes to play around with the Keep and Recycle caches.

In 11g Oracle introduced the option for serial direct path reads for tablescans on tables that was sufficiently large – which meant more than the small_table_threshold – provided the table wasn’t already sufficient well cached.  (The rules mean that the choice of mechanism can appear to be a little random in the production environment for tables that are near the threshold size – but if you try testing by doing “alter system flush buffer_cache” you find that you always get direct path reads in testing.)

I’ve just discovered a little oddity about this, though.  I have a table of about 50MB which is comfortably over the threshold for direct path reads. But if I create a KEEP cache (db_keep_cache_size) that is a little larger than the table and then assign the table to the KEEP cache (alter table xxx storage(buffer_pool keep)) then 11.2.0.4 stops doing direct path reads, and caches the table.

Now this doesn’t seem unreasonable – if you’ve assigned an object to the KEEP cache you probably want it (or once wanted it) to be kept in cache as much as possible; so using the KEEP cache if it’s defined and specified makes sense. The reason I mention this as an oddity, though, is that it doesn’t reproduce in 11.1.0.7.

I think I saw a bug note about this combination a few months ago- I was looking for something else at the time and, almost inevitably, I can’t find it when I want it – but I don’t remember whether it was the 11.1 or 11.2 behaviour that was deemed to be correct.

 Update

See comments 1 and 2 below.  I’ve written about this previously, and the caching bechaviour is the correct behaviour. The patch is in 11.2.0.2 and backports are available for 11.1.0.7 and 11.2.0.1. The patch ensures that the table will be read into the cache if it is smaller than the db_keep_cache_size.  (Although we might want to check – see Tanel’s notes – whether this is based on the high water mark recorded in the segment header or on the optimizer stats for the table; and I suppose it might be worth checking that the same feature applies to index fast full scans). From the MoS document:

With the bug fix applied, any object in the KEEP buffer pool, whose size is less than DB_KEEP_CACHE_SIZE, is considered as a small or medium sized object. This will cache the read blocks and avoid subsequent direct read for these objects.

 

 

April 2, 2014

Easy – Oops.

Filed under: Bugs,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 7:47 pm BST Apr 2,2014

A question came up on OTN today asking for suggestions on how to enforce uniqueness on a pair of columns only when the second column was not null. There’s an easy and obvious solution – but I decided to clone the OP’s example and check that I’d typed my definition up before posting it; and the result came as a bit of a surprise. Here’s a demo script (not using the OP’s table):


create table t1  
(  
	col1	int not null,
	col2	varchar2(1)
);  

create unique index t1_i1 on t1( 
--	case col2 when null then cast(null as int) else col1 end,
--	case when col2 is null then cast(null as int) else col1 end,
	case when col2 is not null then col1 end,
	col2
)
;

insert into t1 values(1,null);
insert into t1 values(1,null);
insert into t1 values(1,'x');
insert into t1 values(1,'y');
insert into t1 values(1,'y');

commit;

column ind1_is   format a5
column ind1_when format 9999

set null N/A

select
	case when col2 is null then cast (null as int) else col1 end	ind1_is,
	case col2 when null then cast (null as int)  else col1 end	ind1_when
from 
	t1
;

The strategy is simple, you create a unique function-based index with two columns; the first column of the index id defined to show the first column of the table if the second column of the table is not null, the second column of the index is simply the second column of the table. So if the second column of the table is null, both columns in the index are null and there is no entry in the index; but if the second column of the table is not null then the index copies both columns from the table and a uniqueness test applies.

Based on the requirement and definition you would expect the first 4 of my insert statements to succeed and the last one to fail. The index will then have two entries, corresponding to my 3rd and 4th insertions.

I’ve actually shown three ways to use the case statement to produce the first column of the index. The last version is the cleanest, but the first option is the one I first thought of – it’s virtually a literal translation the original requirement. The trouble is, with my first definition the index acquired an entry it should not have got, and the second insert raised a “duplicate key” error; the error didn’t appear when I switched the syntax of the case statement to the second version.

That’s why the closing query of the demo is there – when you run it the two values reported should be the same as each other for all four rows in the table – but they’re not. This is what I got on 11.2.0.4:


IND1_IS IND1_WHEN
------- ---------
N/A             1
N/A             1
      1         1
      1         1


I’m glad I did a quick test before suggesting my original answer.

Anyone who has other versions of Oracle available is welcome to repeat the test and report back which versions they finding working correctly (or not).

Update

It’s not a bug (see note 2 below from Jason Bucata), it’s expected behaviour.

 

« Previous PageNext Page »

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,084 other followers