Bug fixed by 19.11.0.0 – but new defect found (jump to update)
Here’s a worrying bug that showed up a couple of days ago on the Oracle-L mailing list. It’s a problem that I’ve tested against 12.2.0.1 and 19.3.0.0 – it may be present on earlier versions of Oracle. One of the nastiest things about it is that you might not notice it until you get an “out of space” error from the operating system. You won’t get any wrong results from it, but it may well be adding an undesirable performance overhead.
Basically it seems that (under some circumstances, at least) Oracle is setting the “block guess” component of the secondary index on Index Organized Tables (IOTs) to point to blocks in the overflow segment instead of blocks in the primary key segment. As a result, when you execute a query that accesses the IOT through the secondary index and has to do reads from disc to satisfy the query – your session goes through the following steps:
- Identify index entry from secondary index – acquire “block guess”
- Read indicated block and discover the object number on the block is wrong, and the block type is wrong
- Write a (silent) ORA-01410 error and do a block dump into the trace file
- Use the “logical rowid” from the secondary index (i.e. the stored primary key value) to access the primary key index by key value
So your query runs to completion and you get the right result because Oracle eventually gets there using the primary key component stored in the secondary index, but it always starts with the guess[see footnote] and for every block you read into the cache because of the guess you get a dump to the trace file.
Here’s a little code to demonstrate. The problem with this code is that everything appears to works perfectly, you have to be able to find the trace file for your session to see what’s gone wrong. First we create some data – this code is largely copied from the original posting on Oracle-L, with a few minor changes:
rem rem Script: iot_bug_12c.sql rem Author: Jonathan Lewis rem Dated: Nov 2019 rem Purpose: rem rem Last tested rem 19.3.0.0 rem 12.2.0.1 rem rem Notes rem The original author had tested on 19.5.0.0 to get the same effect, see: rem //www.freelists.org/post/oracle-l/IOT-cannot-get-valid-consensus-bug-or-unexplained-behavior rem drop table randomload purge; create table randomload( roll number, name varchar2(40), mark1 number, mark2 number, mark3 number, mark4 number, mark5 number, mark6 number, primary key (roll) ) organization index including mark3 overflow ; create index randomload_idx on randomload(mark6); insert into randomload select rownum, dbms_random.string(0,40) name, round(dbms_random.value(0,100)), round(dbms_random.value(0,100)), round(dbms_random.value(0,100)), round(dbms_random.value(0,100)), round(dbms_random.value(0,100)), round(dbms_random.value(0,10000)) from dual connect by level < 1e5 -- > comment to avoid wordpress format issue ; commit; exec dbms_stats.gather_table_stats(null,'randomload', cascade=>true); prompt ================================================== prompt pct_direct_access should be 100 for randomload_idx prompt ================================================== select table_name, index_name, num_rows, pct_direct_access, iot_redundant_pkey_elim from user_indexes where table_name = 'RANDOMLOAD' ;
It should take just a few seconds to build the data set and you should check that the pct_direct_access is 100 for the index called randomload_idx.
The next step is to run a query that will do an index range scan on the secondary index.
column mark6 new_value m_6 select mark6, count(*) from randomload group by mark6 order by count(*) fetch first 5 rows only ; alter system flush buffer_cache; alter session set events '10046 trace name context forever, level 8'; set serveroutput off select avg(mark3) from randomload where mark6 = &m_6 ; select * from table(dbms_xplan.display_cursor); alter session set events '10046 trace name context off'; set serveroutput on
I’ve started by selecting one of the least frequently occurring values of m_6 (a column I know to be in the overflow); then I’ve flushed the buffer cache so that any access I make to the data will have to start with disk reads (the original poster suggested restarting the database at this point, but that’s not necessary).
Then I’ve enabled sql_trace to show wait states (to capture details of what blocks were read and which object they belong to), and I’ve run a query for m_3 (a column that is in the primary key (TOP) segment of the IOT) and pulled its execution plan from memory to check that the query did use a range scan of the secondary index. Here’s the plan:
---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 11 (100)| | | 1 | SORT AGGREGATE | | 1 | 7 | | | |* 2 | INDEX UNIQUE SCAN| SYS_IOT_TOP_77298 | 10 | 70 | 11 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN| RANDOMLOAD_IDX | 10 | | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("MARK6"=1316) 3 - access("MARK6"=1316)
As you can see the plan shows what we are hoping to see – an index range scan of the secondary index that let’s it follow up with a unique scan of the primary key segment. It’s just a little odd that the access predicate reported for operation 2 (unique scan of IOT_TOP) suggests that the access is on a column that isn’t in the primary key and isn’t even in the IOT_TOP segment.
So the query works and gives the right answer. But what do we find in the trace directory? If you’re running 12c (possibly only relevant to 12.2), each time the error occurs the following pattern of information will be written to the alert log (it didn’t appear in 19.3)
ORCL(3):Hex dump of (file 22, block 16747) in trace file /u01/app/oracle/diag/rdbms/orcl12c/orcl12c/trace/orcl12c_ora_7888.trc ORCL(3): ORCL(3):Corrupt block relative dba: 0x0580416b (file 22, block 16747) ORCL(3):Bad header found during multiblock buffer read (logical check) ORCL(3):Data in bad block: ORCL(3): type: 6 format: 2 rdba: 0x0580416b ORCL(3): last change scn: 0x0000.0b86.0e36484c seq: 0x1 flg: 0x06 ORCL(3): spare3: 0x0 ORCL(3): consistency value in tail: 0x484c0601 ORCL(3): check value in block header: 0x4408 ORCL(3): computed block checksum: 0x0 ORCL(3):
And the following pattern of information is written to the trace file [Update: a follow-up test on 11.2.0.4 suggests that the basic “wrong block address” error also happens in that version of Oracle, but doesn’t result in a dump to the trace file]:
kcbzibmlt:: encounter logical error ORA-1410, try re-reading from other mirror.. cursor valid? 1 makecr 0 line 15461 ds_blk (22, 16747) bh_blk (22, 16747) kcbds 0x7ff1ca8c0b30: pdb 3, tsn 8, rdba 0x0580416b, afn 22, objd 135348, cls 1, tidflg 0x8 0x80 0x0 dsflg 0x108000, dsflg2 0x0, lobid 0x0:0, cnt 0, addr 0x0, exf 0x10a60af0, dx 0x0, ctx 0 whr: 'qeilwh03: qeilbk' env [0x7ff1ca8e3e54]: (scn: 0x00000b860e364893 xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: 0x0000.000.00000000 st-scn: 0x0000000000000000 hi-scn: 0x0000000000000000 ma-scn: 0x00000b860e364879 flg: 0x00000660) BH (0xb1fd6278) file#: 22 rdba: 0x0580416b (22/16747) class: 1 ba: 0xb1c34000 set: 10 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 763,14 dbwrid: 0 obj: 135348 objn: 135348 tsn: [3/8] afn: 22 hint: f hash: [0x9eff0528,0x77cff808] lru: [0xb1fd2578,0x9ff84658] ckptq: [NULL] fileq: [NULL] objq: [0xb6f654c0,0x9ff84680] objaq: [0xb6f654d0,0x9ff84690] use: [0x77b78128,0x77b78128] wait: [NULL] st: READING md: EXCL tch: 0 flags: only_sequential_access Printing buffer operation history (latest change first): cnt: 5 01. sid:10 L122:zgb:set:st 02. sid:10 L830:olq1:clr:WRT+CKT 03. sid:10 L951:zgb:lnk:objq 04. sid:10 L372:zgb:set:MEXCL 05. sid:10 L123:zgb:no:FEN 06. sid:10 L083:zgb:ent:fn 07. sid:08 L192:kcbbic2:bic:FBD 08. sid:08 L191:kcbbic2:bic:FBW 09. sid:08 L604:bic2:bis:REU 10. sid:08 L190:kcbbic2:bic:FAW 11. sid:08 L602:bic1_int:bis:FWC 12. sid:08 L822:bic1_int:ent:rtn 13. sid:08 L832:oswmqbg1:clr:WRT 14. sid:08 L930:kubc:sw:mq 15. sid:08 L913:bxsv:sw:objq 16. sid:08 L608:bxsv:bis:FBW Hex dump of (file 22, block 16747) ... etc. Corrupt block relative dba: 0x0580416b (file 22, block 16747) Bad header found during multiblock buffer read (logical check) Data in bad block: type: 6 format: 2 rdba: 0x0580416b last change scn: 0x0000.0b86.0e36484c seq: 0x1 flg: 0x06 spare3: 0x0 consistency value in tail: 0x484c0601 check value in block header: 0x4408 computed block checksum: 0x0 TRCMIR:kcf_reread :start: 16747:0:/u01/app/oracle/oradata/orcl12c/orcl/test_8k_assm.dbf TRCMIR:kcf_reread :done : 16747:0:/u01/app/oracle/oradata/orcl12c/orcl/test_8k_assm.dbf
The threat, of course, is the bit I’ve removed and replaced with just “etc.”: it’s a complete block dump (raw and symbolic) which in my example was somthing like 500 lines and 35KB in size.
It’s not immediately obvious exactly what’s going on and why but the 10046 trace helps a little. From another run of the test (on 19.3.0.0) I got the following combination of details – which is an extract showing the bit of the wait state trace leading into the start of the first block dump:
WAIT #140478118667016: nam='db file scattered read' ela= 108 file#=13 block#=256 blocks=32 obj#=77313 tim=103574529210 WAIT #140478118667016: nam='db file scattered read' ela= 2236 file#=13 block#=640 blocks=32 obj#=77313 tim=103574531549 WAIT #140478118667016: nam='db file scattered read' ela= 534 file#=13 block#=212 blocks=32 obj#=77312 tim=103574532257 kcbzibmlt: encounter logical error ORA-1410, try re-reading from other mirror.. cursor valid? 1 warm_up abort 0 makecr 0 line 16082 ds_blk (13, 212) bh_blk (13, 212)
Object 77313 is the secondary index, object 77312 is the primary key index (IOT_TOP). It may seem a little odd that Oracle is using db file scattered reads of 32 blocks to read the indexes but this is a side effect of flushing the buffer – Oracle may decide to pre-fetch many extra blocks of an object to “warmup” the cache (statistic “physical reads prefetch warmup”) just after an instance startup or a flush of the buffer cache. The thing I want to check, though, is what’s wrong with the blocks that Oracle read from object 77312:
alter system dump datafile 13 block min 212 block max 243; BH (0xc8f68e68) file#: 13 rdba: 0x034000d4 (13/212) class: 1 ba: 0xc8266000 set: 10 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15 dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f BH (0xa7fd6c38) file#: 13 rdba: 0x034000d4 (13/212) class: 1 ba: 0xa7c2a000 set: 12 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15 dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f BH (0xa5f75780) file#: 13 rdba: 0x034000d5 (13/213) class: 0 ba: 0xa5384000 set: 11 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15 dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f BH (0xdafe9220) file#: 13 rdba: 0x034000d5 (13/213) class: 1 ba: 0xdadcc000 set: 9 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15 dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f ...
I’ve reported the first few lines of the symbolic dump for the first few blocks of the resulting trace file. Look at the third line of each group of BH lines: it’s reporting object 77311 (the overflow segment), not 77312 (the IOT_TOP segment). And every single block reported in the db file scattered read of 32 blocks for object 77312 reports itself, when dumped, as being part of object 77311. And that’s possibly the immediate cause of the ORA-01410.
We can take the investigation a little further by dumping a leaf block or two from the secondary index.
alter session set events 'immediate trace name treedump level 77313'; ----- begin tree dump branch: 0x3400104 54526212 (0: nrow: 542, level: 1) leaf: 0x340010d 54526221 (-1: row:278.278 avs:2479) leaf: 0x340075e 54527838 (0: row:132.132 avs:5372) leaf: 0x34005fb 54527483 (1: row:41.41 avs:7185) alter system dump datafile 13 block 1886 -- leaf: 0x340075e BH (0xd5f5d090) file#: 13 rdba: 0x0340075e (13/1886) class: 1 ba: 0xd5158000 set: 9 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15 dbwrid: 0 obj: 77313 objn: 77313 tsn: [3/6] afn: 13 hint: f ... row#6[5796] flag: K------, lock: 0, len=18 col 0; len 2; (2): c1 1d col 1; len 4; (4): c3 07 41 5c tl: 8 fb: --H-FL-- lb: 0x0 cc: 1 col 0: [ 4] 03 40 05 7c
I’ve done a treedump of the secondary index and picked a leaf block address from the treedump and dumped that leaf block, and from that leaf block I’ve extracted one index entry to show you the three components: the key value (c1 1d), the primary key for the row (c3 07 41 5c), and the block guess (03 40 05 75). Read the block guess as a 4 byte hex number, and it translates to file 13, block 1397 – which should belong to the IOT_TOP segment. So the exciting question is – what object does block (13, 1397) think it belongs to ?
alter system dump datafile 13 block 1397; Block header dump: 0x03400575 Object id on Block? Y seg/obj: 0x12dff csc: 0x00000b860e308c46 itc: 2 flg: E typ: 1 - DATA brn: 0 bdba: 0x3400501 ver: 0x01 opc: 0 inc: 0 exflg: 0
Converting from Hex to Decimal: obj: 0x12dff = 77311 which is the overflow segment. The secondary index block guess is pointing at a block in the overflow segment.
There are two ways to handle this problem – you could simply rebuild the index (alter index rebuild) or, as the original poster did, use the “update block references” command to correct all the block guesses: “alter index randomload_idx update block references;”. Neither is desirable, but if you’re seeing a lot of large trace files following the pattern above then it may be necessary.
There was one particular inconsistency in the tests – which I ran many times – occasionally the pct_direct_access for the secondary index would be reported as zero (which, technically, should always happen given the error). If it did, of course, Oracle wouldn’t follow the guess but would go straight to the step where it used the primary key “logical rowid” – thus bypassing the error and block dump.
tl;dr
In some circumstances the block guesses in the secondary indexes of IOTs may be pointing to the overflow segment instead of the primary key (TOP) segment. If this happens then queries will still run and give the right answers, but whenever they read a “guessed” block from disc they will report an ORA-01410 error and dump a block trace. This will affect performance and may cause space problems at the O/S level.
Footnote
An entry in the secondary index of an Index Organized Table (IOT) consists of three parts, which intially we can think in the form:
({key-value}, {logical rowid}, {block guess})
Since IOTs don’t have real rowids the “logical rowid” is actually the primary key of the row where the {key value} will be found. As a short cut for efficient execution Oracle includes the block address (4 bytes) where that primary key value was stored when the row was inserted. Because an IOT is an index “rows” in the IOT can move as new data is inserted and leaf blocks split, so eventually any primary key may move to a different block – this is why we refer to the block address as a guess – a few days, hours, or minutes after you’ve inserted the row the block address may no longer be correct.)
To help the runtime engine do the right thing Oracle collects a statistic called pct_direct_access for secondary indexes of IOTs. This is a measure of what percentage of the block guesses are still correct at the time that the statistics are gathered. If this value is high enough the run-time engine will choose to try using the block guesses while executing a query (falling back to using the logical rowid if it turns out that the guess is invalid), but if the value drops too low the optimizer will ignore the block guesses and only use the logical rowid.
Not relevant to this note – but a final point about secondary indexes and logical rowids – if the definition of the index includes some of the columns from the primary keys Oracle won’t store those columns twice (in more recent version, that is) – the code is clever enough to use the values stored in the (key value) component when it needs to use the (logical rowid) component.
Update (Jan 2020)
I passed this example on to Oracle, and there are now two (non-visible) bugs recorded for it:
- Bug 30733525 – ALERT LOG ENTRIES RE BLOCK GUESSES IN THE SECONDARY INDEXES OF IOTS POINTING TO OVERFLOW SEGMENT INSTEAD OF INDEX SEGMENT
- Bug 30733563 – WRONG GUESS DBA IN INDEX
Update (Nov 2022)
Checking the alert log immediately after starting up an instance of 19.11.0.0 I can see in the section (Dumping current patch information) the bug number 30733563 reported in the list of bugs fixed by Patch Id: 32545013 (the 19.11.0.0 patch).
I’ve also repeated the test using 21.3.0.0 (the most recent patch I had at hand), and the results of the test suggested that it didn’t suffer from the same bug. Unfortunately I noticed another oddity (that hadn’t been present in the 19.3/12.2 run) in the trace file, and checking back to the 19.11.0.0 trace file the oddity was there as well.
Here’s the set of waits due to the query, taken from the two trace files, 19.11 first then 12.3:
WAIT #139944562146024: nam='db file sequential read' ela= 26 file#=36 block#=260 blocks=1 obj#=156326 tim=8771776217 WAIT #139944562146024: nam='db file sequential read' ela= 7 file#=36 block#=358 blocks=1 obj#=156326 tim=8771776320 WAIT #139944562146024: nam='db file sequential read' ela= 6 file#=36 block#=1693 blocks=1 obj#=156325 tim=8771776357 WAIT #139944562146024: nam='db file scattered read' ela= 144 file#=36 block#=1312 blocks=32 obj#=156324 tim=8771780673 WAIT #139623483390184: nam='db file sequential read' ela= 21 file#=13 block#=1915 blocks=1 obj#=87942 tim=1529853216 WAIT #139623483390184: nam='db file sequential read' ela= 9 file#=13 block#=9725 blocks=1 obj#=87942 tim=1529853286 WAIT #139623483390184: nam='db file sequential read' ela= 7 file#=13 block#=11083 blocks=1 obj#=87941 tim=1529853337 WAIT #139623483390184: nam='db file scattered read' ela= 78 file#=13 block#=10736 blocks=8 obj#=87940 tim=1529857981
In both cases the object number (obj#=) shows me:
- Two single block reads from the secondary index
- One single block read from the primary key (IOT_TOP) index
- A multi-block read from the overflow
You’ll notice that Oracle is not doing the multi -block (warmup) reads of the two indexes that it had been doing in the older versions; but the multi-block read of the overflow segment is redundant. The query is for a column which, by virtue of the including clause of the table, is in the IOT_TOP segment, so the code should be capable of avoiding the visit the overflow.
My test case selected just one row from the table, so I thought I’d select for a different value of mark6 to see what happened if I returned more rows scattered randomly through the overflow. In the 21.3 test the predicate “mark6 = 1312” returned 14 rows – here’s the trace file extract for the disk waits (again preceded by a flush of the buffer cache).
WAIT #139727331718368: nam='SQL*Net message to client' ela= 4 driver id=1952673792 #bytes=1 p3=0 obj#=87942 tim=4367518599 WAIT #139727331718368: nam='db file scattered read' ela= 7938 file#=13 block#=9816 blocks=8 obj#=87941 tim=4367526882 WAIT #139727331718368: nam='db file sequential read' ela= 869 file#=13 block#=9709 blocks=1 obj#=87940 tim=4367528218 WAIT #139727331718368: nam='db file sequential read' ela= 25 file#=13 block#=9821 blocks=1 obj#=87941 tim=4367528479 WAIT #139727331718368: nam='db file sequential read' ela= 812 file#=13 block#=9868 blocks=1 obj#=87940 tim=4367529392 WAIT #139727331718368: nam='db file scattered read' ela= 935 file#=13 block#=9848 blocks=8 obj#=87941 tim=4367530560 WAIT #139727331718368: nam='db file scattered read' ela= 725 file#=13 block#=9869 blocks=3 obj#=87940 tim=4367531477 WAIT #139727331718368: nam='db file scattered read' ela= 1153 file#=13 block#=9776 blocks=8 obj#=87941 tim=4367532887 WAIT #139727331718368: nam='db file sequential read' ela= 926 file#=13 block#=9890 blocks=1 obj#=87940 tim=4367534087 WAIT #139727331718368: nam='db file scattered read' ela= 1073 file#=13 block#=9768 blocks=8 obj#=87941 tim=4367535342 WAIT #139727331718368: nam='db file sequential read' ela= 858 file#=13 block#=9900 blocks=1 obj#=87940 tim=4367536501 WAIT #139727331718368: nam='db file scattered read' ela= 5982 file#=13 block#=10232 blocks=8 obj#=87941 tim=4367542650 WAIT #139727331718368: nam='db file sequential read' ela= 1047 file#=13 block#=9914 blocks=1 obj#=87940 tim=4367544052 WAIT #139727331718368: nam='db file scattered read' ela= 1000 file#=13 block#=10144 blocks=8 obj#=87941 tim=4367545254 WAIT #139727331718368: nam='db file sequential read' ela= 722 file#=13 block#=9926 blocks=1 obj#=87940 tim=4367546176 WAIT #139727331718368: nam='db file scattered read' ela= 943 file#=13 block#=10328 blocks=8 obj#=87941 tim=4367547275 WAIT #139727331718368: nam='db file sequential read' ela= 767 file#=13 block#=9937 blocks=1 obj#=87940 tim=4367548341 WAIT #139727331718368: nam='db file scattered read' ela= 973 file#=13 block#=10272 blocks=8 obj#=87941 tim=4367549561 WAIT #139727331718368: nam='db file sequential read' ela= 642 file#=13 block#=9960 blocks=1 obj#=87940 tim=4367550383 WAIT #139727331718368: nam='db file scattered read' ela= 892 file#=13 block#=10280 blocks=8 obj#=87941 tim=4367551481 WAIT #139727331718368: nam='db file scattered read' ela= 753 file#=13 block#=9961 blocks=7 obj#=87940 tim=4367552395 WAIT #139727331718368: nam='db file scattered read' ela= 1271 file#=13 block#=10600 blocks=8 obj#=87941 tim=4367553876 WAIT #139727331718368: nam='db file sequential read' ela= 1145 file#=13 block#=10731 blocks=1 obj#=87940 tim=4367555305 WAIT #139727331718368: nam='db file scattered read' ela= 779 file#=13 block#=10960 blocks=8 obj#=87941 tim=4367556371 WAIT #139727331718368: nam='db file sequential read' ela= 569 file#=13 block#=10714 blocks=1 obj#=87940 tim=4367557135 WAIT #139727331718368: nam='db file scattered read' ela= 730 file#=13 block#=11080 blocks=8 obj#=87941 tim=4367557962 WAIT #139727331718368: nam='db file sequential read' ela= 384 file#=13 block#=10630 blocks=1 obj#=87940 tim=4367558471 WAIT #139727331718368: nam='db file scattered read' ela= 597 file#=13 block#=11128 blocks=8 obj#=87941 tim=4367559150 WAIT #139727331718368: nam='db file sequential read' ela= 373 file#=13 block#=10664 blocks=1 obj#=87940 tim=4367559606
The multiblock prefetches on the IOT_TOP segment (obj# – 87941) aren’t unreasonable – although they weren’t actually recorded as prefetches or warmups. But the 14 single block reads of the overflow segment are most unreasonable.
I’m not going to dig any deeper into this – but at some stage I (or someone) will have to go back to earlier versions of Oracle to see if Oracle has always been doing redundant reads of overflow segments, or whether this is a new defect introduced in recent versions of Oracle.
Arguably Oracle has to visit the overflow segment because sometimes the columns kept in the IOT_TOP segment are so long that it’s not possible to keep all the included columns there – but the reads should surely be an exception rather than the rule; anything else changes the dynamics (and relevance) of IOTs dramatically.