Here’s an execution plan pulled from v$sql_plan in 10.2.0.3 with the “gather_plan_statistics” option enabled:
---------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------- | 1 | NESTED LOOPS | | 1 | 0 |00:00:00.01 | 608 | | 2 | TABLE ACCESS BY INDEX ROWID| PARENT | 1 | 200 |00:00:00.01 | 6 | | 3 | INDEX RANGE SCAN | PAR_I1 | 1 | 200 |00:00:00.01 | 2 | | 4 | TABLE ACCESS BY INDEX ROWID| CHILD | 200 | 0 |00:00:00.01 | 602 | | 5 | INDEX UNIQUE SCAN | CHI_I1 | 200 | 200 |00:00:00.01 | 402 | ----------------------------------------------------------------------------------------
The code defining the two tables is the same:
create table &m_table_name
as
select
rownum id,
trunc(dbms_random.value(1,&m_rowcount)) n1,
lpad(rownum,10,'0') small_vc,
rpad('x',100) padding
from
big_row_source
where
rownum <= &m_rowcount
;
The indexes par_i1 and chi_i1 are on the id column and, as you can see for chi_i1, have been created as unique indexes. Unlike the table, though, the code defining them isn’t necessarily identical.
Please state, with justification, what you think the blevel is for index chi_i1 ?
(Note – to avoid any risk of confusion, remember that height = blevel + 1; and just in case anyone thinks it makes any difference, the tables and indexes were all created in the same tablespace which is my usual setup of 1MB uniform extents, locally managed, 8K block size and freelist management.)
The quiz isn’t intended to be cunning, devious and tricksy, by the way. It’s all about observing and understanding a fairly ordinary situation. On the other hand I’m sure there are interesting variations with strange side effects that could be introduced if you really want to get clever.
Answer 23rd Jan:
Rather than writing my answer into a new post I’ve added it below – but to give new visitors a chance to work out the answer before seeing mine I’ve left a big gap before the answer.
Reading the comments I think all the interesting points have been covered. We have comments covering special treatment of root blocks in nested loop joins, the “buffer is pinned count” statistic, possible changes in 11g, the hypothesis that I had set a large pctfree for the child index,and the link between “A-rows” in one line of the plan and “Starts” in another line.
For reference, my query looked like this:
select par.small_vc, chi.small_vc from parent par, child chi where par.id between 301 and 500 and chi.id = par.n1 and chi.n1 = 0 ;
The mechanics of the plan are:
Line 3 – index range scan of par_i1 (which finds ultimately 200 matching rowids).
Line 2 – for each rowid from line 3 check the matching table row (which ultimately returns 200 rows)
Then two hundred calls to:
Line 5 – index unique scan of chi_i1, which always finds it single row – hence returning 200 rowids in total
Line 4- check the matching row in the table – which always fails – hence returning zero rows in total
The critical counts that allow you to answer my question are the 402 and 602 buffer gets in lines 5 and 4. Ignoring, temporarily, the odd 2 these numbers are clearly 200 times something – which is rather nice given that we know that we are doing something 200 times. The simplest solution, of course is that the (602 – 402 =) 200 gets due to line 4 represents the visits to the 200 table blocks, leaving us with 400 (plus an odd bit) to account for.
Now consider the possibilities:
- Blevel = 0 – we would visit the only index block (which would be the root, but look like a leaf) then visit the table.
- Blevel = 1 – we would visit the root, the leaf which is the next layer down, then the table
- Blevel = 2 – we would visit the root, the branch level, the leaf, then the table.
So, at first sight we might decide the only way to get 400 buffer visits from the index is to have Blevel = 1, visit the root and the leaf. But that’s not the way it works (apart from a couple of versions which had a bug).
When running a nested loop join, Oracle “pins” the root block of the inner (second table) index, which means it attaches a structure to the “users’ list” in the buffer header that links the session’s state object directly to the buffer header. So, in my nested loop join, Oracle gets the root block once and keeps it pinnned, then gets branch and leaf blocks 200 times. The blevel on the child index is 2. (The blevel of the parent index is only 1 – I had set the pctfree on the child index to 90 to make it much larger than it needed to be.)
This “root block pin” isn’t the only pinning optimisation in the query, though. In detail, the steps are as follows – and this is something you would be table to see in the trace file if you enabled event 10200 (one of the ‘consistent reads’ traces).
- 1) Get the root block of parent index and pin it
- 2) Get the first relevant leaf block of the parent index and pin it
- 3) Get the first relevant table block of the parent table and pin it.
- 4) Get the root block of the child index and pin it
- 5) Get the relevant branch block of the child index (no pin)
- 6) Get the relevant leaf block of the child index (no pin)
- 7) Get the table block of the child table (no pin) — row does not match final predicate and is rejected
- 8) Revisit the parent index leaf block through the pin (add one to “buffer is pinned count”)
- 9) Revisit the parent table block through the pin (add one to “buffer is pinned count”)
- 10) Revisit the child index root block through the pin (add one to “buffer is pinned count”)
- 11) Get a new child branch
- 12) Get a new child leaf
- 13) Get a new table block
- 14) Repeat for a total of 200 cycles from step 8
There a couple of deviations from this cycle, of course. The rows I needed from parent were spread across 4 consecutive blocks in the table so, roughly every 50 rows from parent, line (9) above would become “release current pinned parent block, get new parent block and pin it”. The other little oddity that I can’t explain is that Oracle does “get” the child root block on the second visit to the index as well as the first visit – and then pins it from that moment onwards. So the counts are:
- Gets on the parent index 2 (root and leaf)
- Gets on the parent table 4 (one for each table block visited)
- Gets on the child index 402 (two on the root block, 200 on branch blocks, 200 on leaf blocks)
- Gets on the child table (200 for each row/block accessed)
Inevitably, things change – Oracle keeps getting smarter about things like ‘consistent gets – examination’, ‘buffer is pinned count’ and, in 11g, “fastpath” access to buffered blocks. This is a clue to the difference in gets that Charles Hooper recorded in 11.2.0.2 – and explains why I chose to use 10.2.0.3 with a unique index for my example. If you want to investigate other variations all it takes is snapshots of v$mystat, calls to “alter system flush buffer_cache”, and event 10200; the treedump can also be very helpful for identifying block addresses.

Jonathan,
The BLEVEL of CHI_I1 index will be 2.
Reading from explain plan, your query will fetch 200 rows from index PAR_I1 and reading PARENT TALBE won’t discard any of them. That’s why we have 200 reads on CHI_I1. And here is the interesting part of story beginning. If BLEVEL is 2, we have to read 3 blocks when perfroming UNIQUE SCAN (one root, one branch and one leaf). But in case of NESTED LOOPS, Oracle is clever enough not to read the ROOT BLOCK every time. It will cache it somehow (I have to admitt I’m not able to describe whole process), spending one extra consistent read. So we have 4 consistent reads on index CHI_I1 for the first row, but only 2 consistent reads on next 199. 4 + 2 * 199 = 4 + 398 = 402 (402 matches buffer gets on row 5 of your exececution plan). Then we need 1 another read from table CHILD, for every row. 402 + 200 = 602.
The second interesting observetion is that only 2 reads were needed on index PAR_I1. As far I can tell, it is counsequeunce of your query, which is obviously not returning any rows and some clever prefeteching alghoritm. It would be really nice, if you would be able to provide some extra explanation.
Comment by Pavol Babel — January 21, 2011 @ 9:32 pm UTC Jan 21,2011 |
Pavol,
I was thinking on the same lines but I could not understand your following 2 statements
If BLEVEL is 2, we have to read 3 blocks when perfroming UNIQUE SCAN (one root, one branch and one leaf) and So we have 4 consistent reads on index CHI_I1 for the first row
Shouldn’t that be 3 consistent reads on index CHI_I1 for the first row instead? Or is the extra consistent get for reading index segment header?
Comment by Narendra — January 22, 2011 @ 10:01 am UTC Jan 22,2011 |
Narednra,
Oracle does not have to read index segment header in case of index unique or range scan (It think oracle is able to derive ROOT BLOCK address directly from address of segment header without inspecting segment header block).
Now I will try to explain my statment once more time. I wrote: “It will cache it (the ROOT BLOCK) somehow (I have to admitt I’m not able to describe whole process), spending one extra consistent read .” . I think the extra consistent read is spent due ROOT BLOCK CACHING, but I’m not sure, just guessing Oracle will create new consistent read version of ROOT BLOCK in BUFFER CACHE, pinned by server process during nested loops join.
Comment by Pavol Babel — January 22, 2011 @ 3:56 pm UTC Jan 22,2011 |
Hi Joanathan,
I think blevel will be 2, but the interesting thing is in the execution plan at step 5 which shows index unique scans and A-rows are 200 (i.e. instead of 1) since the index is define as unique or may be i am missing
something
Also my assumption is that query is joinning parent and child table and there will be range condition on parent.id which why there is index range scans i think it may be 200 range scan and for each row it will lookup in child table
If my above assumption is right than is 200 is the cummulative total in step 5 for chi_i1?
Thanks
Comment by Henish — January 21, 2011 @ 9:53 pm UTC Jan 21,2011 |
Jonathan,
Without testing, I would suggest that the BLEVEL is 1 (I will leave the explanation for later, but basically every start should visit the root block).
Just out of curiosity, is there an optimization for this in 11.2.0.2? My test results for the CHILD table (T2) differed from the explanation that I would have provided to explain the number of buffer visits (note that OPTIMIZER_FEATURES_ENABLE was set to 10.2.0.4 to avoid two NESTED LOOPS joins):
Comment by Charles Hooper — January 21, 2011 @ 11:23 pm UTC Jan 21,2011 |
By testing on 10.2.0.5 I found that the number of consistent gets for the index access were much closer to those at the top of this blog article. However, that testing also demonstrated that the BLEVEL of the CHI_I1 index is not 1 as I suggested earlier. I still find it interesting that the number of consistent gets for the CHI_I1 index on 11.2.0.2 is significantly less than that achieved on 10.2.0.5.
Comment by Charles Hooper — January 22, 2011 @ 3:46 am UTC Jan 22,2011 |
I think, this is effect of Buffer Pinning (related statistic “buffer is pinned count”). With reverse index, for example, Oracle must visit different leaf block in each iteration, therefore can’t get advantage of this optimisation.
Comment by Vyacheslav Rasskazov — January 22, 2011 @ 5:11 am UTC Jan 22,2011 |
Let’s see… We have either blevel 1 and two extra buffer gets to explain, or blevel 2 and 198 missing buffer gets. Pavol’s theory accounts nicely for 199 missing CRs, leaving only the question of why would Oracle need an extra CR to pin (or copy to private memory) the root block buffer? Unless this only happens on the second start of the unique scan.
On the other hand, 2 extra CRs (hence a blevel 1) would be easy to explain with a concurrent change or delayed cleanouts if the “no tricks” premise didn’t stand in the way.
Also, PAR_I1 obviously has a blevel of 1. Why would an index on the same column type need a second branch level? Either CHILD has many more rows, or a large PCTFREE was specified. So a blevel of 2 is still possible.
Can’t wait for the solution…
Comment by Flado — January 22, 2011 @ 2:08 pm UTC Jan 22,2011 |
I will submit my very little oracle knowledge to critics
We have the following information on line 5 of the explain plan
Starts = 200, A-Rows = 200 and Buffers = 402
The Operation 5 has been started 200 times and has produced 200 rows; which means that A-Rows/Starts = 200/200 = 1 row for each execution
Buffers definition : The number of logical reads the operation performed during its executions = 402 almost 402/200 = 2 buffers
Hence, the operation 5 performed 2 consistent gets for each CHI_I1 Index access; so its blevel = 0
Best Regards
Comment by Mohamed — January 23, 2011 @ 9:37 am UTC Jan 23,2011 |
The answer is 2.
I’ve added the explanation to the original posting.
Comment by Jonathan Lewis — January 23, 2011 @ 2:15 pm UTC Jan 23,2011 |
Jonathan,
I see that you have already posted the answer to the quiz – very thought provoking.
For comparison with 11.2.0.2, this is the execution plan that I see on 10.2.0.5 when the BLEVEL of both indexes is 1, as was the case on 11.2.0.2:
As can be seen by the above, the number of consistent gets for the CHI_I1 was 202 rather than 402 as shown at the top of this blog article. Process of elimination would suggest that the BLEVEL must be 2 (of course process of elimination was not the desired path of finding the answer).
With 300,000 rows in each of the tables I used the following SQL statement that resulted in the generation of the above execution plans:
Vyacheslav Rasskazov’s comment jogged my memory a bit regarding what is likely happening. While waiting for your answer to this post I performed a couple of tests. Running this SQL statement before and after executing my test query:
SELECT NAME, VALUE FROM V$MYSTAT M, V$STATNAME S WHERE (NAME LIKE 'consistent%' OR NAME LIKE 'buffer%') AND M.STATISTIC#=S.STATISTIC#; NAME VALUE -------------------------------------- ---------- consistent gets 456 /* +408 */ consistent gets from cache 456 /* +408 */ consistent gets - examination 405 /* +402 */ consistent gets direct 0 consistent changes 0 buffer is pinned count 630 /* +593 */ buffer is not pinned count 421 /* +406 */The consistent gets count increase matches my execution plan, and the buffer is pinned count increased by 593.
I then created a 10200 trace:
And a simple Excel macro to process the trace file:
Sub ReadTrace() Dim intFileNum As Integer '10200 trace file Dim intFileNum2 As Integer 'Output file Dim strInput As String 'Line read from the 10200 trace file Dim strOutput As String 'Line to be written to the output file Dim strBlock(2000) As String 'Block read from the trace file Dim strBlockCounter(2000) As Integer 'Number of times read Dim intBlocks As Integer 'Total number of blocks Dim i As Integer 'Loop counter Dim intFound As Integer 'Indicates whether or not the block was found intFileNum = FreeFile Open "c:\or10s_ora_4256_watch_consistent.trc" For Input As #intFileNum intFileNum2 = FreeFile Open "c:\watch_consistent.txt" For Output As #intFileNum2 Do While Not EOF(intFileNum) Line Input #intFileNum, strInput If InStr(strInput, "Consistent read started for block") > 0 Then strOutput = Trim(Right(strInput, Len(strInput) - InStr(strInput, ":"))) 'Find the number of times the block was accessed intFound = 0 For i = 1 To intBlocks If strOutput = strBlock(i) Then intFound = i strBlockCounter(i) = strBlockCounter(i) + 1 Exit For End If Next i 'If the block was not found, record it If intFound = 0 Then intBlocks = intBlocks + 1 intFound = intBlocks strBlockCounter(intBlocks) = 1 strBlock(intBlocks) = strOutput End If Print #intFileNum2, strOutput; vbTab; strBlockCounter(intFound) End If Loop Print #intFileNum2, "" For i = 1 To intBlocks Print #intFileNum2, strBlock(i); vbTab; strBlockCounter(i) Next i Close #intFileNum Close #intFileNum2 End SubThe output of the Excel macro lists every consistent block read, outputs that block number to another file, along with a running count of the number of times that block had been accessed. A summary is produced at the bottom of the output, listing the blocks accessed in order and the total number of consistent gets for that block. The summary from my test follows:
The read of blocks 0206e214 and 0206e215 were at the very start of the trace file. How are those block numbers from the 10200 trace file translated into the following from your explanation:
Comment by Charles Hooper — January 23, 2011 @ 3:55 pm UTC Jan 23,2011 |
Charles,
Since you have blevel 1 on the child index you can ignore the comments about the branch block. Given that I know what my data looks like I know that you have 57 rows per table block in the parent and child tables. So your output shows:
Note that: 57 + 57 + 57 + 29 = 200
Comment by Jonathan Lewis — January 23, 2011 @ 5:51 pm UTC Jan 23,2011 |
Simplified version for UNIX:
#!/bin/ksh INFILE=$1 OUTFILE=$2 grep "Consistent read started" $INFILE | awk -F ':' '{print $2}' | sort | uniq -c | sort -rn > $OUTFILEShould do the same and work even on windows (powered by cygwin :) :) )
Comment by Pavol Babel — October 6, 2011 @ 12:05 pm UTC Oct 6,2011 |
Jonathan,
Great job gain. I would like to two additional questions:
1. I have never heard of 10200 event. How did you know such an event exists? But meanwhile, I have discovered $ORACLE_HOME/rdbms/mesg/oraus.msg. i didn’t know this file was providing also list of events and their names! :-)
2. Very interesting information that Oracle is pinning blocks in the outer table, too (in 10g). If we give away condition “and chi.n1 = 0″ making your query retuning 200 rows, we will obviously have slightly more consistent gets in rows 2 and 3.
The change of behaviour in 11g is very interesting, too.
Comment by Pavol Babel — January 23, 2011 @ 4:21 pm UTC Jan 23,2011 |
Or in any oracle client:
begin dbms_output.put_line(sqlerrm(-10200));end;
/
Here’s the output from “SQL Commands” at apex.oracle.com:
ORA-10200: consistent read buffer status
Statement processed.
0.00 seconds
Comment by Flado — January 23, 2011 @ 5:05 pm UTC Jan 23,2011 |
Pavol,
My original source (some 15 to 20 years ago) was the oraus.msg file. From time to time you find notes on Metalink or on the internet about them.
Another interesting little detail is how more and more “consistent gets” have gradually changed to “consistent gets – examinations” over the versions. Once upon a time it seemed to be only undo blocks, single table hash cluster blocks, and root blocks of indexes – in recent versions you can even see ordinary heap table blocks subject to examinations.
Comment by Jonathan Lewis — January 23, 2011 @ 5:54 pm UTC Jan 23,2011 |
[...] happened again, another blog article that forced me to stop, think, and … hey, why did Oracle Database 11.2.0.2 do something [...]
Pingback by Watching Consistent Gets – 10200 Trace File Parser « Charles Hooper's Oracle Notes — January 24, 2011 @ 6:01 am UTC Jan 24,2011 |