Here’s an example of ANSI SQL that does something in a fashion that arguably looks a little tidier than the strategy you have to adopt in Oracle. As so often when I compare Oracle syntax and ANSI syntax it’s an example that relates to an outer join. We start with two tables – as usual I have locally managed tablespaces, 8KB blocks, 1MB uniform extents and freelist management. I’ve also disabled system statistics (CPU costing):
January 31, 2011
January 28, 2011
Fix Control
There’s a very useful posting from Coskan Gundogar about tracking down a problem to do with an 11g upgrade.
The method basically revolves around a quick check for “known issues” that might be relevant by looking at the dynamic performance views v$system_fix_control.
When I read Coskan’s notes I had forgotten that I had written a short item about this myself about a year ago where I listed the relatively small number of items available in 10.2.0.3. The list is up to 551 in my copy of 11.2.0.2.
January 25, 2011
Shared Server – 4
In earlier posts we looked at v$reqdist and v$queue, which report time spent running tasks, and time spent waiting in the COMMON and DISPATCHER queues.
I mentioned in the previous article that if we see too much time spent in the COMMON queue(s) then perhaps we needed more shared servers. Moving to the other end of the dialogue, one of the reasons why we might spend too much time waiting in a DISPATCHER queue for the result to go back to the user is that we don’t have enough dispatchers – and we can get a clue about this from the view v$dispatcher:
column messages format 999,999,999,999
column bytes format 999,999,999,999
column idle format 999,999,999,999
column busy format 999,999,999,999
column total_time format 999,999,999,999
column busy_percent format 999.99
select
name, /* network, */ messages, bytes,
idle, busy,
idle + busy total_time,
100 * round(busy/nullif(idle+busy,0),4) busy_percent
from
v$dispatcher
/
NAME MESSAGES BYTES IDLE BUSY TOTAL_TIME BUSY_PERCENT
---- ---------------- ---------------- ---------------- ---------------- ---------------- ------------
D000 341,902,864 -1,875,035,869 498,676,896 897,199,945 1,395,876,841 64.28
D001 351,090,918 -860,132,585 1,672,899,708 216,085,537 1,888,985,245 11.44
D002 3,543,576 1,820,366,239 6,602,929 82,744 6,685,673 1.24
D003 5,994,180 -1,007,460,261 6,539,742 145,927 6,685,669 2.18
Unfortunately, it looks as if the critical columns in this view are recorded as 32-bit signed, which means they wrap from positive to negative at about 2,000,000,000 – and this means the figures for D000 and D001 are complete garbage. In my last note I pointed out that I had started up two extra dispatchers on a system that had been running for quite a long time – which is why dispatchers D002 and D003 have such small number compared to the others – they’ve only been running about 18 hours (66,857 seconds).
Clearly, to get some sensible figures, you really need to play around with snapshots and deltas and worry about all the usual problems of collecting information for the right interval. Even so, these figures do show you that D002 and D003 have been idle for most of the time they’ve been up – but you’ll have to take it from me that the 827 seconds and 1,459 seconds they’ve recorded as busy time was a small fraction of a soak test that we were running. It’s not obvious from the absolute figures, but with the background information I have I can say that there was a small benefit from having four dispatchers, but nothing significant.
Note: if we were able to trust the 64.28% figure for dispatcher D000 we could be reasonably confident that we needed at least the second dispatcher simply on the basis of the work being done by D000; but we might also worry about it for another reason – if the dispatcher is very busy, it’s possible that this is just a symptom of the whole machine being busy, in which case it’s possible that the dispatcher isn’t able to get CPU time to do its work.
January 21, 2011
Quiz Night
Here’s an execution plan pulled from v$sql_plan in 10.2.0.3 with the “gather_plan_statistics” option enabled:
---------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------- | 1 | NESTED LOOPS | | 1 | 0 |00:00:00.01 | 608 | | 2 | TABLE ACCESS BY INDEX ROWID| PARENT | 1 | 200 |00:00:00.01 | 6 | | 3 | INDEX RANGE SCAN | PAR_I1 | 1 | 200 |00:00:00.01 | 2 | | 4 | TABLE ACCESS BY INDEX ROWID| CHILD | 200 | 0 |00:00:00.01 | 602 | | 5 | INDEX UNIQUE SCAN | CHI_I1 | 200 | 200 |00:00:00.01 | 402 | ----------------------------------------------------------------------------------------
The code defining the two tables is the same:
create table &m_table_name
as
select
rownum id,
trunc(dbms_random.value(1,&m_rowcount)) n1,
lpad(rownum,10,'0') small_vc,
rpad('x',100) padding
from
big_row_source
where
rownum <= &m_rowcount
;
The indexes par_i1 and chi_i1 are on the id column and, as you can see for chi_i1, have been created as unique indexes. Unlike the table, though, the code defining them isn’t necessarily identical.
Please state, with justification, what you think the blevel is for index chi_i1 ?
(Note – to avoid any risk of confusion, remember that height = blevel + 1; and just in case anyone thinks it makes any difference, the tables and indexes were all created in the same tablespace which is my usual setup of 1MB uniform extents, locally managed, 8K block size and freelist management.)
The quiz isn’t intended to be cunning, devious and tricksy, by the way. It’s all about observing and understanding a fairly ordinary situation. On the other hand I’m sure there are interesting variations with strange side effects that could be introduced if you really want to get clever.
Answer 23rd Jan:
Rather than writing my answer into a new post I’ve added it below – but to give new visitors a chance to work out the answer before seeing mine I’ve left a big gap before the answer.
Reading the comments I think all the interesting points have been covered. We have comments covering special treatment of root blocks in nested loop joins, the “buffer is pinned count” statistic, possible changes in 11g, the hypothesis that I had set a large pctfree for the child index,and the link between “A-rows” in one line of the plan and “Starts” in another line.
For reference, my query looked like this:
select par.small_vc, chi.small_vc from parent par, child chi where par.id between 301 and 500 and chi.id = par.n1 and chi.n1 = 0 ;
The mechanics of the plan are:
Line 3 – index range scan of par_i1 (which finds ultimately 200 matching rowids).
Line 2 – for each rowid from line 3 check the matching table row (which ultimately returns 200 rows)
Then two hundred calls to:
Line 5 – index unique scan of chi_i1, which always finds it single row – hence returning 200 rowids in total
Line 4- check the matching row in the table – which always fails – hence returning zero rows in total
The critical counts that allow you to answer my question are the 402 and 602 buffer gets in lines 5 and 4. Ignoring, temporarily, the odd 2 these numbers are clearly 200 times something – which is rather nice given that we know that we are doing something 200 times. The simplest solution, of course is that the (602 – 402 =) 200 gets due to line 4 represents the visits to the 200 table blocks, leaving us with 400 (plus an odd bit) to account for.
Now consider the possibilities:
-
Blevel = 0 – we would visit the only index block (which would be the root, but look like a leaf) then visit the table.
Blevel = 1 – we would visit the root, the leaf which is the next layer down, then the table
Blevel = 2 – we would visit the root, the branch level, the leaf, then the table.
So, at first sight we might decide the only way to get 400 buffer visits from the index is to have Blevel = 1, visit the root and the leaf. But that’s not the way it works (apart from a couple of versions which had a bug).
When running a nested loop join, Oracle “pins” the root block of the inner (second table) index, which means it attaches a structure to the “users’ list” in the buffer header that links the session’s state object directly to the buffer header. So, in my nested loop join, Oracle gets the root block once and keeps it pinnned, then gets branch and leaf blocks 200 times. The blevel on the child index is 2. (The blevel of the parent index is only 1 – I had set the pctfree on the child index to 90 to make it much larger than it needed to be.)
This “root block pin” isn’t the only pinning optimisation in the query, though. In detail, the steps are as follows – and this is something you would be table to see in the trace file if you enabled event 10200 (one of the ‘consistent reads’ traces).
-
1) Get the root block of parent index and pin it
2) Get the first relevant leaf block of the parent index and pin it
3) Get the first relevant table block of the parent table and pin it.
4) Get the root block of the child index and pin it
5) Get the relevant branch block of the child index (no pin)
6) Get the relevant leaf block of the child index (no pin)
7) Get the table block of the child table (no pin) — row does not match final predicate and is rejected
8) Revisit the parent index leaf block through the pin (add one to “buffer is pinned count”)
9) Revisit the parent table block through the pin (add one to “buffer is pinned count”)
10) Revisit the child index root block through the pin (add one to “buffer is pinned count”)
11) Get a new child branch
12) Get a new child leaf
13) Get a new table block
14) Repeat for a total of 200 cycles from step 8
There a couple of deviations from this cycle, of course. The rows I needed from parent were spread across 4 consecutive blocks in the table so, roughly every 50 rows from parent, line (9) above would become “release current pinned parent block, get new parent block and pin it”. The other little oddity that I can’t explain is that Oracle does “get” the child root block on the second visit to the index as well as the first visit – and then pins it from that moment onwards. So the counts are:
-
Gets on the parent index 2 (root and leaf)
Gets on the parent table 4 (one for each table block visited)
Gets on the child index 402 (two on the root block, 200 on branch blocks, 200 on leaf blocks)
Gets on the child table (200 for each row/block accessed)
Inevitably, things change – Oracle keeps getting smarter about things like ‘consistent gets – examination’, ‘buffer is pinned count’ and, in 11g, “fastpath” access to buffered blocks. This is a clue to the difference in gets that Charles Hooper recorded in 11.2.0.2 – and explains why I chose to use 10.2.0.3 with a unique index for my example. If you want to investigate other variations all it takes is snapshots of v$mystat, calls to “alter system flush buffer_cache”, and event 10200; the treedump can also be very helpful for identifying block addresses.
January 19, 2011
ASSM Again
While checking my backlog of drafts (currently 75 articles in note form) I came across this one from August 2009 and was a little upset that I hadn’t finished it sooner – it’s a nice example of geek stuff that has the benefit of being useful.
From the Oracle newsgroup comp.databases.oracle.server, here’s an example of how to recreate a performance problem due to maintenance on ASSM bitmaps in 10.2.0.4.
Create a table in a tablespace with an 8KB block size, locally managed tablespace with uniform 1MB extents, and automatic segment space management (ASSM). Check the newsgroup thread if you want complete details on reproducing the test:
Session 1: Insert 100,000 rows of about 490 bytes into a table using a pl/sql loop and commit at end.
Session 1: Insert 1,000 rows into the table with single SQL inserts and no commits
Session 1: delete all data from the table with a single statement – but do not commit
Session 2: Insert 1,000 rows into the table with single SQL inserts and no commits – it’s very slow.
As one person on the thread pointed out – it looks as if Oracle is doing a full tablescan of the table, one block at a time showing “db file sequential read” waits all the way through the table. (If your db_cache_size is large enough you might not see this symptom).
I simplified the test – inserting just 100,000 of the rows (with the commit), then deleting them all (without the commit), then inserting one row from another session. Taking a snapshot of x$kcbsw and x$kcbwh, I got the following figures for the activity that took place inserting that one extra row (this was on Oracle 10.2.0.3):
---------------------------------
Buffer Cache - 23-Dec 11:50:36
Interval:- 0 seconds
---------------------------------
Why0 Why1 Why2 Other Wait
---- ---- ---- ----------
1,457 0 0 0 ktspfwh10: ktspscan_bmb
8 0 0 0 ktspswh12: ktspffc
1 0 0 0 ktsphwh39: ktspisc
7,061 0 0 0 ktspbwh1: ktspfsrch
1 0 0 0 ktuwh01: ktugus
7,060 0 0 0 ktuwh05: ktugct
1 0 0 0 ktuwh09: ktugfb
2 0 0 0 kdswh02: kdsgrp
2 0 0 0 kdiwh06: kdifbk
2 0 0 0 kdiwh07: kdifbk
---- ---- ---- ----------
15,595 0 0 0 Total: 10 rows
The figures tell us how much work Oracle has to do to find a table block that could accept a new row. The idea is simple – Oracle checks the first “level 3″ bitmap block (which is actually part of the segment header block) to find a pointer to the current “level 2″ bitmap block; it checks this level 2 bitmap block to find a pointer to the current “level 1″ bitmap block; and finally it checks the “level 1″ bitmap block to find a pointer to a data block that shows some free space.
Unfortunately every block in our table is apparently empty – but that’s only going to be true once session 1 commits. In this version of Oracle the blocks are all visible as “x% free” in the level 1 bitmaps – but when Oracle visits each block (“ktspbwh1: ktspfsrch”) it checks the ITL entry, which points it to the transaction table slot in the related undo segment header block to Get the Commit Time for the transaction (“ktuwh05: ktugct”) and finds that the transaction is not committed so the space is not really free. So Oracle has to visit the next block shown as free in the bitmap.
In our “bulk delete / no commit” case, we end up visitng every (or nearly every) block in the entire table before we find a block we can actually use – and, given the nature of the ASSM bitmap implementation, the order of the block visits is the “natural” table order, so we see something that looks like a full tablescan operating through single blocks reads (apart, perhaps, from a few blocks that are still cached).
I can’t explain why we do 1,457 visits to bitmap blocks (“ktspfwh10: ktspscan_bmb”) in this version of Oracle, but perhaps it’s simply an indication that Oracle picks (say) five “free block” entries from the bitmap block each time it visits it and therefore has to visit each bitmap block about 12 times if it doesn’t find a data block with space that really is free in it search.
Note – these results will be hugely dependent on version of Oracle – in an earlier version of Oracle the bitmap blocks were not updated by the delete until some time after the commit – and this variation of delayed block cleanout produced other unpleasant anomalies; and, just to make like difficult in later versions of Oracle, the x$kcbsw / x$kcbwh objects are not populated properly in 11g.
Footnote: In case you hadn’t realised, ASSM is a mechanism aimed at OLTP systems with a reasonable degree of concurrency – so it’s not too surprising that you can find problems and weak spots if you hit it with processing which is biased towards the DW and batch processing end of the spectrum.
January 17, 2011
Subquery Selectivity – 2
Here’s an item I thought I’d published a few years ago as a follow-up to an article on a 10g bug-fix for subquery selectivity. I was reminded of my oversight when a question came up on OTN that looked like an example of the bug introduced by the bug-fix – and I found that I couldn’t simply supply a link to explain the problem.
We start with some simple SQL to create a test data set:
January 16, 2011
Ignoring hints
Yes, finally, really ignoring hints – but it’s a sort of bug, of course.
Thanks to Timur Akhmadeev for telling us about bug 8432870 (you’ll need an account on MOS for the link) in his reply to Charles Hooper’s recent post.
In the upgrade from 9i to 10g there was a change in the “hint parser”. If you put a valid SQL keyword inside the hint delimiters (the note says /*+ */ but doesn’t mention the –+ alternative for specifying a hint, thought it’s probably still true there) when the keyword is not a valid hint – for example the word NOLOGGING which I have seen people use as if it were a hint – then Oracle will ignore all the hints.
Earlier versions of Oracle simply noticed that you had embedded something that wasn’t a valid hint, but that didn’t stop the parser from reading the rest of the hints correctly.
If the invalid hint is not a valid SQL keyword then there are no nasty side effects.
This might explain why I ran into an odd problem a little while ago when I added a comment to my hint list and found that the hints stopped working. I can’t remember the exact details any more but I think my comment was something along the lines of: “Do not … because …”, and this broke the hints until I changed it to “Don’t … because …”.
[Further reading on "ignoring hints"]
January 14, 2011
Statspack on RAC
Some time ago I was on a client site which was busy setting up a RAC cluster using Oracle 10g. Although I wasn’t involved with that particular system there were a couple of coffee-breaks where I ended up chatting with the consultant that the client had hired to install the cluster. In one of our breaks he surprised me by making the casual comment: “Of course, you can’t run Statspack on RAC because it locks up the system.”
(more…)
January 12, 2011
Fake Baselines
SQL Baslines in 11g are the new Stored Outlines – and one of the nicest features of SQL Baselines is that you are allowed to fake them; or rather, it’s legal to generate an execution plan for one query and transfer its execution plan to another query using the packaged procedure dbms_spm.load_plans_from_cursor_cache(). This posting is a demonstration of the technique.
(more…)
January 11, 2011
Shared Server – 3
The previous post in this series showed you how v$reqdist summarised the time taken by tasks running through shared servers – but there are several other ways we need to look at what’s going on with shared servers. One of the more important ones is to find out how much time a task is queueing before it gets to a shared server to start running – and Oracle gives us v$queue as the place to find this information:
(more…)
January 10, 2011
Cost – again
Browsing through some postings on Tony Hasler’s blog a little while ago I found this response to a note he had posted on some anomalies (i.e. bugs) in the costing of the “(min/max)” index scans:
My current understanding is it is not valid to try to compare costs across different queries (even if you just alter it by adding a hint). In general a better plan will have a lower cost but you cannot rely on this metric. The metric is really for the CBO to choose between alternative plans for this specific query, not to compare plans generated for different queries.
January 7, 2011
SQL Plan Baselines
Here’s another of my little catalogues of articles – this one on SQL Plan Baselines.
Be a little careful as you read through these notes – there are various changes in internal mechanisms, storage, etc. as you go through different versions of Oracle, so check which version the author is writing about.
(more…)
January 6, 2011
clonedb
I’ve lost count of the number of times I’ve said something like: “Ideally you really need to do your testing on the production system”. Kevin Closson has recently written some interesting notes (with more to come) about clonedb – an 11.2 feature that may make it possible to get pretty close to this ideal.
With a follow-up article from Tim Hall (Feb 2011)
Footnote: for purposes of removing any confusion, please note that actually testing on your production system is not a good idea
January 5, 2011
Shared Server – 2
Although they are becoming increasingly rare (thanks, largely, to Web-based applications taking over the world) a few of the systems I get called in to review are still using Shared Server technology (formerly known as Mutli-threaded Server / MTS); and I have to say that there are a couple of nice “overview” features supporting this technology that I would like to see in the AWR or Statspack reports. These are the views which allow you to see how the workload is being shared out and what the time distribution looks like, and I’ll be taking a look at these views over the course of three or four blog notes.
(more…)
January 3, 2011
Redo
A couple of days ago I published a link to some comments I had made on OTN about differences in redo generation between 10g and earlier versions of Oracle. This raised a few questions that suggested a follow-up (or perhaps “prequel”) note might be a little helpful. So I’ve created a simple SQL script to help demonstrate the differences and show how some of the pieces hang together.
(more…)

