Oracle Scratchpad

January 31, 2011

ANSI Outer

Filed under: ANSI Standard,CBO,Execution plans,lateral view,Oracle — Jonathan Lewis @ 6:59 pm UTC Jan 31,2011

Here’s an example of ANSI SQL that does something in a fashion that arguably looks a little tidier than the strategy you have to adopt in Oracle. As so often when I compare Oracle syntax and ANSI syntax it’s an example that relates to an outer join. We start with two tables – as usual I have locally managed tablespaces, 8KB blocks, 1MB uniform extents and freelist management. I’ve also disabled system statistics (CPU costing):

(more…)

January 28, 2011

Fix Control

Filed under: Troubleshooting — Jonathan Lewis @ 3:35 pm UTC Jan 28,2011

There’s a very useful posting from Coskan Gundogar about tracking down a problem to do with an 11g upgrade.

The method basically revolves around a quick check for “known issues” that might be relevant by looking at the dynamic performance views v$system_fix_control.

When I read Coskan’s notes I had forgotten that I had written a short item about this myself about a year ago where I listed the  relatively small number of  items available in 10.2.0.3. The list is up to 551 in my copy of 11.2.0.2.

 

January 25, 2011

Shared Server – 4

Filed under: Shared Server / MTS — Jonathan Lewis @ 6:52 pm UTC Jan 25,2011

In earlier posts we looked at v$reqdist and v$queue, which report time spent running tasks, and time spent waiting in the COMMON and DISPATCHER queues.

I mentioned in the previous article that if we see too much time spent in the COMMON queue(s) then perhaps we needed more shared servers. Moving to the other end of the dialogue, one of the reasons why we might spend too much time waiting in a DISPATCHER queue for the result to go back to the user is that we don’t have enough dispatchers – and we can get a clue about this from the view v$dispatcher:


column messages     format 999,999,999,999
column bytes        format 999,999,999,999
column idle         format 999,999,999,999
column busy         format 999,999,999,999
column total_time   format 999,999,999,999
column busy_percent format 999.99

select
        name, /* network, */ messages, bytes,
        idle, busy,
        idle + busy total_time,
        100 * round(busy/nullif(idle+busy,0),4) busy_percent
from
        v$dispatcher
/

NAME         MESSAGES            BYTES             IDLE             BUSY       TOTAL_TIME BUSY_PERCENT
---- ---------------- ---------------- ---------------- ---------------- ---------------- ------------
D000      341,902,864   -1,875,035,869      498,676,896      897,199,945    1,395,876,841        64.28
D001      351,090,918     -860,132,585    1,672,899,708      216,085,537    1,888,985,245        11.44
D002        3,543,576    1,820,366,239        6,602,929           82,744        6,685,673         1.24
D003        5,994,180   -1,007,460,261        6,539,742          145,927        6,685,669         2.18

Unfortunately, it looks as if the critical columns in this view are recorded as 32-bit signed, which means they wrap from positive to negative at about 2,000,000,000 – and this means the figures for D000 and D001 are complete garbage. In my last note I pointed out that I had started up two extra dispatchers on a system that had been running for quite a long time – which is why dispatchers D002 and D003 have such small number compared to the others – they’ve only been running about 18 hours (66,857 seconds).

Clearly, to get some sensible figures, you really need to play around with snapshots and deltas and worry about all the usual problems of collecting information for the right interval. Even so, these figures do show you that D002 and D003 have been idle for most of the time they’ve been up – but you’ll have to take it from me that the 827 seconds and 1,459 seconds they’ve recorded as busy time was a small fraction of a soak test that we were running. It’s not obvious from the absolute figures, but with the background information I have I can say that there was a small benefit from having four dispatchers, but nothing significant.

Note: if we were able to trust the 64.28% figure for dispatcher D000 we could be reasonably confident that we needed at least the second dispatcher simply on the basis of the work being done by D000; but we might also worry about it for another reason – if the dispatcher is very busy, it’s possible that this is just a symptom of the whole machine being busy, in which case it’s possible that the dispatcher isn’t able to get CPU time to do its work.

[Further reading on Shared Server / MTS]

January 21, 2011

Quiz Night

Filed under: Infrastructure — Jonathan Lewis @ 7:04 pm UTC Jan 21,2011

Here’s an execution plan pulled from v$sql_plan in 10.2.0.3 with the “gather_plan_statistics” option enabled:

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------
|   1 |  NESTED LOOPS                |        |      1 |      0 |00:00:00.01 |     608 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PARENT |      1 |    200 |00:00:00.01 |       6 |
|   3 |    INDEX RANGE SCAN          | PAR_I1 |      1 |    200 |00:00:00.01 |       2 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CHILD  |    200 |      0 |00:00:00.01 |     602 |
|   5 |    INDEX UNIQUE SCAN         | CHI_I1 |    200 |    200 |00:00:00.01 |     402 |
----------------------------------------------------------------------------------------

The code defining the two tables is the same:

create table &m_table_name
as
select
	rownum					id,
	trunc(dbms_random.value(1,&m_rowcount))	n1,	
	lpad(rownum,10,'0')			small_vc,
	rpad('x',100)				padding
from
	big_row_source
where
	rownum <= &m_rowcount
;

The indexes par_i1 and chi_i1 are on the id column and, as you can see for chi_i1, have been created as unique indexes. Unlike the table, though, the code defining them isn’t necessarily identical.

Please state, with justification, what you think the blevel is for index chi_i1 ?

(Note – to avoid any risk of confusion, remember that height = blevel + 1; and just in case anyone thinks it makes any difference, the tables and indexes were all created in the same tablespace which is my usual setup of 1MB uniform extents, locally managed, 8K block size and freelist management.)

The quiz isn’t intended to be cunning, devious and tricksy, by the way. It’s all about observing and understanding a fairly ordinary situation. On the other hand I’m sure there are interesting variations with strange side effects that could be introduced if you really want to get clever.

Answer 23rd Jan:
Rather than writing my answer into a new post I’ve added it below – but to give new visitors a chance to work out the answer before seeing mine I’ve left a big gap before the answer.

Reading the comments I think all the interesting points have been covered. We have comments covering special treatment of root blocks in nested loop joins, the “buffer is pinned count” statistic, possible changes in 11g, the hypothesis that I had set a large pctfree for the child index,and the link between “A-rows” in one line of the plan and “Starts” in another line.

For reference, my query looked like this:

select
	par.small_vc,
	chi.small_vc
from
	parent	par,
	child	chi
where
	par.id between 301 and 500
and	chi.id = par.n1
and	chi.n1 = 0
;

The mechanics of the plan are:
Line 3 – index range scan of par_i1 (which finds ultimately 200 matching rowids).
Line 2 – for each rowid from line 3 check the matching table row (which ultimately returns 200 rows)

Then two hundred calls to:
Line 5 – index unique scan of chi_i1, which always finds it single row – hence returning 200 rowids in total
Line 4- check the matching row in the table – which always fails – hence returning zero rows in total

The critical counts that allow you to answer my question are the 402 and 602 buffer gets in lines 5 and 4. Ignoring, temporarily, the odd 2 these numbers are clearly 200 times something – which is rather nice given that we know that we are doing something 200 times. The simplest solution, of course is that the (602 – 402 =) 200 gets due to line 4 represents the visits to the 200 table blocks, leaving us with 400 (plus an odd bit) to account for.

Now consider the possibilities:

    Blevel = 0 – we would visit the only index block (which would be the root, but look like a leaf) then visit the table.
    Blevel = 1 – we would visit the root, the leaf which is the next layer down, then the table
    Blevel = 2 – we would visit the root, the branch level, the leaf, then the table.

So, at first sight we might decide the only way to get 400 buffer visits from the index is to have Blevel = 1, visit the root and the leaf. But that’s not the way it works (apart from a couple of versions which had a bug).

When running a nested loop join, Oracle “pins” the root block of the inner (second table) index, which means it attaches a structure to the “users’ list” in the buffer header that links the session’s state object directly to the buffer header. So, in my nested loop join, Oracle gets the root block once and keeps it pinnned, then gets branch and leaf blocks 200 times. The blevel on the child index is 2. (The blevel of the parent index is only 1 – I had set the pctfree on the child index to 90 to make it much larger than it needed to be.)

This “root block pin” isn’t the only pinning optimisation in the query, though. In detail, the steps are as follows – and this is something you would be table to see in the trace file if you enabled event 10200 (one of the ‘consistent reads’ traces).

    1) Get the root block of parent index and pin it
    2) Get the first relevant leaf block of the parent index and pin it
    3) Get the first relevant table block of the parent table and pin it.
    4) Get the root block of the child index and pin it
    5) Get the relevant branch block of the child index (no pin)
    6) Get the relevant leaf block of the child index (no pin)
    7) Get the table block of the child table (no pin) — row does not match final predicate and is rejected
    8) Revisit the parent index leaf block through the pin (add one to “buffer is pinned count”)
    9) Revisit the parent table block through the pin (add one to “buffer is pinned count”)
    10) Revisit the child index root block through the pin (add one to “buffer is pinned count”)
    11) Get a new child branch
    12) Get a new child leaf
    13) Get a new table block
    14) Repeat for a total of 200 cycles from step 8

There a couple of deviations from this cycle, of course. The rows I needed from parent were spread across 4 consecutive blocks in the table so, roughly every 50 rows from parent, line (9) above would become “release current pinned parent block, get new parent block and pin it”. The other little oddity that I can’t explain is that Oracle does “get” the child root block on the second visit to the index as well as the first visit – and then pins it from that moment onwards. So the counts are:

    Gets on the parent index 2 (root and leaf)
    Gets on the parent table 4 (one for each table block visited)
    Gets on the child index 402 (two on the root block, 200 on branch blocks, 200 on leaf blocks)
    Gets on the child table (200 for each row/block accessed)

Inevitably, things change – Oracle keeps getting smarter about things like ‘consistent gets – examination’, ‘buffer is pinned count’ and, in 11g, “fastpath” access to buffered blocks. This is a clue to the difference in gets that Charles Hooper recorded in 11.2.0.2 – and explains why I chose to use 10.2.0.3 with a unique index for my example. If you want to investigate other variations all it takes is snapshots of v$mystat, calls to “alter system flush buffer_cache”, and event 10200; the treedump can also be very helpful for identifying block addresses.

[Some notes which say a little more about gets and pins]

January 19, 2011

ASSM Again

Filed under: ASSM,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 6:21 pm UTC Jan 19,2011

While checking my backlog of drafts (currently 75 articles in note form) I came across this one from August 2009 and was a little upset that I hadn’t finished it sooner – it’s a nice example of geek stuff that has the benefit of being useful.

From the Oracle newsgroup comp.databases.oracle.server, here’s an example of how to recreate a performance problem due to maintenance on ASSM bitmaps in 10.2.0.4.

Create a table in a tablespace with an 8KB block size, locally managed tablespace with uniform 1MB extents, and automatic segment space management (ASSM). Check the newsgroup thread if you want complete details on reproducing the test:

Session 1: Insert 100,000 rows of about 490 bytes into a table using a pl/sql loop and commit at end.
Session 1: Insert 1,000 rows into the table with single SQL inserts and no commits
Session 1: delete all data from the table with a single statement – but do not commit

Session 2: Insert 1,000 rows into the table with single SQL inserts and no commits – it’s very slow.

As one person on the thread pointed out – it looks as if Oracle is doing a full tablescan of the table, one block at a time showing “db file sequential read” waits all the way through the table. (If your db_cache_size is large enough you might not see this symptom).

I simplified the test – inserting just 100,000 of the rows (with the commit), then deleting them all (without the commit), then inserting one row from another session. Taking a snapshot of x$kcbsw and x$kcbwh, I got the following figures for the activity that took place inserting that one extra row (this was on Oracle 10.2.0.3):

---------------------------------
Buffer Cache - 23-Dec 11:50:36
Interval:-  0 seconds
---------------------------------
          Why0          Why1          Why2    Other Wait
          ----          ----          ----    ----------
         1,457             0             0             0 ktspfwh10: ktspscan_bmb
             8             0             0             0 ktspswh12: ktspffc
             1             0             0             0 ktsphwh39: ktspisc
         7,061             0             0             0 ktspbwh1: ktspfsrch
             1             0             0             0 ktuwh01: ktugus
         7,060             0             0             0 ktuwh05: ktugct
             1             0             0             0 ktuwh09: ktugfb
             2             0             0             0 kdswh02: kdsgrp
             2             0             0             0 kdiwh06: kdifbk
             2             0             0             0 kdiwh07: kdifbk
          ----          ----          ----    ----------
        15,595             0             0             0 Total: 10 rows

The figures tell us how much work Oracle has to do to find a table block that could accept a new row. The idea is simple – Oracle checks the first “level 3″ bitmap block (which is actually part of the segment header block) to find a pointer to the current “level 2″ bitmap block; it checks this level 2 bitmap block to find a pointer to the current “level 1″ bitmap block; and finally it checks the “level 1″ bitmap block to find a pointer to a data block that shows some free space.

Unfortunately every block in our table is apparently empty – but that’s only going to be true once session 1 commits. In this version of Oracle the blocks are all visible as “x% free” in the level 1 bitmaps – but when Oracle visits each block (“ktspbwh1: ktspfsrch”) it checks the ITL entry, which points it to the transaction table slot in the related undo segment header block to Get the Commit Time for the transaction (“ktuwh05: ktugct”) and finds that the transaction is not committed so the space is not really free. So Oracle has to visit the next block shown as free in the bitmap.

In our “bulk delete / no commit” case, we end up visitng every (or nearly every) block in the entire table before we find a block we can actually use – and, given the nature of the ASSM bitmap implementation, the order of the block visits is the “natural” table order, so we see something that looks like a full tablescan operating through single blocks reads (apart, perhaps, from a few blocks that are still cached).

I can’t explain why we do 1,457 visits to bitmap blocks (“ktspfwh10: ktspscan_bmb”) in this version of Oracle, but perhaps it’s simply an indication that Oracle picks (say) five “free block” entries from the bitmap block each time it visits it and therefore has to visit each bitmap block about 12 times if it doesn’t find a data block with space that really is free in it search.

Note – these results will be hugely dependent on version of Oracle – in an earlier version of Oracle the bitmap blocks were not updated by the delete until some time after the commit – and this variation of delayed block cleanout produced other unpleasant anomalies; and, just to make like difficult in later versions of Oracle, the x$kcbsw / x$kcbwh objects are not populated properly in 11g.

Footnote: In case you hadn’t realised, ASSM is a mechanism aimed at OLTP systems with a reasonable degree of concurrency – so it’s not too surprising that you can find problems and weak spots if you hit it with processing which is biased towards the DW and batch processing end of the spectrum.

January 17, 2011

Subquery Selectivity – 2

Filed under: CBO,Execution plans,Troubleshooting — Jonathan Lewis @ 6:38 pm UTC Jan 17,2011

Here’s an item I thought I’d published a few years ago as a follow-up to an article on a 10g bug-fix for subquery selectivity. I was reminded of my oversight when a question came up on OTN that looked like an example of the bug introduced by the bug-fix – and I found that I couldn’t simply supply a link to explain the problem.

We start with some simple SQL to create a test data set:

(more…)

January 16, 2011

Ignoring hints

Filed under: Hints,Ignoring Hints,Oracle — Jonathan Lewis @ 10:32 pm UTC Jan 16,2011

Yes, finally, really ignoring hints – but it’s a sort of bug, of course.

Thanks to Timur Akhmadeev for telling us about bug 8432870 (you’ll need an account on MOS for the link) in his reply to Charles Hooper’s recent post.

In the upgrade from 9i to 10g there was a change in the “hint parser”. If you put a valid SQL keyword inside the hint delimiters (the note says /*+  */ but doesn’t mention the –+ alternative for specifying a hint, thought it’s probably still true there) when the keyword is not a valid hint – for example the word NOLOGGING which I have seen people use as if it were a hint – then Oracle will ignore all the hints.

Earlier versions of Oracle simply noticed that you had embedded something that wasn’t a valid hint, but that didn’t stop the parser from reading the rest of the hints correctly.

If the invalid hint is not a valid SQL keyword then there are no nasty side effects.

This might explain why I ran into an odd problem a little while ago when I added a comment to my  hint list and found that the hints stopped working. I can’t remember the exact details any more but I think my comment was something along the lines of: “Do not … because …”, and this broke the hints until I changed it to “Don’t … because …”.

[Further reading on "ignoring hints"]

 

January 14, 2011

Statspack on RAC

Filed under: Statspack — Jonathan Lewis @ 6:46 pm UTC Jan 14,2011

Some time ago I was on a client site which was busy setting up a RAC cluster using Oracle 10g. Although I wasn’t involved with that particular system there were a couple of coffee-breaks where I ended up chatting with the consultant that the client had hired to install the cluster. In one of our breaks he surprised me by making the casual comment: “Of course, you can’t run Statspack on RAC because it locks up the system.”
(more…)

January 12, 2011

Fake Baselines

Filed under: Execution plans,Troubleshooting — Jonathan Lewis @ 6:33 pm UTC Jan 12,2011

SQL Baslines in 11g are the new Stored Outlines – and one of the nicest features of SQL Baselines is that you are allowed to fake them; or rather, it’s legal to generate an execution plan for one query and transfer its execution plan to another query using the packaged procedure dbms_spm.load_plans_from_cursor_cache(). This posting is a demonstration of the technique.
(more…)

January 11, 2011

Shared Server – 3

Filed under: Shared Server / MTS — Jonathan Lewis @ 6:43 pm UTC Jan 11,2011

The previous post in this series showed you how v$reqdist summarised the time taken by tasks running through shared servers – but there are several other ways we need to look at what’s going on with shared servers. One of the more important ones is to find out how much time a task is queueing before it gets to a shared server to start running – and Oracle gives us v$queue as the place to find this information:
(more…)

January 10, 2011

Cost – again

Filed under: CBO,Execution plans,Tuning — Jonathan Lewis @ 6:47 pm UTC Jan 10,2011

Browsing through some postings on Tony Hasler’s blog a little while ago I found this response to a note he had posted on some anomalies (i.e. bugs) in the costing of the “(min/max)” index scans:

My current understanding is it is not valid to try to compare costs across different queries (even if you just alter it by adding a hint). In general a better plan will have a lower cost but you cannot rely on this metric. The metric is really for the CBO to choose between alternative plans for this specific query, not to compare plans generated for different queries.

(more…)

January 7, 2011

SQL Plan Baselines

Filed under: CBO,Execution plans,Troubleshooting — Jonathan Lewis @ 8:05 pm UTC Jan 7,2011

Here’s another of my little catalogues of articles – this one on SQL Plan Baselines.

Be a little careful as you read through these notes – there are various changes in internal mechanisms, storage, etc. as you go through different versions of Oracle, so check which version the author is writing about.
(more…)

January 6, 2011

clonedb

Filed under: Infrastructure — Jonathan Lewis @ 9:07 am UTC Jan 6,2011

I’ve lost count of the number of times I’ve said something like: “Ideally you really need to do your testing on the production system”. Kevin Closson has recently written some interesting notes (with more to come) about clonedb – an 11.2 feature that may make it possible to get pretty close to this ideal.

With a follow-up article from Tim Hall (Feb 2011)

Footnote: for purposes of removing any confusion, please note that actually testing on your production system is not a good idea

January 5, 2011

Shared Server – 2

Filed under: Shared Server / MTS — Jonathan Lewis @ 8:26 pm UTC Jan 5,2011

Although they are becoming increasingly rare (thanks, largely, to Web-based applications taking over the world) a few of the systems I get called in to review are still using Shared Server technology (formerly known as Mutli-threaded Server / MTS); and I have to say that there are a couple of nice “overview” features supporting this technology that I would like to see in the AWR or Statspack reports. These are the views which allow you to see how the workload is being shared out and what the time distribution looks like, and I’ll be taking a look at these views over the course of three or four blog notes.
(more…)

January 3, 2011

Redo

Filed under: Infrastructure,redo — Jonathan Lewis @ 7:40 pm UTC Jan 3,2011

A couple of days ago I published a link to some comments I had made on OTN about differences in redo generation between 10g and earlier versions of Oracle. This raised a few questions that suggested a follow-up (or perhaps “prequel”) note might be a little helpful. So I’ve created a simple SQL script to help demonstrate the differences and show how some of the pieces hang together.
(more…)

Next Page »

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 819 other followers