Oracle Scratchpad

April 3, 2014

Cache anomaly

Filed under: Bugs,Oracle,Performance — Jonathan Lewis @ 1:27 pm BST Apr 3,2014

Just a quick heads-up for anyone who likes to play around with the Keep and Recycle caches.

In 11g Oracle introduced the option for serial direct path reads for tablescans on tables that was sufficiently large – which meant more than the small_table_threshold – provided the table wasn’t already sufficient well cached.  (The rules mean that the choice of mechanism can appear to be a little random in the production environment for tables that are near the threshold size – but if you try testing by doing “alter system flush buffer_cache” you find that you always get direct path reads in testing.)

I’ve just discovered a little oddity about this, though.  I have a table of about 50MB which is comfortably over the threshold for direct path reads. But if I create a KEEP cache (db_keep_cache_size) that is a little larger than the table and then assign the table to the KEEP cache (alter table xxx storage(buffer_pool keep)) then stops doing direct path reads, and caches the table.

Now this doesn’t seem unreasonable – if you’ve assigned an object to the KEEP cache you probably want it (or once wanted it) to be kept in cache as much as possible; so using the KEEP cache if it’s defined and specified makes sense. The reason I mention this as an oddity, though, is that it doesn’t reproduce in

I think I saw a bug note about this combination a few months ago- I was looking for something else at the time and, almost inevitably, I can’t find it when I want it – but I don’t remember whether it was the 11.1 or 11.2 behaviour that was deemed to be correct.


See comments 1 and 2 below.  I’ve written about this previously, and the caching bechaviour is the correct behaviour. The patch is in and backports are available for and The patch ensures that the table will be read into the cache if it is smaller than the db_keep_cache_size.  (Although we might want to check – see Tanel’s notes – whether this is based on the high water mark recorded in the segment header or on the optimizer stats for the table; and I suppose it might be worth checking that the same feature applies to index fast full scans). From the MoS document:

With the bug fix applied, any object in the KEEP buffer pool, whose size is less than DB_KEEP_CACHE_SIZE, is considered as a small or medium sized object. This will cache the read blocks and avoid subsequent direct read for these objects.



January 23, 2014


Filed under: Execution plans,Oracle,Performance — Jonathan Lewis @ 6:05 pm BST Jan 23,2014

Here’s a recent request from the OTN database forum – how do you make this query go faster (tkprof output supplied):

 select a.rowid
   from  a, b
   where A.MARK IS NULL
     and a.cntry_code = b.cntry_code and b.dir_code='XX' and b.numb_type='XXX'
     and upper(Trim(replace(replace(replace(replace(replace(replace(replace(a.co_name,'*'),'&'),'-'),'/'),')'),'('),' '))) like
         upper(Trim(substr(replace(replace(replace(replace(replace(replace(replace(b.e_name,'*'),'&'),'-'),'/'),')'),'('),' '),1,25)))||'%';

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        3   3025.53    3260.11       8367       7950          0          31
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6   3025.54    3260.13       8367       7950          0          31

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 74

Rows     Row Source Operation
-------  ---------------------------------------------------
     31  HASH JOIN
 302790   INDEX FAST FULL SCAN OBJ#(39024) (object id 39024)   -- B 500,000 in the table
  55798   TABLE ACCESS FULL OBJ#(78942)                        -- A 175,000 in the table

-- and from some "explain plan" tool
SELECT STATEMENT  CHOOSE Cost: 333  Bytes: 52,355,940  Cardinality: 608,790 
3 HASH JOIN  Cost: 333  Bytes: 52,355,940  Cardinality: 608,790 
  1 INDEX FAST FULL SCAN UNIQUE B_PK Cost: 4  Bytes: 503,022  Cardinality: 12,898 
    2 TABLE ACCESS FULL A Cost: 215  Bytes: 3,150,034  Cardinality: 67,022

One thing you might note from the spartan tkprof output – this is an old version of Oracle ( to be exact).

The first thing to do is note that most of the time is spent on the CPU – and maybe that multiply cascading replace() has something to do with it.  Now replace() and translate() are things I use so rarely that I usually get them wrong first time, but I think the predicate could be replaced by:

upper(translate(a.co_name, 'x*&-/)( ', 'x')) like upper(substr(translate(b.e_name, 'x*&-/)( ', 'x'),1,25))||'%'

Beyond making the code slightly less eye-boggling, though, I don’t think this is going to help much. Consider the information we have about the sizes of the rowsources involved.

If we can trust the tkprof row counts as being the complete output from the first execution of the statement (there seem to have been 2 in the trace file) – we selected 300,000 rows from one table and 56,000 rows from the other and then joined them with a hash join. A hash join requires equality predicates, and the only join predicate in the query that could be used is the one “a.cntry_code = b.cntry_code”.

Now, if cntry_code is short for “country code” we have a scaling problem: there are only about 600 countries in the world, so on average each row in the A table (56,000 rows acquired) is going to find roughly 500 rows in the B table (300,000 rows divided across 600 countries). So at run time the hash join will generate a rowsource of at least 56,000 * 500 = 28 Million rows; then Oracle is going to do that complicated bit of textual manipulation on two columns, compare them, and find that ultimately only 31 rows match !

So how can we do less work ?

If we’re lucky we can make the hash join much more efficient by thinking about what that nasty textual predicate means. We compare to see if one string looks like it’s starting with the first 25 characters of the other string – but if it does then the two strings have to be identical on the first 25 characters, and a hash join works with equality. So let’s just add in a new predicate to see what effect it has:

upper(substr(translate(a.co_name, 'x*&-/)( ', 'x'),1,25)) = upper(substr(translate(b.e_name, 'x*&-/)( ', 'x'),1,25))

I’ve made the suggestion on the forum – now I’m waiting to see if it has a beneficial effect (or whether I’ve made a silly mistake in my logic or guesswork)

January 14, 2014

Single block reads

Filed under: Infrastructure,Oracle,Performance — Jonathan Lewis @ 6:52 pm BST Jan 14,2014

When a “cache read” tablescan (or index fast full scan) takes place we generally expect to see waits on “db file scattered read” as Oracle performs multi-block reads to do the scan. But we all know that Oracle will skip over blocks that are already in the cache and can therefore end up doing multi-block reads of many different sizes, even down to the point where it does single block reads (waiting for “db file sequential read”).

A quirky little question came up on OTN a little while ago: “for a large table we expect multiblock reads to be positioned at the end of the LRU for prompt re-use; but when Oracle does a single block read as part of a tablescan does it go to the end of the LRU (because it’s part of a large tablescan) or does it go to the mid-point of the LRU (because it’s a single block read)?”

The description of how blocks are treated in a tablescan has been simplified, of course, but the question is still valid – so what’s the answer, and how (without going into an extreme level of detail) would you demonstrate it ?


January 8, 2014

CR Trivia

Filed under: Infrastructure,Oracle,Performance — Jonathan Lewis @ 6:44 am BST Jan 8,2014

Everybody “knows” that when you do a tablescan of a table that it starts with two buffer gets on the segment header, though older versions (ca. 8i and earlier) of Oracle used to do 4 buffer gets on the segment header. The upshot of this is that many people still say that if you create a table and insert a single row then you’re going to get 3 buffer gets when you tablescan a table: two for the segment header and one for the data block:

So here’s a test, with the second set of autotrace stats which, for reasons I’ll describe shortly, may not be immediately reproducible on your system:

create table t1 (n1 number);
insert into t1 values(1);
execute dbms_stats.gather_table_stats(user,'t1');

set autotrace traceonly statistics
select * from t1;

          1  recursive calls
          0  db block gets
         22  consistent gets
          0  physical reads
          0  redo size
        540  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

That really is 22 buffer gets to find the one row in the table. It gets worse; I used another session to insert (and commit) a second row in the table and the tablescan to return the two rows took 36 buffer gets.

If you want to get the same results you have to be a bit careful and a bit lucky. I was using ASSM (automatic segment space management) on a tablespace with an 8MB uniform extent size; when I inserted the single row into the table Oracle formatted a batch of 16 consecutive blocks in the first extent, deciding which range of blocks to format based on my process id. When I inserted my row, the block I inserted into was again dictated by my process id – this happened to be the 10th block in the formatted range.

When I ran the tablescan I did 6 gets on the segment’s space management blocks to work out what parts of the segment were formatted, then 10 gets to find the first row in the 10th block of the range, then a further 6 gets to scan the rest of the formatted range. I do not know why it takes 6 gets to read the space management blocks – but it may be two each for the segment header block, L2 bitmap block, and L1 bitmap block – it may be four on the segment header and one each on the other two blocks.

When I inserted a second row from another session (which I had set up very carefully), Oracle decided to format another batch of 16 blocks. This highlighted a little variation on what I’ve said so far. Because I had 8MB uniform extents the first 18 blocks of the first extent were all space management blocks (segment header, one L2 bitmap, and 16 L1 bitmaps – in the opposite order to that description). Oracle formats on boundaries of 16 blocks in the extent and in this case formatted the 14 blocks that would take it from block 19 to block 32 of the extent. That’s basically why my tablescan after the second insert took an extra 14 gets.

If you really do need to scan a “very small” table extremely frequently (and you shouldn’t, really) then it might be a good idea to check how many blocks have been formatted compared to the number of blocks used (dbms_rowid.rowid_block_number() will help with that check) and do an “alter table move” because in this special case Oracle won’t use the “format 16 blocks somewhere” strategy, it will format only as many blocks as needed starting from the first available block in the segment. For some reason the number of gets on space management blocks is also minimised in this case so that tablescan of a “single block” takes only 2 + number of blocks in table.

For ASSM segments Oracle maintains a Low High Water Mark (LHWM) and a High High Water Mark (HHWM). Every block below the LHWM is guaranteed to be formatted, blocks between the LHWM and HHWM will be formatted in batches of 16; so when doing a segment scan Oracle uses the largest multiblock-read it can from the start of the segment to the LHWM, then keeps checking the space management blocks to identify the batches of 16 that it can then read (and it can read consecutive formatted batches in a single read, so the reads are not necessarily limited to 16 blocks at a time).

One little side thought – because the choice of block and block range is dictated by the process id, the pattern of data insertion and contention using shared servers can be very different from the pattern produced by dedicated servers.


January 1, 2014


Filed under: Oracle,Performance — Jonathan Lewis @ 6:11 pm BST Jan 1,2014

Here’s a point that’s probably fairly well-known, but worth repeating – nvl() and coalesce() aren’t identical in behaviour but you may have some cases where you’re using nvl() when coalesce() would be a more efficient option.

The reason for this is “short-circuiting”. The expression nvl(expr1, expr2) will return expr2 if expr1 is null, otherwise it will return expr1; the expression coalesce(expr1, expr2, …, exprN) will return the first non-null expression in the list so, in particular, coalesce(expr1, expr2) will give the same result as nvl(expr1, expr2) ; the big difference is that nvl() will evaluate both expressions, while coalesce will evaluate expr2 only if it needs to (i.e. only if expr1 evaluates to null). In many cases the difference in performance will be insignificant – but consider the following type of construct (t1 is a table with a single, numeric, column n1 and a single row):

create table t1 as select 1 n1 from dual;
execute dbms_stats.gather_table_stats(user,'t1')

	nvl(n1, (select max(object_id) from all_objects))	nn

	coalesce(n1, (select max(object_id) from all_objects))	nn

In the first query Oracle will execute the inline scalar subquery whether n1 is null or not.
In the second query Oracle will execute the inline scalar subquery only if n1 is null.
I know which option I would prefer to use if I knew that n1 could be null.


There is a trap that you have to watch out for – try recreating t1 with n1 defined as a varchar2() column and the query with coalesce() will fail with Oracle error ORA-00932: inconsistent datatypes: expected CHAR got NUMBER the expressions that appear in the coalesce() must all be explicitly of the same type while nvl() will do implicit conversions when necessary, so be a little careful with the code when you’re looking for opportunities to make the change.

Note: the same difference (strictness of typing) appears when you compare decode() – which does implicit conversion when necessary – with case end – which does not. (Both case and decode() short-circuit, though).

December 11, 2013

Null Quiz

Filed under: Oracle,Performance,Troubleshooting,Tuning — Jonathan Lewis @ 6:42 pm BST Dec 11,2013

Here’s an example I saw a few months ago of the confusion caused by NULL. As the owner of the problem put it: the first query, run from SQL*Plus for testing purposes, takes no time to complete; but when “put into a pl/sql cursor” (as shown in the second query) it takes ages to complete.


November 11, 2013

Reverse Key

Filed under: Indexing,Oracle,Performance — Jonathan Lewis @ 3:19 pm BST Nov 11,2013

Here’s one of those little details which I would have said just couldn’t be true – except it’s in the manuals, and the manuals happen to be right.


September 18, 2013

Distributed Sets

Filed under: distributed,Oracle,Performance,sorting — Jonathan Lewis @ 6:14 pm BST Sep 18,2013

In an earlier post I’ve described how a distributed query can operate at a remote site if it’s a simple select but has to operate at the local site if it’s a CTAS (create as select) or insert as select. There’s (at least) one special case where this turns out to be untrue … provided you write the query in the correct fashion. I discovered this only as a result of doing a few experiments in response to a question on the OTN database forum.


September 11, 2013

MV Refresh

Filed under: compression,Infrastructure,Oracle,Partitioning,Performance — Jonathan Lewis @ 8:47 pm BST Sep 11,2013

I have a fairly strong preference for choosing simple solutions over complex solutions, and using Oracle-supplied packages over writing custom code – provided the difference in cost (whether that’s in human effort, run-time resources or licence fees) is acceptable. Sometimes, though, the gap between simplicity and cost is so extreme that a hand-crafted solution is clearly the better choice. Here’s an idea prompted by a recent visit to a site that makes use of materialized views and also happens to be licensed for the partitioning option.


September 9, 2013

Parallel Execution

Filed under: Oracle,Parallel Execution,Performance — Jonathan Lewis @ 6:51 am BST Sep 9,2013

While checking out potential scalability threats recently on a client system, I was directed to a time-critical task that was currently executing the same PL/SQL procedure 16 times (with different parameters) between 6:00 and 7:00 am; as the system went through its next phase of expansion the number of executions of this procedure was likely to grow. An interesting detail, though, was that nothing else was going on while the task was running so the machine (which had 6 cores) was running at 16% CPU.

An obvious strategy for handling the required growth target was to make sure that four (possibly 5) copies of the procedure were allowed to run concurrently. Fortunately the different executions were completely independent of each other and didn’t interfere with each other’s data, so the solution simply required a mechanism to control the parallelism. Conveniently 11gR2 gave us one.

August 9, 2013

12c Join Views

Filed under: 12c,Oracle,Performance,Tuning — Jonathan Lewis @ 6:36 pm BST Aug 9,2013

There are a couple of posts on the blog describing problems with updateable join views or, to be more precise, join views which were key-preserved but which the optimizer did not recognize as key-preserved. Both scenarios are addressed in 12c:

May 22, 2013

gather_plan_statistics – 2

Filed under: Oracle,Performance,Troubleshooting — Jonathan Lewis @ 12:40 pm BST May 22,2013

Some time ago – actually a few years ago – I wrote a note about the hint /*+ gather_plan_statistics */ making some informal comments about the implementation and relevant hidden parameters. I’ve recently discovered a couple of notes from Alexander Anokhin describing the feature in far more detail and describing some of the misleading side effects of the implementaiton. There are two parts (so far): part 1 and part 2.


May 13, 2013

Parse Time

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 6:59 pm BST May 13,2013

Dominic Brooks published a note recently about some very nasty SQL – originally thinking that it was displaying a run-time problem due to the extreme number of copies of the lnnvl() function the optimizer had produced. In fact it turned out to be a parse-time problem rather than a run-time problem, but when I first read Dominic’s note I was sufficiently surprised that I decided to try modelling the query.


May 5, 2013


Filed under: Oracle,Performance — Jonathan Lewis @ 6:21 pm BST May 5,2013

Anyone who has used Kevin Closson’s “Silly Little Oracle Benchmark” will want to know about his significantly updated SLOB2.

February 11, 2013

Optimisation ?

Filed under: Infrastructure,Oracle,Performance — Jonathan Lewis @ 2:05 pm BST Feb 11,2013

I was at a client site recently where one of the end-users seemed to have discovered a cunning strategy for optimising a critical SQL statement. His problem was that his query screen times out after 2 minutes, so any query he runs has to complete in less than two minutes or he doesn’t see the results. Unfortunately he had a particular query which took nearly 32 minutes from cold to complete – partly because it’s a seven-table join using ANSI OUTER joins, against tables ranging through the 10s of millions of rows and gigabytes of data – the (necessary) tablescan of the table that had to be first in the join order took 70 seconds alone.

But our intrepid user seems to have made an important discovery and engineered a solution to his performance problem. I think he’s noticed that when you run a query twice in a row the second execution is often faster than the first. I can’t think of any other reason why the same person would run the same query roughly every four minutes between 8:00 and 9:00 am every morning (and then do the same again around 5:00 in the afternoon).

Looking at the SQL Monitoring screen around 10:00 the first day I was on-site I noticed this query with a very pretty graphic effect of gradually shrinking blue bars as 32 minutes of I/O turned into 2 minutes of CPU over the course of 8 consecutive executions which reported run times something like:  32 minutes, 25 minutes, 18 minutes, 12 minutes, 6 minutes, 4 minutes, 2.1 minutes, 2 minutes.

It’s lucky (for that user) that the db_cache_size is 60GB. On the other hand this machine is one of those Solaris boxes that likes to pretend that it’s got 128 CPUs when really it’s only 16 cores with 8 lightweight threads per core – you don’t want anyone running a query that uses 2 solid CPU minute on one of those boxes because it’s taking out 1/16th of your CPU availability, while reporting a load of 1/128 of your CPUs.

Footnote: the query can be optimised (properly) – it accessed roughly 100M rows of data to return roughly 300 rows (with no aggregation), so we just need to do a little bit of work on precise access paths.

Next Page »

The Rubric Theme Blog at


Get every new post delivered to your Inbox.

Join 3,507 other followers