Oracle Scratchpad

June 20, 2011

Optimisation

Filed under: Oracle,Performance,Tuning — Jonathan Lewis @ 6:20 pm UTC Jun 20,2011

A question came up on Oracle-L recently about the difference in work done by the following two queries:

SELECT /*+ RULE */
	DOM_NAME
FROM
	DOMAINS,
	TABLE(CAST(:B1 AS DOMAIN_LIST)) DL
WHERE
	DOM_NAME = DL.COLUMN_VALUE
;

SELECT
	DOM_NAME
FROM
	DOMAINS
WHERE
	DOM_NAME IN (
		SELECT	COLUMN_VALUE
		FROM	TABLE(CAST(:B1 AS  DOMAIN_LIST))
	)
;

Before saying anything else, I should point out that these two queries are NOT logically equivalent unless you can guarantee that the table() operator returns a unique set of values – and Oracle doesn’t allow uniqueness to be enforced on collections.

(more…)

June 10, 2011

Quiz Night

Filed under: Indexing,Oracle,Performance — Jonathan Lewis @ 6:16 pm UTC Jun 10,2011

Here’s an interesting question from the OTN database forum:

“If I delete 90% of the rows from a table which has a few indexes, without rebuildling or coalescing indexes afterwards, will this improve the performance of index range scans ?”

The thing that makes it interesting is the scope it gives you for imagining reasons why the performance won’t change, or might get better, or could get worse. So how about it – can you think of an argument for each of the three possibilities ?

(more…)

April 22, 2011

Star Transformation

Filed under: CBO,Execution plans,Oracle,Performance,Tuning — Jonathan Lewis @ 6:14 pm UTC Apr 22,2011

A little while ago I published a note explaining how it was possible to find queries which ran faster if you manually de-coupled the index and table accesses. Here’s a further example that came up in discussion on a client site recently. The query looks something like this (at least in concept, although it was a little more complex, with some messy bits around the edges):
(more…)

April 19, 2011

More CR

Filed under: Infrastructure,Oracle,Performance,Read Consistency,Troubleshooting,undo — Jonathan Lewis @ 6:32 pm UTC Apr 19,2011

Following on from yesterday’s post on consistent reads, I thought I’d make the point that the way you work can make an enormous difference to the amount of work you do. Here’s a silly little demo (in 10.2.0.3):
(more…)

April 18, 2011

Consistent Reads

Filed under: Infrastructure,Oracle,Performance,Read Consistency,Troubleshooting,undo — Jonathan Lewis @ 11:08 am UTC Apr 18,2011

Here’s a quick demo to make a point about consistent reads (prompted by a question on the Oracle-L mailing list):
(more…)

March 29, 2011

Repetition

Filed under: Oracle,Performance,Troubleshooting — Jonathan Lewis @ 9:42 pm UTC Mar 29,2011

One of the problems of building models of Oracle activity is that it’s easy to build the wrong model. One of the commonest issues appears with repetitive actions – how do you write code that repeats a simple action many times in a row. It’s often enough to write a simple pl/sql loop but there are cases where a pl/sql loop behaves very differently from a long list of individual SQL statements – which is why I’ve occasionally used a very simple-minded approach to avoid that particular trap.

 

(more…)

March 24, 2011

Small Tables

Filed under: Infrastructure,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 6:45 pm UTC Mar 24,2011

Here’s a note I’ve been meaning to research and write up for more than 18 months – ever since Dion Cho pinged a note I’d written about the effects of partitioning because of a comment it made about the “2% small table threshold”.

It has long been an item of common knowledge that Oracle has a “small table threshold” that allows for special treatment of data segments that are smaller than two percent of the size of the buffer cache, viz:

If a table is longer than the threshold then a full tablescan of the table will only use db_file_multiblock_read_countbuffers at the end of the LRU(least recently used) chain to read the table and (allowing a little inaccuracy for multi-user systems, pinning and so on) keeps recycling the same few buffers to read the table thus protecting the bulk of the buffer cache from being wiped out by a single large tablescan. Such a tablescan would be recorded under the statistic “table scans (long tables)”.

If a table is shorter than the threshold then it is read to the midpoint of the cache (just like any other block read) but – whether by accident or design – the touch count (x$bh.tch) is not set and the table will fall off the LRU end of the buffer cache fairly promptly as other objects are read into the buffer. Such a tablescan would be recorded under the statistic “table scans (short tables)”.

Then, in July 2009, Dion Cho decided to check this description before repeating it, and set about testing it on Oracle 10gR2 – producing some surprising results and adding another item to my to-do list. Since then I have wanted to check his conclusions, check whether the original description had ever been true and when (or if) it had changed.

As a simple starting point, of course, it was easy to check the description of the relevant (hidden) parameter to see when it changed:

8.1.7.4     _small_table_threshold        threshold level of table size for forget-bit enabled during scan
9.2.0.4     _small_table_threshold        threshold level of table size for direct reads
11.2.0.1    _small_table_threshold        lower threshold level of table size for direct reads

This suggests that the behaviour might have changed some time in 9i (9.2.0.4 happened to be the earliest 9i listing of x$ksppi I had on file) – so I clearly had at least three major versions to check.

The behaviour of the cache isn’t an easy thing to test, though, because there are a number of special cases to consider – in particular the results could be affected by the positioning of the “mid-point” marker (x$kcbwds.cold_hd) that separates the “cold” buffers from the “hot” buffers. By default the hot portion of the default buffer is 50% of the total cache (set by hidden parameter _db_percent_hot_default) but on instance startup or after a “flush buffer cache” there are no used buffers so the behaviour can show some anomalies.

So here’s the basic strategy:

  1. Start the instance
  2. Create a number of relatively small tables with no indexes
  3. Create a table large enough to come close to filling the cache, with an index to allow indexed access
  4. Collect stats on the table – largest last.
  5. Query the large table through an index range scan to fill the cache
  6. Repeat a couple of times with at least 3 second pauses to allow for incrementing the touch count
  7. Check x$bh for buffer usage
  8. Run repeated tablescans for the smaller tables to see how many blocks end up in the cache at different sizes.

Here’s some sample code:

create table t_15400
pctfree 99
pctused 1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum			id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 15400
;

create index t_15400_id on t_15400(id);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T_15400',
		estimate_percent => 100,
		method_opt 	 => 'for all columns size 1'
	);
end;
/

select
	object_name, object_id, data_object_id
from
	user_objects
where
	object_name in  (
		'T_300',
		'T_770',
		'T_1540',
		'T_3750',
		'T_7700',
		'T_15400',
		'T_15400_ID'
	)
order by
	object_id
;

select
	/*+ index(t) */
	max(small_vc)
from
	t_15400 t
where
	id > 0
;

The extract shows the creation of just the last and largest table I created and collected statistics for – and it was the only one with an index. I chose the number of blocks (I’ve rigged one row per block) because I had set up a db_cache_size of 128MB on my 10.2.0.3 Oracle instance and this had given me 15,460 buffers.

As you can see from the query against user_objects my test case included tables with 7,700 rows (50%), 3,750 rows (25%), 1,540 rows (10%), 770 rows (5%) and 300 rows (2%). (The number in brackets are the approximate sizes of the tables – all slightly undersized – relative to the number of buffers in the default cache).

Here’s the query that I then ran against x$bh (connected as sys from another session) to see what was in the cache (the range of values needs to be adjusted to cover the range of object_id reported from user_objects):

select
	obj, tch, count(*)
from	x$bh
where
	obj between 77710 and 77720
group by
	obj, tch
order by
	count(*)
;

Typical results from 10.2.0.3

After executing the first index range scan of t_15400 to fill the cache three times:

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75855          0          1
     75854          0          1
     75853          0          1
     75851          0          1
     75850          0          1
     75849          0          1
     75852          0          1
     75855          2          9    -- Index blocks, touch count incremented
     75855          1         18    -- Index blocks, touch count incremented
     75854          1      11521    -- Table blocks, touch count incremented

Then after three tablescans, at 4 second intervals, of the 7,700 block table:

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1    -- segment header of 7700 table, touch count incremented each time
     75855          0          1
     75854          0          1
     75852          0          1
     75849          0          1
     75850          0          1
     75851          0          1
     75855          2          9
     75855          1         10
     75853          0       3991    -- lots of blocks from 7700 table, no touch count increment
     75854          1       7538

Then repeating the tablescan of the 3,750 block table three times:


       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75854          0          1
     75851          0          1
     75852          3          1    -- segment header block, touch count incremented each time
     75849          0          1
     75850          0          1
     75855          2          9
     75855          1         10
     75853          0        240
     75852          0       3750    -- table completely cached - touch count not incremented
     75854          1       7538

Then repeating the tablescan of the 1,540 block table three times:

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75854          0          1
     75851          3          1    -- segment header block, touch count incremented each time
     75849          0          1
     75850          0          1
     75852          3          1
     75855          2          9
     75855          1         10
     75853          0        149
     75851          2       1540    -- Table fully cached, touch count incremented but only to 2
     75852          0       2430
     75854          1       7538

Then executing the tablescan of the 770 block table three times:

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75850          3          1    -- segment header block, touch count incremented each time
     75849          0          1
     75851          3          1
     75852          3          1
     75854          0          1
     75855          2          9
     75855          1         10
     75851          0         69
     75853          0        149
     75850          2        770    -- Table fully cached, touch count incremented but only to 2
     75851          2       1471
     75852          0       1642
     75854          1       7538

Finally executing the tablescan of the 300 block table three times:

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75854          0          1
     75850          3          1
     75852          3          1
     75851          3          1
     75855          2          9
     75855          1         10
     75851          0         69
     75850          0        131
     75853          0        149
     75849          3        301	-- Table, and segment header, cached and touch count incremented 3 times
     75850          2        639
     75852          0       1342
     75851          2       1471
     75854          1       7538

This set of results on its own isn’t conclusive, of course, but the indications for 10.2.0.3 are:

  • “Large” tablescans don’t increment the touch count – so avoiding promotion to the hot portion of the buffer
  • There is a 25% boundary (ca. 3750 in this case) above which a tablescan will start to recycle the buffer it has used
  • There is a 10% boundary (ca. 1540 in this case) below which repeating a scan WILL increment the touch count
  • There is a 2% boundary (ca. 300 in this case) below which tablescans will always increment the touch count.

I can’t state with any certainty where the used and recycled buffers might be, but since blocks from the 3750 tablescan removed the blocks from the 7700 tablescan, it’s possible that “large” tablescans do somehow go “to the bottom quarter” of the LRU.

There also some benefit in checking the statistics “table scans (short)” and “table scans (long)” as the tests run. For the 2% (300 block) table I recorded 3 short tablescans; for the tables in the 2% to 10% range (770 and 1540) I recorded one long and two short (which is consistent with the touch count increment of 2 – the first scan was expected to be long, but the 2nd and 3rd were deemed to be short based on some internal algorithm about the tables being fully cached); finally for the tables above 10% we always got 3 long tablescans.

But as it says in the original note on small partitions – there are plenty of questions still to answer:

I’ve cited 2%, 10%, and 25% and only one of these is set by a parameter (_small_table_threshold is derived as 2% of the db_cache_size - in simple cases) are the other figures derived, settable, or hard-coded.

I’ve quoted the 2% as the fraction of the db_cache_size – but we have automatic SGA management in 10g, automatic memory management in 11g, and up to eight different cache sizing parameters in every version from 9i onwards. What figure is used as the basis for the 2%, and is that 2% of the blocks or 2% of the bytes, and if you have multiple block sizes does each cache perhaps allow 2% of its own size.

And then, in 11g we have to worry about automatic direct path serial tablescans – and it would be easy to think that the “_small_table_threshold” may have been describing that feature since (at least) 9.2.0.4 if its description hadn’t changed slightly for 11.2 !

So much to do, so little time — but at least you know that there’s something that needs careful investigation if you’re planning to do lots of tablescans.

Footnote: Having written some tests, it’s easy to change versions. Running on 8.1.7.4 and 9.2.0.8, with similar sized caches, I could see that the “traditional” description of the small_table_threshold was true – a short tablescan was anything less 2% of the buffer cache, long tablescans were (in effect) done using just a window of db_file_multiblock_read_count buffers, and in both cases the touch count was never set (except for the segment header block). (The reference to “direct reads” in the parameter name for 9i may be related to parallel query, rather than serial direct reads – but that’s a topic for another investigation.)

Further reading:

Tanel Poder on the _small_table_threshold, _serial_direct_read, and (for 11.2.0.3) _direct_read_decision_statistics_driven. (Uses optimizer block count, not segment header block count). Includes link to a video about serial direct path reads and the fast object checkpoints and the “object cache queue” structures X$KCBOQH (one row per data_object_id per working data set currently in buffer cache) and X$KCBOBH (one row for each buffer in an object queue)

Alex Fatkulin on serial direct path reads.

March 3, 2011

Index Rebuilds

Filed under: Index Rebuilds,Indexing,Infrastructure,Performance — Jonathan Lewis @ 6:43 pm UTC Mar 3,2011

A couple of days ago I found several referrals coming in from a question about indexing on the Russian Oracle Forum. Reading the thread I found a pointer to a comment I’d written for the Oracle-L list server a couple of years ago about Advanced Queueing and why you might find that it was necessary to rebuild the IOTs (index organized tables) that support AQ.

The queue tables are, of course, a perfect example of what I call the “FIFO” index so it’s not a surprise that they might need special consideration. Rather than rewrite the whole note I’ll just link to it from here. (One of the notes in the rest of the Oracle-L thread also points to MOS document 271855.1 which describes the whys and hows of rebuilding AQ tables.)

November 19, 2010

Quiz Night

Filed under: Bugs,Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 6:00 pm UTC Nov 19,2010

Apart from the fact that the “Rows” figure for the FILTER operation at line 6 is blank, what’s the obvious error in this extract from an execution plan:
(more…)

November 14, 2010

Local Indexes – 2

Filed under: CBO,Partitioning,Performance — Jonathan Lewis @ 5:42 pm UTC Nov 14,2010

In the previous note on local indexes I raised a couple of questions about the problems of different partitions holding different volumes of data, and supplied a script to build some sample data that produced the following values for blevel across the partitions of a list-partitioned table.

(more…)

October 11, 2010

Distributed Objects

Filed under: distributed,Performance,Troubleshooting — Jonathan Lewis @ 7:12 pm UTC Oct 11,2010

I recently came across a tidy solution to a common problem – how to minimise code maintenance in a procedure while maximising flexibility of the procedure. The task was fairly simple – create a ref cursor for a calling program to return data that (a) followed complex selection rules and (b) allowed the user to specify numerous types of input.

The principle was simple – the final ref cursor was driven by a list of (say) order ids – and the details to be returned about those orders required some fairly complex SQL to execute. To separate the complexity of constructing the list of columns from the complexity of identifying the required rows the developers had split the procedure into two stages. First, select the list of relevant order ids using one of several possible statements – the appropriate statement being derived from analysis of the inputs to the procedure; secondly open a ref cursor using that list of order ids. In this way if a new set of rules for selection appeared the only new code needed was a new query to select the ids – the main body of code didn’t need to be modified and re-optimised.
(more…)

October 7, 2010

Distributed Pipelines

Filed under: distributed,Performance — Jonathan Lewis @ 6:06 pm UTC Oct 7,2010

In an article that I wrote about the /*+ driving_site */ hint a few months ago I pointed out that the hint was not supposed to work with “create table as select” (CTAS) and “insert as select”. One of the people commenting on the note mentioned pipelined function as a workaround to this limitation – and I’ve finally got around to writing a note about the method.

The idea is simple. If you can write a distributed select statement that takes advantage of the /*+ driving_site */ hint to work efficiently, you can wrap the statement in a pl/sql cursor loop and stick that loop into a pipelined function to maximise the efficiency of create or insert as select. Here’s some sample code (tested on 11.1.0.6) to demonstrate the principle:
(more…)

September 30, 2010

Rownum effects

Filed under: CBO,Performance,Troubleshooting — Jonathan Lewis @ 6:42 pm UTC Sep 30,2010

Here’s a hidden threat in the optimizer strategy that may cause performance problems if you’re trying to operate a series of batch updates (or batch deletes).

In the past I’ve pointed out that a predicate like “rownum <= N" generally makes the optimizer use “first_rows(N)” optimisation methods – known in the code as first_k_rows optimisation.

This isn’t true for updates and deletes, as the following simple example indicates:
(more…)

September 19, 2010

Index degeneration

Filed under: Indexing,Performance,Troubleshooting,Tuning — Jonathan Lewis @ 11:12 am UTC Sep 19,2010

There’s a thread on OTN that talks about a particular deletion job taking increasing amounts of time each time it is run.

It looks like an example where some thought needs to go into index maintenance and I’ve contributed a few comments to the thread – so this is a lazy link so that I don’t have to repeat myself on the blog.

August 29, 2010

Fair Comparison

Filed under: Performance — Jonathan Lewis @ 6:19 pm UTC Aug 29,2010

From time to time someone will post a question about query performance on the OTN database forum asking why one form of a query returns data almost immediately while another form of the query takes minutes to return the data.

Obviously there are all sorts of reasons – the optimizer is not perfect, and different transformations may take place that really do result in a huge differences in work done by two queries which return the same result set – but a very simple reason that can easily be overlooked is the front-end tool being used to run the query.
(more…)

« Previous PageNext Page »

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 1,438 other followers