Oracle Scratchpad

May 26, 2011

Interval Partitions

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 12:07 pm UTC May 26,2011

Here’s a little gem from the OTN database forum – why should the CPU usage of querying an “interval partitioned” table depend on the size of something that’s missing ?

(The most significant symptom is identified in a reply I wrote to the thread – but that’s only a step in the right direction).

May 25, 2011

audit

Filed under: audit,Bugs,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 6:41 pm UTC May 25,2011

Here’s one of those funny little details that can cause confusion:

SQL> select * from user_audit_object;

no rows selected
 
SQL> audit select on indjoin by session whenever successful;

Audit succeeded.

SQL> select
  2     count(*)
  3  from
  4     indjoin         ij
  5  where
  6     id between 100 and 200
  7  and        val between 50 and 150
  8  ;

  COUNT(*)
----------
        51

1 row selected.

SQL> select * from user_audit_object where obj_name = 'INDJOIN';

no rows selected

(more…)

May 20, 2011

ASSM

Filed under: ASSM,Infrastructure,Oracle — Jonathan Lewis @ 5:16 pm UTC May 20,2011

Here’s an interesting little detail (obvious AFTER the event) about space management with ASSM (automatic segment space management). It starts with this question on OTN:

When I alter table deallocate unused and keep 1K the object ends up with 24 blocks, even after I’ve truncated the table. Why?
This is in a tablespace using ASSM, with locally managed extents set to use automatic (system) allocation.

Ultimately the answer is – the first extent in this table started life at 8MB, and an extent that large needs to have 16 level 1 bitmap (space management) blocks, one level 2 bitmap block, and the segment header block before you get to data blocks. When you truncate and deallocate Oracle doesn’t recreate the map, so the extent has to start with 18 blocks – round that up to the multiple of 8 blocks (the 64KB that Oracle normally uses for starting extents for small objects) and you get the 24 blocks from the question.

It took us a bit of time to get to the right answer on the thread – and that’s why I’m giving you the quick answer.

May 13, 2011

Overflow

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 6:20 pm UTC May 13,2011

Here’s a little gem I hadn’t come across before (because I hadn’t read the upgrade manuals). Try running the following pl/sql block in 9i, and then 10g (or later):

declare
        v1      number(38);
begin
        v1 := 256*256*256*256;
        dbms_output.put_line(v1);
end;
/

In 9i the result is 4294967296; but for later versions the result is:


declare
*
ERROR at line 1:
ORA-01426: numeric overflow
ORA-06512: at line 4

It’s not a bug, it’s expected behaviour. The expression consists of integers only, so Oracle uses INTEGER arithmetic that limits the result to roughly 9 significant figures. If you want the block to work in newer versions of Oracle you have to add a decimal point to (at least) one of the operands to make Oracle use NUMBER arithmetic that takes it up to roughly 38 significant figures.

May 8, 2011

Consistent Gets

Filed under: Infrastructure,Oracle,Read Consistency,Troubleshooting,undo — Jonathan Lewis @ 10:09 am UTC May 8,2011

There’s an interesting question on the OTN database forum at present – why does an update of 300,000 rows take a billion buffer visits. (There are 25 indexes on the table – so you might point a finger at that initially, but only one of the indexes is going to be changed by the update so that should only account for around an extra 10 gets per row in a clean environment.)
(more…)

April 19, 2011

More CR

Filed under: Infrastructure,Oracle,Performance,Read Consistency,Troubleshooting,undo — Jonathan Lewis @ 6:32 pm UTC Apr 19,2011

Following on from yesterday’s post on consistent reads, I thought I’d make the point that the way you work can make an enormous difference to the amount of work you do. Here’s a silly little demo (in 10.2.0.3):
(more…)

April 18, 2011

Consistent Reads

Filed under: Infrastructure,Oracle,Performance,Read Consistency,Troubleshooting,undo — Jonathan Lewis @ 11:08 am UTC Apr 18,2011

Here’s a quick demo to make a point about consistent reads (prompted by a question on the Oracle-L mailing list):
(more…)

April 5, 2011

Rows per block

Filed under: Infrastructure,Oracle,Statistics,Troubleshooting — Jonathan Lewis @ 6:22 pm UTC Apr 5,2011

A recent question on the OTN database forum:

Can any one please point to me a document or a way to calculate the average number of rows per block in oralce 10.2.0.3

One answer would be to collect stats and then approximate as block / avg_row_len – although you have to worry about things like row overheads, the row directory, and block overheads before you can be sure you’ve got it right. On top of this, the average might not be too helpful anyway. So here’s another (not necessarily fast) option that gives you more information about the blocks that have any rows in them (I picked the source$ table from a 10g system because source$ is often good for some extreme behaviour).

(more…)

March 24, 2011

Small Tables

Filed under: Infrastructure,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 6:45 pm UTC Mar 24,2011

Here’s a note I’ve been meaning to research and write up for more than 18 months – ever since Dion Cho pinged a note I’d written about the effects of partitioning because of a comment it made about the “2% small table threshold”.

It has long been an item of common knowledge that Oracle has a “small table threshold” that allows for special treatment of data segments that are smaller than two percent of the size of the buffer cache, viz:

If a table is longer than the threshold then a full tablescan of the table will only use db_file_multiblock_read_countbuffers at the end of the LRU(least recently used) chain to read the table and (allowing a little inaccuracy for multi-user systems, pinning and so on) keeps recycling the same few buffers to read the table thus protecting the bulk of the buffer cache from being wiped out by a single large tablescan. Such a tablescan would be recorded under the statistic “table scans (long tables)”.

If a table is shorter than the threshold then it is read to the midpoint of the cache (just like any other block read) but – whether by accident or design – the touch count (x$bh.tch) is not set and the table will fall off the LRU end of the buffer cache fairly promptly as other objects are read into the buffer. Such a tablescan would be recorded under the statistic “table scans (short tables)”.

Then, in July 2009, Dion Cho decided to check this description before repeating it, and set about testing it on Oracle 10gR2 – producing some surprising results and adding another item to my to-do list. Since then I have wanted to check his conclusions, check whether the original description had ever been true and when (or if) it had changed.

As a simple starting point, of course, it was easy to check the description of the relevant (hidden) parameter to see when it changed:

8.1.7.4     _small_table_threshold        threshold level of table size for forget-bit enabled during scan
9.2.0.4     _small_table_threshold        threshold level of table size for direct reads
11.2.0.1    _small_table_threshold        lower threshold level of table size for direct reads

This suggests that the behaviour might have changed some time in 9i (9.2.0.4 happened to be the earliest 9i listing of x$ksppi I had on file) – so I clearly had at least three major versions to check.

The behaviour of the cache isn’t an easy thing to test, though, because there are a number of special cases to consider – in particular the results could be affected by the positioning of the “mid-point” marker (x$kcbwds.cold_hd) that separates the “cold” buffers from the “hot” buffers. By default the hot portion of the default buffer is 50% of the total cache (set by hidden parameter _db_percent_hot_default) but on instance startup or after a “flush buffer cache” there are no used buffers so the behaviour can show some anomalies.

So here’s the basic strategy:

  1. Start the instance
  2. Create a number of relatively small tables with no indexes
  3. Create a table large enough to come close to filling the cache, with an index to allow indexed access
  4. Collect stats on the table – largest last.
  5. Query the large table through an index range scan to fill the cache
  6. Repeat a couple of times with at least 3 second pauses to allow for incrementing the touch count
  7. Check x$bh for buffer usage
  8. Run repeated tablescans for the smaller tables to see how many blocks end up in the cache at different sizes.

Here’s some sample code:

create table t_15400
pctfree 99
pctused 1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum			id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 15400
;

create index t_15400_id on t_15400(id);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T_15400',
		estimate_percent => 100,
		method_opt 	 => 'for all columns size 1'
	);
end;
/

select
	object_name, object_id, data_object_id
from
	user_objects
where
	object_name in  (
		'T_300',
		'T_770',
		'T_1540',
		'T_3750',
		'T_7700',
		'T_15400',
		'T_15400_ID'
	)
order by
	object_id
;

select
	/*+ index(t) */
	max(small_vc)
from
	t_15400 t
where
	id > 0
;

The extract shows the creation of just the last and largest table I created and collected statistics for – and it was the only one with an index. I chose the number of blocks (I’ve rigged one row per block) because I had set up a db_cache_size of 128MB on my 10.2.0.3 Oracle instance and this had given me 15,460 buffers.

As you can see from the query against user_objects my test case included tables with 7,700 rows (50%), 3,750 rows (25%), 1,540 rows (10%), 770 rows (5%) and 300 rows (2%). (The number in brackets are the approximate sizes of the tables – all slightly undersized – relative to the number of buffers in the default cache).

Here’s the query that I then ran against x$bh (connected as sys from another session) to see what was in the cache (the range of values needs to be adjusted to cover the range of object_id reported from user_objects):

select
	obj, tch, count(*)
from	x$bh
where
	obj between 77710 and 77720
group by
	obj, tch
order by
	count(*)
;

Typical results from 10.2.0.3

After executing the first index range scan of t_15400 to fill the cache three times:

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75855          0          1
     75854          0          1
     75853          0          1
     75851          0          1
     75850          0          1
     75849          0          1
     75852          0          1
     75855          2          9    -- Index blocks, touch count incremented
     75855          1         18    -- Index blocks, touch count incremented
     75854          1      11521    -- Table blocks, touch count incremented

Then after three tablescans, at 4 second intervals, of the 7,700 block table:

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1    -- segment header of 7700 table, touch count incremented each time
     75855          0          1
     75854          0          1
     75852          0          1
     75849          0          1
     75850          0          1
     75851          0          1
     75855          2          9
     75855          1         10
     75853          0       3991    -- lots of blocks from 7700 table, no touch count increment
     75854          1       7538

Then repeating the tablescan of the 3,750 block table three times:


       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75854          0          1
     75851          0          1
     75852          3          1    -- segment header block, touch count incremented each time
     75849          0          1
     75850          0          1
     75855          2          9
     75855          1         10
     75853          0        240
     75852          0       3750    -- table completely cached - touch count not incremented
     75854          1       7538

Then repeating the tablescan of the 1,540 block table three times:

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75854          0          1
     75851          3          1    -- segment header block, touch count incremented each time
     75849          0          1
     75850          0          1
     75852          3          1
     75855          2          9
     75855          1         10
     75853          0        149
     75851          2       1540    -- Table fully cached, touch count incremented but only to 2
     75852          0       2430
     75854          1       7538

Then executing the tablescan of the 770 block table three times:

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75850          3          1    -- segment header block, touch count incremented each time
     75849          0          1
     75851          3          1
     75852          3          1
     75854          0          1
     75855          2          9
     75855          1         10
     75851          0         69
     75853          0        149
     75850          2        770    -- Table fully cached, touch count incremented but only to 2
     75851          2       1471
     75852          0       1642
     75854          1       7538

Finally executing the tablescan of the 300 block table three times:

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75854          0          1
     75850          3          1
     75852          3          1
     75851          3          1
     75855          2          9
     75855          1         10
     75851          0         69
     75850          0        131
     75853          0        149
     75849          3        301	-- Table, and segment header, cached and touch count incremented 3 times
     75850          2        639
     75852          0       1342
     75851          2       1471
     75854          1       7538

This set of results on its own isn’t conclusive, of course, but the indications for 10.2.0.3 are:

  • “Large” tablescans don’t increment the touch count – so avoiding promotion to the hot portion of the buffer
  • There is a 25% boundary (ca. 3750 in this case) above which a tablescan will start to recycle the buffer it has used
  • There is a 10% boundary (ca. 1540 in this case) below which repeating a scan WILL increment the touch count
  • There is a 2% boundary (ca. 300 in this case) below which tablescans will always increment the touch count.

I can’t state with any certainty where the used and recycled buffers might be, but since blocks from the 3750 tablescan removed the blocks from the 7700 tablescan, it’s possible that “large” tablescans do somehow go “to the bottom quarter” of the LRU.

There also some benefit in checking the statistics “table scans (short)” and “table scans (long)” as the tests run. For the 2% (300 block) table I recorded 3 short tablescans; for the tables in the 2% to 10% range (770 and 1540) I recorded one long and two short (which is consistent with the touch count increment of 2 – the first scan was expected to be long, but the 2nd and 3rd were deemed to be short based on some internal algorithm about the tables being fully cached); finally for the tables above 10% we always got 3 long tablescans.

But as it says in the original note on small partitions – there are plenty of questions still to answer:

I’ve cited 2%, 10%, and 25% and only one of these is set by a parameter (_small_table_threshold is derived as 2% of the db_cache_size - in simple cases) are the other figures derived, settable, or hard-coded.

I’ve quoted the 2% as the fraction of the db_cache_size – but we have automatic SGA management in 10g, automatic memory management in 11g, and up to eight different cache sizing parameters in every version from 9i onwards. What figure is used as the basis for the 2%, and is that 2% of the blocks or 2% of the bytes, and if you have multiple block sizes does each cache perhaps allow 2% of its own size.

And then, in 11g we have to worry about automatic direct path serial tablescans – and it would be easy to think that the “_small_table_threshold” may have been describing that feature since (at least) 9.2.0.4 if its description hadn’t changed slightly for 11.2 !

So much to do, so little time — but at least you know that there’s something that needs careful investigation if you’re planning to do lots of tablescans.

Footnote: Having written some tests, it’s easy to change versions. Running on 8.1.7.4 and 9.2.0.8, with similar sized caches, I could see that the “traditional” description of the small_table_threshold was true – a short tablescan was anything less 2% of the buffer cache, long tablescans were (in effect) done using just a window of db_file_multiblock_read_count buffers, and in both cases the touch count was never set (except for the segment header block). (The reference to “direct reads” in the parameter name for 9i may be related to parallel query, rather than serial direct reads – but that’s a topic for another investigation.)

Further reading:

Tanel Poder on the _small_table_threshold, _serial_direct_read, and (for 11.2.0.3) _direct_read_decision_statistics_driven. (Uses optimizer block count, not segment header block count). Includes link to a video about serial direct path reads and the fast object checkpoints and the “object cache queue” structures X$KCBOQH (one row per data_object_id per working data set currently in buffer cache) and X$KCBOBH (one row for each buffer in an object queue)

Alex Fatkulin on serial direct path reads.

March 20, 2011

Upgrade Whoa

Filed under: Infrastructure,Troubleshooting — Jonathan Lewis @ 8:58 pm UTC Mar 20,2011

(The title’s a pun, by the way – an English form of humour that is not considered good unless it’s really bad.)

Very few people try to email me or call me with private problems – which is the way it should be, and I am grateful to my audience for realizing that this blog isn’t trying to compete with AskTom – but I do get the occasional communication and sometimes it’s an interesting oddity that’s worth a little time.

Today’s blog item is one such oddity – it was a surprise, it looked like a nasty change in behaviour, and it came complete with a description of environment, and a neatly formatted, complete, demonstration. For a discussion of the problem in Spanish you can visit the blog of John Ospino Rivas, who sent me the original email and has written his own blog post on the problem.
(more…)

March 18, 2011

ASSM ouch!

Filed under: ASSM,Infrastructure,Oracle — Jonathan Lewis @ 6:20 pm UTC Mar 18,2011

Here’s a nasty little surprise I got last week while investigating an oddity with stats collection. I wanted to create a table in an ASSM tablespace and populate it from two or three separate sessions simultaneously so that I could get some “sparseness” in the data load. So I created a table and ran up 17 concurrent sessions to insert a few rows each. Because I wanted to know where the rows were going I got every session to dump the bitmap space management block at the start of the segment – the results were surprising.
(more…)

March 16, 2011

buffer flush

Filed under: Infrastructure,Troubleshooting — Jonathan Lewis @ 6:34 pm UTC Mar 16,2011

This is part 2 of an article on the KEEP cache. If you haven’t got here from part one you should read that first for an explanation of the STATE and CUR columns of the output.

Here’s a little code to demonstrate some of the problems with setting a KEEP cache – I’ve set up a 16MB cache, which gives me 1,996 buffers of 8KB in 10.2.0.3, and then created a table that doesn’t quite fill that cache. The table is 1,900 data blocks plus one block for the segment header (I’ve used freelist management to make the test as predictable as possible, and fixed the pctfree to get one row per block).

(more…)

March 14, 2011

Buffer States

Filed under: Infrastructure,Troubleshooting — Jonathan Lewis @ 6:22 pm UTC Mar 14,2011

Here’s a bit of geek stuff that I’ve been meaning to write up for nearly a year – to the day, more or less – and I’ve finally been prompted to finish the job off by the re-appearance on the OTN database forum of the standard “keep cache” question:

    Why isn’t Oracle keeping an object “properly” when it’s smaller than the db_keep_cache_size and it has been assigned to the buffer_pool keep ?

This is a two-part note – and in the first part I’m just going to run a query and talk about the results. The query is one that has to be run by SYS because it references a couple of x$ structures, and this particular version of the query was engineered specifically for a particular client.
(more…)

March 3, 2011

Index Rebuilds

Filed under: Index Rebuilds,Indexing,Infrastructure,Performance — Jonathan Lewis @ 6:43 pm UTC Mar 3,2011

A couple of days ago I found several referrals coming in from a question about indexing on the Russian Oracle Forum. Reading the thread I found a pointer to a comment I’d written for the Oracle-L list server a couple of years ago about Advanced Queueing and why you might find that it was necessary to rebuild the IOTs (index organized tables) that support AQ.

The queue tables are, of course, a perfect example of what I call the “FIFO” index so it’s not a surprise that they might need special consideration. Rather than rewrite the whole note I’ll just link to it from here. (One of the notes in the rest of the Oracle-L thread also points to MOS document 271855.1 which describes the whys and hows of rebuilding AQ tables.)

February 21, 2011

Constraints

Filed under: Infrastructure,NULL,Oracle,Troubleshooting — Jonathan Lewis @ 7:15 pm UTC Feb 21,2011

There’s an important detail about constraints – check constraints in particular – that’s easy to forget, and likely to lead you into errors. Here’s a little cut-n-paste demo from an SQL*Plus session:

SQL> select count(*) from t1;

  COUNT(*)
----------
         4

1 row selected.

SQL>
SQL> alter table t1 add constraint t1_ck_v1 check (v1 in ('a','b','c'));

Table altered.

SQL>
SQL> select count(*) from t1 where v1 in ('a','b','c');

  COUNT(*)
----------
         3

1 row selected.

We count the number of rows in a table – and it’s four.
We add a constraint to restrict the values for a certain column – and every column survives the check.
We use the corresponding predicate to count the number of rows that match the check constraint – and we’ve lost a row !

Why ?

A predicate returns a row if it evaluates to TRUE.
A constraint allows a row if it does not evaluate to FALSE - which means it is allowed to evaluate to TRUE or to NULL.

I have one row in the table where v1 is null, and for that row the check constraint evaluates to NULL, which is not FALSE. So the row passes the check constraint, but doesn’t get returned by the predicate. I think it’s worth mentioning this difference because from time to time I see production systems that “lose” data because of this oversight.

To make the constraint consistent with the predicate I would probably add a NOT NULL declaration to the column or rewrite the constraint as (v1 is not null and v1 in (‘a’,'b’,'c’)).

« Previous PageNext Page »

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 1,392 other followers