Oracle Scratchpad

February 9, 2014

IOT Catalogue

Filed under: Infrastructure,IOT,Oracle — Jonathan Lewis @ 6:52 pm BST Feb 9,2014

I’ve just added a catalogue of Richard Foote’s articles on IOTs to the list I made a couple of years ago of Martin Widlake’s articles, so this is just a temporary note to point people to the updated list.

February 6, 2014

12c fixed subquery

Filed under: 12c,Execution plans,Oracle,subqueries — Jonathan Lewis @ 2:25 pm BST Feb 6,2014

Here’s a simple little demonstration of an enhancement to the optimizer in 12c that may result in some interesting changes in execution plans as cardinality estimates change from “guesses” to accurate estimates.

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum				id,
	trunc(sysdate) + rownum		d1
from
	generator	v1
where
	rownum <= 1e4
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

set autotrace traceonly explain

select * from t1 where id > (select 9900 from dual);

set autotrace off

Since there are no indexes in sight the execution plan has to be a tablescan. The interesting thing, though, is the optimizer’s prediction for the number of rows returned. If you look at the code you can work out that the actual result set should be 100 rows; but Oracle has used a standard “guess” of 5% for predicates of the form “column greater than (scalar subquery)”, and sure enough, here’s the plan:

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT  |      |   500 |  2000 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   500 |  2000 |     3   (0)| 00:00:01 |
|   2 |   FAST DUAL       |      |     1 |       |     2   (0)| 00:00:01 |

Predicate Information (identified by operation id):
   1 - filter("ID"> (SELECT 9900 FROM "SYS"."DUAL" "DUAL"))
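The arithmetic behind the two estimates is easy to check outside the database. Here's a minimal sketch (Python, purely to mimic the optimizer's sums for this data set) contrasting the old 5% guess with the true count:

```python
# Mimicking the two cardinality estimates discussed above (illustration only).
num_rows = 10_000                      # t1 holds 1e4 rows with id = 1..1e4

old_estimate = num_rows * 0.05         # fixed 5% guess for "col > (subquery)"
actual_rows  = sum(1 for n in range(1, num_rows + 1) if n > 9900)

print(int(old_estimate), actual_rows)  # 500 vs 100, matching the two plans
```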

However, after upgrading to 12c, the numbers change:

12c plan - shows correct cardinality estimate
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT  |      |   100 |   400 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |   400 |     3   (0)| 00:00:01 |
|   2 |   FAST DUAL       |      |     1 |       |     2   (0)| 00:00:01 |

Predicate Information (identified by operation id):
   1 - filter("ID"> (SELECT 9900 FROM "SYS"."DUAL" "DUAL"))

Shades of precompute_subquery – it looks like the optimizer recognizes that the subquery has to return a constant, and extracts that constant for use in the plan. Strangely, though, I can’t see any sign in the 10053 trace file of the subquery value being used until the plan just magically appears with the right estimate.

February 5, 2014


Filed under: Execution plans,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 5:42 pm BST Feb 5,2014

Here’s a little script to demonstrate an interesting observation that appeared in my email this morning (that’s morning Denver time):

create table t1
as
select * from all_objects where rownum = 1;

delete from t1;

create table t2
as
select * from all_objects where rownum <= 100000;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 => 'for all columns size 1'
	);
end;
/

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';

prompt  ======================
prompt  And now the test query
prompt  ======================

select * from t1
minus
select * from t2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
alter session set events '10046 trace name context off';

Clearly the first query block in the test query will return no rows, and since the MINUS operator returns rows from the first result set that do not appear in the second result set there is no need for Oracle to run the second query block. Well, guess what …
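The short-circuit the optimizer could (but, as it turns out, doesn't) apply is easy to model. Here's a toy sketch (Python, all names invented) where the second query block is a callable so we can observe whether it ever runs:

```python
def minus(first_rows, second_block):
    """Toy model of MINUS: distinct rows of the first block that do not
    appear in the second.  second_block is a callable, so the sketch can
    skip evaluating it entirely when the first block is empty."""
    result = set(first_rows)
    if not result:
        return set()              # nothing survives MINUS anyway: skip block 2
    return result - set(second_block())

calls = []
def expensive_t2():
    calls.append("t2 scanned")
    return [1, 2]

print(minus([], expensive_t2), calls)   # set() [] - t2 is never scanned
```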

The "create where rownum = 1" followed by "delete" is a lazy workaround to avoid side effects of deferred segment creation so that you can run the script on any (recent) version of Oracle. The flush, combined with 10046 trace, allowed me to see waits that showed which objects Oracle scanned and when, and the display_cursor() was just icing on the cake.

I’ve checked a couple of versions, and both of them scan t1 first and then scan t2 unnecessarily.

This surprised me slightly given how smart the optimizer can be, but I guess it’s one of those boundary cases where the optimizer has just one strategy for an entire class of queries. I couldn’t think of any “legal” way to control the effect, but here’s the first dirty trick that came to my mind. If you’re sure that the first subquery is going to be cheap and you’re worried that the second subquery is expensive, you could do the following:

select	v2.*
from
	(select * from t1 where rownum = 1)	v1,
	(
		select * from t1
		minus
		select * from t2
	)	v2
;

Introduce a spurious query to return one row from the first subquery and join it to the MINUS query. If the inline view doesn’t return any rows Oracle short-circuits the join, as shown by the following execution path with stats:

| Id  | Operation              | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
|   0 | SELECT STATEMENT       |      |      1 |        |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|   1 |  MERGE JOIN CARTESIAN  |      |      1 |      1 |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|   2 |   VIEW                 |      |      1 |      1 |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|*  3 |    COUNT STOPKEY       |      |      1 |        |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|   4 |     TABLE ACCESS FULL  | T1   |      1 |      1 |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|   5 |   BUFFER SORT          |      |      0 |      1 |      0 |00:00:00.01 |       0 |      0 | 73728 | 73728 |          |
|   6 |    VIEW                |      |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   7 |     MINUS              |      |      0 |        |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   8 |      SORT UNIQUE       |      |      0 |      1 |      0 |00:00:00.01 |       0 |      0 | 73728 | 73728 |          |
|   9 |       TABLE ACCESS FULL| T1   |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  10 |      SORT UNIQUE       |      |      0 |  70096 |      0 |00:00:00.01 |       0 |      0 | 73728 | 73728 |          |
|  11 |       TABLE ACCESS FULL| T2   |      0 |  70096 |      0 |00:00:00.01 |       0 |      0 |       |       |          |

Predicate Information (identified by operation id):
   3 - filter(ROWNUM=1)

The only thing to watch out for is that the "rownum = 1" doesn’t make the optimizer switch to an unsuitable "first_rows(1)" execution plan.

February 4, 2014


Filed under: Indexing,Oracle — Jonathan Lewis @ 1:33 am BST Feb 4,2014

To create an index on a table (with no existing indexes) Oracle has to start by doing a tablescan.

What’s the difference between the tablescan it uses for a B-tree index and the tablescan it uses for a bitmap index ? Why ?


I was going to give a hint that if you answered the “why” first that might lead you to the right idea and a test for the “what”, but we already have an answer, with a sample of proof.

February 2, 2014


Filed under: clusters,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 10:27 pm BST Feb 2,2014

It doesn’t matter which bit of Oracle technology you want to use, eventually someone, somewhere, runs into the special case where something nasty happens. Here’s an edge case for people using (index) clusters – Oracle Bug  17866999 ora-1499 for cluster following rman convert

It comes from a conversation on Oracle-L where Jack van Zanen reported a problem of inconsistent results after migrating data between platforms using rman to convert some tablespaces containing index clusters. This is the starting post where he shows a query that is clearly getting the wrong answer (select where channel_number = 503 obviously shouldn’t return data with channel_number 501).

Unfortunately the “audit-trail” on this problem is incomplete because I accidentally started sending email to Jack instead of the list (reply, rather than reply all) – but given that the problem column was the cluster key of an index cluster the obvious guess was that something had gone wrong as the data for this key value had grown and chained into a second (or third) block. To confirm this I asked Jack to select the rowids reported for the cluster and use the dbms_rowid package to convert them into distinct (file_number, block_number) values:

select
	dbms_rowid.rowid_relative_fno(rowid)	file_no,
	dbms_rowid.rowid_block_number(rowid)	block_no,
	count(*)				rows_in_block
from
	&m_table	t1
group by
	dbms_rowid.rowid_relative_fno(rowid),
	dbms_rowid.rowid_block_number(rowid)
order by
	file_no, block_no
;
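For anyone who wants to sanity-check the file/block arithmetic without a database to hand: an extended ROWID is four base-64 fields (data object number, relative file, block, row). Here's a hedged Python sketch of that decoding - the field layout and alphabet are the documented ones, but the sample rowid below is made up for illustration:

```python
# Base-64 alphabet used in extended ROWIDs: A-Z, a-z, 0-9, '+', '/'
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_field(chars):
    n = 0
    for c in chars:
        n = n * 64 + ALPHABET.index(c)
    return n

def decode_rowid(rowid):
    # extended ROWID layout: OOOOOO FFF BBBBBB RRR (18 characters)
    return {
        "data_object_id": b64_field(rowid[0:6]),
        "relative_fno":   b64_field(rowid[6:9]),
        "block_no":       b64_field(rowid[9:15]),
        "row_no":         b64_field(rowid[15:18]),
    }

print(decode_rowid("AAAShYAAEAAAADPAAA"))   # hypothetical rowid
```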

With a few file and block numbers in hand, we then needed to see the results from dumping the blocks. Jack sent me the trace files, and I found that my guess was correct – and also discovered that I had been on a private discussion, so I posted the results back to the list. As you can see from that posting, the cluster key (indicated by the initial K in the flag byte, and by its presence in “tab 0″ in the block) is 4 columns long, and I’ve got entries from two blocks where the cluster keys are pointing to each other – but the cluster keys don’t hold the same value.

It’s an extraordinary corruption – the data structures appear to be totally self-consistent and perfectly clean, and yet the data content is wrong.  (Usually I find that a corruption means the data becomes unusable).

Oracle eventually tracked this down and reproduced it; it’s a problem that can arise if you are using index clusters and use rman to do a cross-platform conversion – and it’s not going to be easy to fix it. I don’t suppose many people are likely to see this bug (it’s been around for years, apparently, but there aren’t many people who use clusters, and of those perhaps very few have migrated between platforms using rman) – but at present the only workaround is to export and reimport the data; which is not nice when you’re talking about terabytes of clusters.

February 1, 2014

Modify PK

Filed under: Indexing,Oracle — Jonathan Lewis @ 11:00 am BST Feb 1,2014

Sitting in the lounge waiting to be called for my flight I was musing on the 12c feature of having multiple indexes defined on the same ordered column set  when a thought crossed my mind and I decided to run a little test that looked like this:

create table t1 as select * from all_objects where rownum <= 10000;

create unique index t1_pk on t1(object_id);

alter table t1 add constraint t1_pk primary key(object_id);

create index t1_i1 on t1(object_id, object_name);

drop index t1_pk;

-- expect ORA-02429: cannot drop index used for enforcement of unique/primary key

alter table t1 modify primary key using index t1_i1;

drop index t1_pk;

For years I’ve been assuming that you really have to mess around with the PK (and any related FKs) if you want to change the index supporting the primary key – but this code demonstrates that you can add a new index to a table and “move” the primary key to it before dropping the original index.

The worrying thing about this (for me, at any rate) is that it isn’t a new feature – after testing it on a recent release I started working backwards, and found it works all the way down to the earliest 9i I have access to. It doesn’t work on older versions, and one version behaves slightly differently from later versions because the original PK index disappears as the constraint is moved.

As I’ve often said about trust – keep an eye on the date and version of any article you read, it may no longer be true.

January 27, 2014


Filed under: Oracle — Jonathan Lewis @ 9:18 am BST Jan 27,2014

A recent “Hot topics” email from Oracle support listed the following bug as one which had recently been updated:


Since the optimizer is one of my pet topics I thought I’d take a quick look at what it said – and found this heart-warming introduction;

*** 11/03/13 03:46 am ***

Based on a blog article from the international recognized Oracle Expert Jonathan Lewis ...

If it’s on MoS surely it’s just got to be true!  (Yes, I know I’ve said the opposite in the past – but it’s definitely right some of the time)

The bug/blog in question was this one, and the problem is fixed in 12.2.

January 26, 2014


Filed under: Bugs,Indexing,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 12:08 pm BST Jan 26,2014

I was involved in a thread on Oracle-L recently that started with the question: “How many LIOs is too many LIOs”. Rather than rewrite the whole story, I’ve supplied a list of links to the contributions I made, in order – the final “answer” is actually the answer to a different question – but travels an interesting path to get there.

I’ve got a script to emulate the requirement so that people can see for themselves the bug that I mention in post 15; I’ll try to add a couple of notes to it and publish it some time, but for the moment I’ll just remind myself that it’s called (slightly counter-intuitively) no_sort_problem.sql.

January 23, 2014


Filed under: Execution plans,Oracle,Performance — Jonathan Lewis @ 6:05 pm BST Jan 23,2014

Here’s a recent request from the OTN database forum – how do you make this query go faster (tkprof output supplied):

 select a.rowid
   from  a, b
   where A.MARK IS NULL
     and a.cntry_code = b.cntry_code and b.dir_code='XX' and b.numb_type='XXX'
     and upper(Trim(replace(replace(replace(replace(replace(replace(replace(a.co_name,'*'),'&'),'-'),'/'),')'),'('),' '))) like
         upper(Trim(substr(replace(replace(replace(replace(replace(replace(replace(b.e_name,'*'),'&'),'-'),'/'),')'),'('),' '),1,25)))||'%';

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        3   3025.53    3260.11       8367       7950          0          31
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6   3025.54    3260.13       8367       7950          0          31

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 74

Rows     Row Source Operation
-------  ---------------------------------------------------
     31  HASH JOIN
 302790   INDEX FAST FULL SCAN OBJ#(39024) (object id 39024)   -- B 500,000 in the table
  55798   TABLE ACCESS FULL OBJ#(78942)                        -- A 175,000 in the table

-- and from some "explain plan" tool
SELECT STATEMENT  CHOOSE Cost: 333  Bytes: 52,355,940  Cardinality: 608,790 
3 HASH JOIN  Cost: 333  Bytes: 52,355,940  Cardinality: 608,790 
  1 INDEX FAST FULL SCAN UNIQUE B_PK Cost: 4  Bytes: 503,022  Cardinality: 12,898 
    2 TABLE ACCESS FULL A Cost: 215  Bytes: 3,150,034  Cardinality: 67,022

One thing you might note from the spartan tkprof output – this is an old version of Oracle.

The first thing to do is note that most of the time is spent on the CPU – and maybe that multiply cascading replace() has something to do with it.  Now replace() and translate() are things I use so rarely that I usually get them wrong first time, but I think the predicate could be replaced by:

upper(translate(a.co_name, 'x*&-/)( ', 'x')) like upper(substr(translate(b.e_name, 'x*&-/)( ', 'x'),1,25))||'%'

Beyond making the code slightly less eye-boggling, though, I don’t think this is going to help much. Consider the information we have about the sizes of the rowsources involved.

If we can trust the tkprof row counts as being the complete output from the first execution of the statement (there seem to have been 2 in the trace file) – we selected 300,000 rows from one table and 56,000 rows from the other and then joined them with a hash join. A hash join requires equality predicates, and the only join predicate in the query that could be used is the one “a.cntry_code = b.cntry_code”.

Now, if cntry_code is short for “country code” we have a scaling problem: there are only about 600 countries in the world, so on average each row in the A table (56,000 rows acquired) is going to find roughly 500 rows in the B table (300,000 rows divided across 600 countries). So at run time the hash join will generate a rowsource of at least 56,000 * 500 = 28 Million rows; then Oracle is going to do that complicated bit of textual manipulation on two columns, compare them, and find that ultimately only 31 rows match !
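The scaling argument reduces to three lines of arithmetic (Python, with the row counts rounded as in the text):

```python
rows_a    = 56_000        # rows acquired from table A
rows_b    = 300_000       # rows acquired from table B
countries = 600           # roughly 600 country codes in the world

avg_b_per_code = rows_b / countries        # ~500 B rows per country code
join_output    = rows_a * avg_b_per_code   # rows the hash join must generate

print(f"{join_output:,.0f}")               # 28,000,000 rows to filter down to 31
```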

So how can we do less work ?

If we’re lucky we can make the hash join much more efficient by thinking about what that nasty textual predicate means. We compare to see if one string looks like it’s starting with the first 25 characters of the other string – but if it does then the two strings have to be identical on the first 25 characters, and a hash join works with equality. So let’s just add in a new predicate to see what effect it has:

upper(substr(translate(a.co_name, 'x*&-/)( ', 'x'),1,25)) = upper(substr(translate(b.e_name, 'x*&-/)( ', 'x'),1,25))
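When the cleaned names are long enough for the 25-character prefixes to be decisive, the effect of the extra equality predicate can be sketched in Python. This is only a model of the idea, not Oracle's hash join; `clean()` is a hypothetical stand-in for the `upper(translate(...))` expression:

```python
from collections import defaultdict

def clean(s):
    """Hypothetical equivalent of upper(translate(col, 'x*&-/)( ', 'x'))."""
    return s.upper().translate(str.maketrans("", "", "*&-/)( "))

def prefix_hash_join(a_names, b_names, prefix_len=25):
    """Sketch of why the extra equality predicate helps: hash the B rows on
    the 25-character cleaned prefix, so each A row probes one small bucket
    instead of every B row that merely shares its country code."""
    buckets = defaultdict(list)
    for b in b_names:
        buckets[clean(b)[:prefix_len]].append(b)
    matches = []
    for a in a_names:
        for b in buckets.get(clean(a)[:prefix_len], []):
            if clean(a).startswith(clean(b)[:prefix_len]):  # original LIKE test
                matches.append((a, b))
    return matches
```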

I’ve made the suggestion on the forum – now I’m waiting to see if it has a beneficial effect (or whether I’ve made a silly mistake in my logic or guesswork)

January 21, 2014


Filed under: Oracle — Jonathan Lewis @ 9:42 pm BST Jan 21,2014

Here’s a little detail I could do without in my database:

select
        owner, object_type, object_name
from
        dba_objects
where
        object_name like '_'
order by
        object_name, object_type
;

OWNER           OBJECT_TYPE         OBJECT_NAME
--------------- ------------------- --------------------
APEX_030200     PROCEDURE           F
PUBLIC          SYNONYM             F
APEX_030200     PROCEDURE           G
APEX_030200     PROCEDURE           P
PUBLIC          SYNONYM             P
APEX_030200     FUNCTION            V
PUBLIC          SYNONYM             V
APEX_030200     PROCEDURE           Z
PUBLIC          SYNONYM             Z

9 rows selected.

Public names like P and F for procedures or functions are just not on (unless I create them myself).

January 17, 2014

Bitmap question

Filed under: bitmaps,Indexing,Oracle — Jonathan Lewis @ 7:06 pm BST Jan 17,2014

If you know anything about bitmap indexes you probably know that a single entry in a bitmap index takes the form (key_value, starting rowid, ending rowid, BBC-compressed bit string). So an entry covers a single value for a column over a range of rowids in the table, and the string of bits for that (notional) range is reduced to a minimum by a compression mechanism that eliminates repeated zeros in multiples of 8.
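As a toy illustration of the "zeros in multiples of 8" point (a deliberate simplification, not the real BBC encoding): split the bit string into bytes, and collapse maximal runs of all-zero bytes into a count:

```python
def toy_byte_runs(bits):
    """Toy byte-aligned compression (NOT the real BBC algorithm): bytes that
    are all zero collapse into a ('Z', run_length) token; anything else is
    kept as a literal ('L', byte) token."""
    out = []
    for i in range(0, len(bits), 8):
        byte = bits[i:i + 8].ljust(8, "0")
        if set(byte) == {"0"}:
            if out and out[-1][0] == "Z":
                out[-1] = ("Z", out[-1][1] + 1)
            else:
                out.append(("Z", 1))
        else:
            out.append(("L", byte))
    return out

# bit 0 and bit 79 set: one literal byte, 8 collapsed zero bytes, one literal
print(toy_byte_runs("1" + "0" * 78 + "1"))
```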

So here’s a question – to which I don’t know the answer, although you may be surprised when you try to find it:

If you have a very large table and in one of its columns the first row and the last row (and no others) hold the value 0 (say) and you create a bitmap index on this column, what’s the largest number of rows you could have in the table before Oracle would HAVE to create two index entries in order to cover both rows ?

Follow-up question – once you start getting close to working out the answer, can you think of a way to provide an example without actually creating a table with that many rows in it ?


January 14, 2014

Single block reads

Filed under: Infrastructure,Oracle,Performance — Jonathan Lewis @ 6:52 pm BST Jan 14,2014

When a “cache read” tablescan (or index fast full scan) takes place we generally expect to see waits on “db file scattered read” as Oracle performs multi-block reads to do the scan. But we all know that Oracle will skip over blocks that are already in the cache and can therefore end up doing multi-block reads of many different sizes, even down to the point where it does single block reads (waiting for “db file sequential read”).

A quirky little question came up on OTN a little while ago: “for a large table we expect multiblock reads to be positioned at the end of the LRU for prompt re-use; but when Oracle does a single block read as part of a tablescan does it go to the end of the LRU (because it’s part of a large tablescan) or does it go to the mid-point of the LRU (because it’s a single block read)?”

The description of how blocks are treated in a tablescan has been simplified, of course, but the question is still valid – so what’s the answer, and how (without going into an extreme level of detail) would you demonstrate it ?
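The two candidate policies in the question can be modelled with a toy buffer chain. This is pure illustration under invented names - Oracle's real mechanism involves touch counts and is far more subtle than a simple midpoint insert:

```python
from collections import deque

class ToyLRUChain:
    """Toy model of a buffer chain: left end = hot (MRU), right = cold (LRU)."""
    def __init__(self, blocks):
        self.chain = deque(blocks)

    def read_for_large_scan(self, blk):
        self.chain.append(blk)          # cold end: candidate for prompt re-use

    def read_single_block(self, blk):
        mid = len(self.chain) // 2
        self.chain.insert(mid, blk)     # mid-point insertion

lru = ToyLRUChain(["a", "b", "c", "d"])
lru.read_for_large_scan("scan_blk")
lru.read_single_block("single_blk")
print(list(lru.chain))
```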


January 13, 2014


Filed under: Infrastructure,Oracle — Jonathan Lewis @ 10:09 am BST Jan 13,2014

I see Tanel has just started a short series of articles on where the X$ data comes from so, for those with time on their hands (anyone?), here’s a little script I wrote a few years ago when I wanted to check which v$ objects corresponded to the same internal structure in the SGA (as v$session and v$session_wait used to).

It starts with a function that has to be created in the SYS schema – so don’t do this on production systems. It’s a pipelined function, so that we can treat its output like a table, which means I need to create an object type and a table type before creating the function. In the function I select each x$ name from the list of x$ names (x$kqfta) and for each table I construct a dynamic SQL statement selecting the first row from the table.

Over the versions of Oracle, various bugs have resulted in deadlocks and crashes when selecting a row from particular X$ objects, so there’s a short list of exceptions that has grown gradually over time. The code has some calls to dbms_output, which you can choose to hide by setting serveroutput off. Depending on your system you may see a very large number of X$ with no data in them.

create or replace type jpl_row as
	object(x_address raw(8), x_name varchar2(30), x_indx number);
/

create or replace type jpl_table as
	table of jpl_row;
/

create or replace function x_first
return jpl_table pipelined
as
/*	Created by x_link.sql	*/
--	Pipeline function to examine addresses of data in x$ objects
--	Uses types jpl_row and jpl_table to implement the pipeline

	dg_broker_problem exception;
	pragma EXCEPTION_INIT(dg_broker_problem, -16525);

	logminer_problem exception;
	pragma EXCEPTION_INIT(logminer_problem, -1306);

	logminer_problem2 exception;
	pragma EXCEPTION_INIT(logminer_problem2, -1307);

	mem_advisor_problem exception;
	pragma exception_init(mem_advisor_problem, -1007);

	deadlock exception;
	pragma EXCEPTION_INIT(deadlock, -60);

	no_table exception;
	pragma EXCEPTION_INIT(no_table, -942);

	m_addr		x$kqfta.addr%type;
	m_indx		x$kqfta.indx%type;
	m_table		varchar2(30);

begin
	for c1 in (select kqftanam, indx from x$kqfta) loop
		begin
			execute immediate
				' select indx, addr, ''' || c1.kqftanam ||
				''' from ' || c1.kqftanam ||
				' where rownum = 1'
				into m_indx, m_addr, m_table;
			dbms_output.put_line(m_addr || ' ' || m_table || ' ' || c1.indx);
			pipe row (jpl_row(m_addr, m_table, c1.indx));
		exception
			when no_data_found then
				dbms_output.put_line('Warning: No data-' || ' ' || c1.kqftanam || ' ' || c1.indx);
			when no_table then
				dbms_output.put_line('Warning: No table-' || ' ' || c1.kqftanam || ' ' || c1.indx);
			when logminer_problem then
				dbms_output.put_line('Warning: logminer issue' || ' ' || c1.kqftanam || ' ' || c1.indx);
			when logminer_problem2 then
				dbms_output.put_line('Warning: logminer issue' || ' ' || c1.kqftanam || ' ' || c1.indx);
			when dg_broker_problem then
				dbms_output.put_line('Warning: DataGuard issue' || ' ' || c1.kqftanam || ' ' || c1.indx);
			when mem_advisor_problem then
				dbms_output.put_line('Warning: Memory Advisor' || ' ' || c1.kqftanam || ' ' || c1.indx);
			when deadlock then
				dbms_output.put_line('Warning: deadlock' || ' ' || c1.kqftanam || ' ' || c1.indx);
			when others then
				dbms_output.put_line('Warning: other' || ' ' || c1.kqftanam || ' ' || c1.indx);
		end;
	end loop;

	return;
end;
/

show errors

Once the function has been created it’s a simple matter to call it, ordering the output by (for example) x_address:

select * from table(x_first())
order by
	x_address, x_indx
;

X_ADDRESS        X_NAME                             X_INDX
---------------- ------------------------------ ----------
0000000000000000 X$KSUTM                                41
                 X$KSUSGIF                              42
                 X$KTTVS                               477
                 X$KDXHS                               487
                 X$KZSRO                               528
00000000099EC500 X$KGSKCP                              139
00000000099EC560 X$KGSKPP                              140
00000000099EC5D0 X$KGSKASP                             141
000000000BEE9560 X$KSFDFTYP                            119
000000000C0F5B1C X$KQDPG                               495
000000000C0F7660 X$KQFTVRTTST0                         863

0000000060000000 X$KSMMEM                               71
0000000060001190 X$KSMSP_DSNEW                          82
00000000600077CC X$KGICS                               513
00000000AAE1B588 X$KSQST                                32
00000000AAE35598 X$KSRPCIOS                            118
00007FFB03E92478 X$DBGDIREXT                           820
00007FFB04274F50 X$KSMSP                                75
00007FFB045D4E28 X$ACTIVECKPT                          270
00007FFB093A7B48 X$KXFPSST                             569
00007FFB093A9D48 X$KOCST                               638

473 rows selected.

This was a 64-bit version of Oracle – and I’ve only recorded data in 473 of the possible 950 x$ structures.
As far as Tanel’s series is concerned a key detail here is the way in which you can break the range into 4 chunks:

  • The zeros – x$ which don’t take any memory but simply return the result of a real-time function call.
  • The low range (up to C0F7660) which is a fixed data segment (whose technical name in x86-speak I forget) in the SGA
  • The mid range (60000000 to AAE35598) which is the SGA heap
  • The high range (from 7FFB03E9247 upwards) which is user space (PGA/UGA addressing)
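The chunking above can be expressed as a tiny classifier. The cut-off constants come from the listing on this one system (they are not universal values), so treat this as a hedged illustration only:

```python
def classify_x_address(addr_hex):
    """Classify an x$ base address into the four ranges described above.
    Boundary constants are taken from this particular 64-bit listing."""
    a = int(addr_hex, 16)
    if a == 0:
        return "function call"          # no backing memory, computed on demand
    if a < 0x60000000:
        return "fixed SGA segment"
    if a < 0x7F0000000000:
        return "SGA heap"
    return "user space (PGA/UGA)"

for addr in ("0000000000000000", "00000000099EC500",
             "00000000AAE35598", "00007FFB04274F50"):
    print(addr, "->", classify_x_address(addr))
```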

As Tanel’s first post explains, and as you can infer from the sample listing above, when you query x$ksmsp you are querying a structure in user space.



January 8, 2014

CR Trivia

Filed under: Infrastructure,Oracle,Performance — Jonathan Lewis @ 6:44 am BST Jan 8,2014

Everybody “knows” that when you do a tablescan of a table it starts with two buffer gets on the segment header, though older versions (ca. 8i and earlier) of Oracle used to do 4 buffer gets on the segment header. The upshot of this is that many people still say that if you create a table and insert a single row then you’re going to get 3 buffer gets when you tablescan that table: two for the segment header and one for the data block.

So here’s a test, with a set of autotrace stats which, for reasons I’ll describe shortly, may not be immediately reproducible on your system:

create table t1 (n1 number);
insert into t1 values(1);
execute dbms_stats.gather_table_stats(user,'t1');

set autotrace traceonly statistics
select * from t1;

          1  recursive calls
          0  db block gets
         22  consistent gets
          0  physical reads
          0  redo size
        540  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

That really is 22 buffer gets to find the one row in the table. It gets worse; I used another session to insert (and commit) a second row in the table and the tablescan to return the two rows took 36 buffer gets.

If you want to get the same results you have to be a bit careful and a bit lucky. I was using ASSM (automatic segment space management) on a tablespace with an 8MB uniform extent size; when I inserted the single row into the table Oracle formatted a batch of 16 consecutive blocks in the first extent, deciding which range of blocks to format based on my process id. When I inserted my row, the block I inserted into was again dictated by my process id – this happened to be the 10th block in the formatted range.

When I ran the tablescan I did 6 gets on the segment’s space management blocks to work out what parts of the segment were formatted, then 10 gets to find the first row in the 10th block of the range, then a further 6 gets to scan the rest of the formatted range. I do not know why it takes 6 gets to read the space management blocks – it may be two each for the segment header block, L2 bitmap block, and L1 bitmap block, or it may be four on the segment header and one each on the other two blocks.

When I inserted a second row from another session (which I had set up very carefully), Oracle decided to format another batch of 16 blocks. This highlighted a little variation on what I’ve said so far. Because I had 8MB uniform extents the first 18 blocks of the first extent were all space management blocks (segment header, one L2 bitmap, and 16 L1 bitmaps – in the opposite order to that description). Oracle formats on boundaries of 16 blocks in the extent and in this case formatted the 14 blocks that would take it from block 19 to block 32 of the extent. That’s basically why my tablescan after the second insert took an extra 14 gets.

If you really do need to scan a “very small” table extremely frequently (and you shouldn’t, really) then it might be a good idea to check how many blocks have been formatted compared to the number of blocks used (dbms_rowid.rowid_block_number() will help with that check) and do an “alter table move” because in this special case Oracle won’t use the “format 16 blocks somewhere” strategy, it will format only as many blocks as needed starting from the first available block in the segment. For some reason the number of gets on space management blocks is also minimised in this case so that tablescan of a “single block” takes only 2 + number of blocks in table.

For ASSM segments Oracle maintains a Low High Water Mark (LHWM) and a High High Water Mark (HHWM). Every block below the LHWM is guaranteed to be formatted, blocks between the LHWM and HHWM will be formatted in batches of 16; so when doing a segment scan Oracle uses the largest multiblock-read it can from the start of the segment to the LHWM, then keeps checking the space management blocks to identify the batches of 16 that it can then read (and it can read consecutive formatted batches in a single read, so the reads are not necessarily limited to 16 blocks at a time).
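The read pattern described above can be sketched as follows. This is a toy model of the description, not Oracle's actual code; block and batch numbers are invented for illustration:

```python
def assm_read_ranges(total_blocks, lhwm, formatted_batches, batch=16):
    """Toy model of the scan logic described above: blocks below the LHWM are
    read in one large sweep; above it, only the 16-block batches flagged as
    formatted are read, with consecutive formatted batches coalesced into a
    single larger read."""
    reads = [(0, lhwm)] if lhwm else []
    start = None
    for b in range(lhwm // batch, (total_blocks + batch - 1) // batch):
        lo = max(b * batch, lhwm)
        if b in formatted_batches:
            if start is None:
                start = lo
            end = min((b + 1) * batch, total_blocks)
        else:
            if start is not None:
                reads.append((start, end))
                start = None
    if start is not None:
        reads.append((start, end))
    return reads

# batches 1 and 2 formatted and adjacent: they coalesce into one read
print(assm_read_ranges(total_blocks=64, lhwm=16, formatted_batches={1, 2}))
```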

One little side thought – because the choice of block and block range is dictated by the process id, the pattern of data insertion and contention using shared servers can be very different from the pattern produced by dedicated servers.


January 6, 2014

LOB changes

Filed under: Infrastructure,LOBs,Oracle,Troubleshooting — Jonathan Lewis @ 7:10 pm BST Jan 6,2014

It’s always useful to collect baseline information – especially when it helps you notice that the baseline has moved in a way that might explain the next performance problem you see. Here’s an example demonstrating the benefit.

I have a table with a LOB column that is stored out of line. Many years ago I decided that I wanted to compare how the redo generation varied as I change the LOB from cached to nocache (with nologging). So here was one of my simplest test scripts (stripped to a minimum):

create table t1 (
	id		number,
	cached_lob	clob
)
lob (cached_lob) store as text_lob_cache(
	disable storage in row
	cache logging
--	nocache logging
);

insert into t1
select
	rownum, rpad('x',4000)		-- roughly 4KB per LOB value
from	dual
connect by
	rownum <= 1000
;

When I first ran this test (twice, one for each option) I noticed that cached LOBs recorded just the redo needed to describe the data that had been written into them (so at 1,000 rows of 4KB each that would be in the ballpark of 4MB with some overheads), but the nocache option created redo for the entire block (so nearer 8MB, plus overheads). There were, of course, all sorts of other side effects of the change – some of them very nasty – but this was one of the more important differences to note.
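The ballpark figures in that paragraph are just arithmetic; a hedged sketch of the two logging strategies (approximation only, ignoring the overheads):

```python
# Approximating the redo volumes described above - not measured values.
rows       = 1000
lob_bytes  = 4000          # ~4KB of data written into each out-of-line LOB
block_size = 8192          # 8KB database blocks

cache_redo   = rows * lob_bytes    # log just the data:  ~4MB plus overheads
nocache_redo = rows * block_size   # log whole blocks:   ~8MB plus overheads

print(cache_redo, nocache_redo)    # 4000000 8192000
```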

A little while ago I was prompted to repeat the test – so I ran the whole thing on three versions of Oracle; it’s fascinating to see the way the key redo stats change:
Name                                                                     Value
----                                                                     -----
redo entries                                                             5,588
redo size                                                            5,288,836

redo entries                                                             5,578
redo size                                                            9,407,824
redo entries                                                             5,223
redo size                                                            5,277,520

redo entries                                                             5,220
redo size                                                            1,212,992
redo size for direct writes                                             52,000
redo entries                                                             5,231
redo size                                                            9,343,444

redo entries                                                             5,291
redo size                                                            1,301,700
redo size for direct writes                                             60,000

As you can see the results mirror the first results I got – the redo size is consistent with the volume of data when the LOBs are cached, and matches whole block dumps when they’re not cached – but something very clever seems to appear in the results: the redo for the lob component of nocache LOBs seems to have disappeared, with a drop of (roughly) 8,192,000 bytes from the 10g figures. Is this a clever trick (filesystem-like logging ?) or is it a reporting error? Then 12c is even stranger – the redo for the cached LOBs now matches block logging (8MB + overheads) instead of data logging – and the redo for the nocache LOBs is still “missing”.

To be investigated further – possibly by dumping redo log files, but initially simply by watching when log file switches take place – to see if the changes are reporting errors or genuine changes.
