Oracle Scratchpad

May 9, 2013

Clustering_factor

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 8:14 am UTC May 9,2013

Cost Based Oracle – Fundamentals (November 2005)

But the most interesting function for our purposes is sys_op_countchg(). Judging from its name, this function is probably counting changes, and the first input parameter is the block ID portion (object_id, relative file number, and block number) of the table’s rowid, so the function is clearly matching our notional description of how the clustering_factor is calculated. But what is that 1 we see as the second parameter?

When I first understood how the clustering_factor was defined, I soon realized that its biggest flaw was that Oracle wasn’t remembering recent history as it walked the index; it only remembered the previous table block so that it could check whether the latest row was in the same table block as last time or in a new table block. So when I saw this function, my first guess (or hope) was that the second parameter was a method of telling Oracle to remember a list of previous block visits as it walked the index.

And finally, Oracle Corp. had implemented an official interface to the second parameter of sys_op_countchg() – provided you install the right patch – through a new table (or schema, or database) preference type available to the dbms_stats.set_table_prefs() procedure.

I’ve been meaning to write this post for two or three months, ever since Sean Molloy sent me an email about short blog note from Martin Decker describing Bug 13262857  Enh: provide some control over DBMS_STATS index clustering factor computation. Unfortunately I’ve not yet had time to investigate the patch, but I don’t think I need to any more because Richard Foote has written it up in his latest blog post.

Read Richard’s post – it’s important.

Update 10th May

Richard’s post has, unsurprisingly, produced a buzz of excitement in his reader – and started up the discussion of how best to use this capability; so here’s another quote from the book (p.111 – available in the download of chapter 5):

So using Oracle’s own function for calculating the clustering_factor, but substituting the freelists value for the table, may be a valid method for correcting some errors in the clustering_factor for indexes on strongly sequenced data. (The same strategy applies if you use multiple freelist groups—but multiply freelists by freelist groups to set the second parameter.)

Can a similar strategy be used to find a modified clustering_factor in other circumstances? I think the answer is a cautious “yes” for tables that are in ASSM tablespaces. Remember that Oracle currently allocates and formats 16 new blocks at a time when using automatic segment space management (even when the extent sizes are very large, apparently). This means that new data will be roughly scattered across groups of 16 blocks, rather than being tightly packed.

Calling Oracle’s sys_op_countchg() function with a parameter of 16 could be enough to produce a reasonable clustering_factor where Oracle currently produces a meaningless one. The value 16 should, however, be used as an upper bound. If your real degree of concurrency is typically less than 16, then your actual degree of concurrency would probably be more appropriate.

Whatever you do when experimenting with this function—don’t simply apply it across the board to all indexes, or even all indexes on a particular table. There will probably be just a handful of critical indexes where it is a good way of telling Oracle a little more of the truth about your system—in other cases you will simply be confusing the issue.

Note particularly the comments about how the best value depends on the data in the indexed columns, the table configuration, and the degree of concurrency - you don’t necessarily want to use the same value for every index on a given table. That’s a shame, since Oracle has defined the interface as a TABLE preference, so if you set it then you get the same for every index. Despite this, if you’re prepared to put in a little control work, it does mean that you can use an official Oracle mechanism to play the game I was suggesting in the book – for each “special” index, set the preference, collect the stats, then clear the preference.

March 4, 2013

Duplicate indexes ?

Filed under: Indexing,Oracle,Troubleshooting — Jonathan Lewis @ 5:39 pm UTC Mar 4,2013

I don’t think this is likely to happen on a production system (until 12c) – but look what you can do if you try hard enough:

  1  select
  2     index_name, column_name from user_ind_columns
  3  where
  4     table_name = 'T1'
  5  order by
  6*    index_name , column_position
SQL> /

INDEX_NAME           COLUMN_NAME
-------------------- --------------------
T1_I1                N1
                     V1

T1_I2                N1
                     V1


4 rows selected.

That’s a straight cut-n-paste from an Oracle 11.1.0.7 SQL*Plus session. (You can tell I typed it in real time because I missed the return before the FROM, and couldn’t be bothered to go back and do it again ;) )

January 10, 2013

Over-indexing

Filed under: Indexing,Oracle — Jonathan Lewis @ 6:43 pm UTC Jan 10,2013

This is the text of an article I published in the UKOUG magazine a few years ago, but it caught my eye while I was browsing through my knowledge base recently, and it’s still relevant. I haven’t altered the original apart from adding a couple of very brief comments in brackets [Ed: like this].

Over-indexing

One of the strengths of a relational database is that you should be able to throw any reasonable query (and even some unreasonable queries) at it and it will be able to return the right answer without being told how to navigate through the data.

There’s no guarantee, though, that you’ll get the answer quickly unless you’ve given the database some help by turning your logical model into a sensible physical implementation. Part of the physical implementation will be the choice of indexes – and this article reviews one of the commonest indexing issues that I see in OLTP systems

Costs and Benefits.

Why do we create indexes? There are three main reasons as far as Oracle is concerned. We need some indexes to help Oracle enforce uniqueness; we need some indexes for performance reasons to supply a high-precision access path to important data; and we may need some indexes to help Oracle enforce referential integrity constraints.

Theoretically, of course, we don’t need any indexes to ensure data correctness, so you could say that the only reason for having indexes is to improve performance.  For example, you could enforce primary key constraints simply by locking tables on every insert, update or delete and then doing a tablescan to make sure that there is no conflicting data – but having an index “for” the primary key allows you to check just one block in the index without locking the table, so you get better performance and better concurrency.  (In passing, an index that covers a primary key or unique constraint need not be a unique index, and the index definition need not be an exact match for the constraint – provided it starts with the relevant columns.)

So indexes can be a massive aid to performance and concurrency. But indexes have a price – you have to maintain them, usually in real-time, and that can cause problems. Roughly every two weeks [Ed: not quite so frequently in 2012], someone writes into the Oracle Forums with a question like: “I have a batch process that inserts 1 million rows of data into a table every night. But it’s very slow and I see lots of waits for ‘db file sequential read’ while it’s going on. Why is this happening?”

If you insert 1 million rows into a table you might have to create and (eventually) write something in the region of 25,000 table blocks (assuming a row size of about 200 bytes … 40 rows per block). If you have one index on that table then, in principle, you might have to do a random read to find an index leaf block for every single row you insert – and you may have to write the updated block out very soon afterwards to make space for the next random read. If the table is large and you have a couple of dozen indexes this “in principle” threat very soon becomes a real one.

There is a very important difference between (heap) tables and their indexes – a row can go into a table wherever the next empty space happens to be; a row has to go into an index in exactly the right place, and finding that place can be expensive. Things get worse if you are doing updates – you can update a row in the table “in situ”, but if you change the value of an indexed column you have to find an index entry for the old value, delete it, and then insert an index entry for the new value in the right place.

So don’t create indexes unless they really pay for their existence when you come to query the data – and when you do create an index make sure you maximise the payback.

The Commonest Error.

There are a number of subtle ways in which you can create too many indexes, but there is one error that is easy to spot – the foreign key index.

When you create a referential integrity constraint (foreign key) you do not need to create an index to support it and (unlike the unique constraint and primary key constraint) Oracle will not create such an index automatically. However, if you are going to update the primary key or unique key at the “parent” end of the referential integrity constraint Oracle will lock the “child” table to check for committed and uncommitted child rows if a suitable index does not exist. Consequently many people (and some application generators) automatically create a matching index for every foreign key they create.

Rule 1: don’t create “foreign key” indexes unless they are really necessary, or unless they are sufficiently useful that you would have created them any way even if there hadn’t been a referential integrity constraint to protect.

If you do have a foreign key constraint that needs an index, though, remember that the index does not have to be an exact match for the constraint – the critical feature is that it should start with the same columns (not necessarily in the same order) as the constraint definition. This means that you can always ‘add value’ to a foreign key index that might otherwise be just a technical overhead.

Rule 2: consider adding columns to foreign key indexes to make them useful to high precision queries.

Moreover, when you think about the parent/child relationship remember that you often see one parent row with several child rows – think orders/order_lines, or currency_codes/payments. In cases like this there may be an opportunity for saving quite a lot of space (and reducing the impact on the buffer cache) by compressing the index on some, or all, of the foreign key column(s).

Case Study.

Here’s a list of indexes (with a little camouflage) from a system I was looking at recently [Ed: some time in 2008]. It’s typical of the over-indexing problems that I see fairly frequently. I got this list by running a query similar to:

select
        index_name, column_name
from
        user_ind_columns
where
        table_name = 'XXXXX'
order by
        index_name, column_position
;

You could enhance this by query to use dba_ind_columns, of course, and add all sorts of extras by joining to dba_tab_columns, dba_indexes, and dba_ind_expressions to pick up details about mandatory columns, column types, uniqueness, index types, and functions involved in function-based indexes. In the case of my example, I used a slightly more sophisticated query to show the expressions used in function-based indexes – and here’s the list of indexes I found on one table:

ap_fun_fk_i ( fun_id )
ap_grp_fk_i ( grp_id )
ap_grp_fun_id_i( grp_id,fun_id )
ap_org_ap_i ( org_id,ap_id )
ap_org_fk_i ( org_id )
ap_per_ap_i ( per_id,ap_id )
ap_per_fk_i ( per_id )
ap_pk ( ap_id )
ap_ud_i ( trunc(update_date) )

This system had a convenient naming convention, PK in the name indicates the primary key, and FK indicates foreign keys. You can see immediately that we seem to have several redundant indexes as far as protecting foreign keys is concerned.

  • AP_GRP_FK_I is made redundant by AP_GRP_FUN_ID_I,
  • AP_ORG_FK_I is made redundant by AP_ORG_AP_I,
  • AP_PER_FK_I is made redundant by AP_PER_AP_I.

On top of this, we might decide that since there are only a few organizations (org_id) we could benefit by compressing the index on (org_id, ap_id) on its first column. We might also compress the index on functions (fun_id) as well – although we might go one step further and decide that we will never delete functions or update the key values, and drop the index altogether.

Following the thought about compression, we could also consider compressing the index on (trunc(update_date)) – a fairly large amount of data gets updated each day, so the value is quite repetitive, similarly (grp_id, fun_id) is also fairly repetitive, so we might compress on both columns.

This leaves us with:

ap_grp_fun_id_i ( grp_id,fun_id ) compress 2
ap_org_ap_i ( org_id,ap_id )  compress 1
ap_per_ap_i ( per_id,ap_id )
ap_pk ( ap_id )
ap_ud_i ( trunc(update_date) )  compress 1

The reduction in the number of indexes is likely to reduce the undo and redo by about 40% as we insert (single row) data into this table, as well as reducing the possible I/O demand by the same sort of percentage. At the same time, the effect of the compression could (in this case) eliminate a further 10% to 15% of I/O related to this table because of the reduction in size of the indexes.

Further considerations.

The only remaining issue to consider is whether there is anything about the nature of our processing that suggests the need for housekeeping on these indexes – and the index on trunc(update_date) is an ideal candidate for consideration.

As we update data, we are going to delete entries from blocks in the left hand end of the index and insert them at the right hand end of the index. Depending on the rate and pattern of updates it is possible that a large number of blocks at the left hand end of the index will become close to empty –  this could have a significant impact on the effectiveness of the buffer cache and might encourage us to use the coalesce command on the index every few days. (If this sounds familiar, I have written about this particular index in the past.)

Having thought about that, you might then wonder why we have an index on trunc(update_date) at all. A query that tries to find all rows updated on a certain date, or range of dates, may have to jump around all over the table to acquire a lot of data. Who wants to do this, and how often? It’s possible that the index exists for a particular report that runs just once per day – perhaps there is an argument for dropping this index as well.

Conclusion:

It’s very easy to create more indexes than you really need, and indexes can be very expensive to maintain. Sometimes you can spot “obviously” redundant indexes simply by inspecting index names and column names.  A few reasonable guesses may allow you to drop some indexes and make others more efficient with a minimum of risk and effort.

 

January 3, 2013

Skip Scan 2

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 6:42 pm UTC Jan 3,2013

Here’s a question that is NOT a trick question, it’s demonstrating an example of optimizer behaviour that might come as a surprise.
I have an index (addr_id0050, effective_date), the first column is numeric, the second is a date. Here’s a query with an execution plan that uses that index:

define m_date='30-Jan-2013'

select
	small_vc
from	t1
where
	addr_id0050 between 24 and 26
and	effective_date = to_date('&m_date', 'dd-mon-yyyy')
;

------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |    18 |   396 |    23 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |    18 |   396 |    23 |
|*  2 |   INDEX SKIP SCAN           | T1_I0050 |    18 |       |     5 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30'
              00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID050"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

So here’s the question – given that my where clause includes a predicate on the first column of the index that would allow an index range scan to take place, wouldn’t you expect Oracle to do a range scan, and how does a skip scan work in this case ?

To push the point a little further I have another column, also numeric, in the same table which appears in a similar index (addr_id2500, effective_date). Here’s the equivalent query with its execution plan.

select
	small_vc
from	t1
where
	addr_id2500 between 24 and 26
and	effective_date = to_date('&m_date', 'dd-mon-yyyy')
;

------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    23 |     5 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |    23 |     5 |
|*  2 |   INDEX RANGE SCAN          | T1_I2500 |     1 |       |     4 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID2500">=24 AND "EFFECTIVE_DATE"=TO_DATE('
              2013-01-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

In this case, with the same starting predicate (but referencing the alternative column), the optimizer produces the index range that we might expect.

The index skip scan isn’t just for cases where the first column of an index is missing from the list of predicates. The basic principle is that the optimizer has the option to do a small number of tiny range scans on an index when the alternative is to do a tablescan or a very large index range scan.

As you might guess from my column names, addr_id0050 holds 50 distinct values, so the range 24 to 26 actually accounts for 6% of the total volume of the index. On the other hand, there are over 800 distinct values for effective_date. There is even an index on (effective_date) but that’s going to pick up lots of rows that are widely scattered throughout the table then throw away 94% of them so Oracle has decided it’s too expensive to use.

So the optimizer has worked out that it can probe for (24, 30th Jan), (25, 30th Jan), and (26, 30th Jan) as the most efficient access path. Of course, it doesn’t know that it’s probing for exactly those values but statistically it assumes that there are only a few possible values that will show up if it probes the index on the first column – in effect using an “inlist iterator” on the first column without knowing in advance what it’s going to find in the list.

When the optimizer sees the query using addr_id2500 (which holds 2,500 distinct values) the arithmetic faviours the option we are familiar with. A simple range scan based on values between 24 and 26 is going to range through roughly l/800th of the index – which is a tiny number of leaf blocks (just one or two, in my case) so the optimizer decides to do that and check every index entry on the way to see if the effective_date holds a suitable value.

If you want to repeat the experiment, I was using 11.2.0.3 with 1MB uniform extents and freelist management. The SQL to create the table and indexes is as follows:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,2500)				addr_id2500,
	mod(rownum,50)					addr_id0050,
	trunc(sysdate) + trunc(mod(rownum,2501)/3)	effective_date,
	lpad(rownum,10,'0')				small_vc,
	rpad('x',100)					padding
from
	generator	v1,
	generator	v2
where
	rownum <= 250000
;

create index t1_i1 on t1(effective_date);
create index t1_i2500 on t1(addr_id2500, effective_date);
create index t1_i0050 on t1(addr_id0050, effective_date);

-- collect stats, no histograms, 11g auto sample size

I had disabled CPU costing to get repeatable results – depending on parameter settings whether you have system stats enabled or not then you may need to tweak the code to change the relative numbers of distinct values in the numeric columns before you see the switch between range scan and skip scan.

November 27, 2012

IOT Load

Filed under: Indexing,Infrastructure,Oracle — Jonathan Lewis @ 5:15 pm UTC Nov 27,2012

When I introduced Connor McDonald’s blog a few days ago, it was because we had exchanged a couple of email messages (through the Oak Table Network) about how to minimise the resource consumption when copying a load of data from one IOT to another of the same structure. His problem was the way in which the obvious way of copying the data resulted in a massive sort even though, in principle, it should not have been necessary to sort anything since the data could have been extracted in order by walking the existing IOT.

As a suggestion I referenced a comment I had made in the Addenda to Practical Oracle 8i about 12 years ago when I had first solved the problem of loading an IOT with minimal logging and no sorting. At the time I had been loading data from a sorted file into an empty table that was then going to be exchnaged into a partitioned IOT – but it crossed my mind that load from a flat file and loading from a UNIX pipe were pretty much the same thing, so perhaps Connor could workaround his problem by making one session spool to a pipe while another session was reading it. In the end, he simply created a massive temporary tablespace, but I thought I’d modify a test script I wrote a few years ago to demonstrate my idea – and here it is:

rem
rem	create an empty IOT
rem	Note the columns used for the PK
rem

create table iot_load(
	id	number,
	name	varchar2(32),
	padding	varchar2(100),
	constraint iot_pk primary key(name, id)
)
organization index
;

rem
rem	Create a load file
rem	But since we are running UNIX use the mknod command to
rem	create a pipe before spooling to that pipe.
rem

set pagesize 0
set feedback off
set newpage 0
set trimspool on
set linesize 180
set timing off
set termout off

column object_name format a38

host mknod /tmp/iot_load.dat p

spool /tmp/iot_load.dat

select r, object_name, padding
from
(
	select
		rownum	r,
		object_name,
		rpad('x',100,'x') padding
	from
		all_Objects
	where
		rownum &lt;= 10000
)
order by
	object_name, r
;

spool off

This script creates a target IOT, and uses a simple select statement (you get a clue about how old this code is by the fact that I’ve referenced all_objects, and not a “connect by” with dual) to generate a set of date that is sorted in an order that is a suitable match for the definition of the IOT. In Connor’s case he could have used a simple ‘select from source IOT order by primary key’. I’ve set up a few SQL*Plus environment detals to make the output completely flat and undecorated. A key point, though, is that I’m spooling to a file called /tmp/iot_load.dat and that file has been pre-created as a UNIX pipe. When you run this script the select statement will hang almost immediately because the pipe will become full, at which point you need another process to start emptying it. So from another UNIX session run the following (changing the userid and password as necessary):

sqlldr userid=test_user/test control=iot_load data=/tmp/iot_load.dat

Here’s the content of the control file iot_load.ctl

OPTIONS (direct = true)
UNRECOVERABLE
LOAD DATA
TRUNCATE
INTO table iot_load
sorted indexes (iot_pk)
(
	id		position(01:10)	char,
	name		position(12:49) char,
	padding		position(51:150) char
)

We use direct path load (which means virtually no undo) in unrecoverable mode (which means virtually no redo). But the unrecoverable bit only works because we start by truncating the IOT, and promising that the incoming data is sorted in order of the IOT’s primary key index. And since the data is appearing in order, Oracle doesn’t have to sort it before inserting it. Net result: no undo, no redo (apart from metadata) and no sorting – i.e. minimum overhead for loading the IOT – we simply walk the source in order and copy it into the target.

As this session reads from the pipe, the first session can resume writing into the pipe, so the data flows from source to target through a small piece of shared memory (techically a bit of the file-system buffer, I suppose) until the source session closes the pipe with its “spool off”.

November 23, 2012

Consistency

Filed under: Indexing,Oracle — Jonathan Lewis @ 5:18 pm UTC Nov 23,2012

Here’s a funny little glitch – typical of the sort of oddity that creeps into the data dictionary from time to time – cut-n-pasted from 11.1.0.7:

SQL&gt; select count(*) from user_indexes;

  COUNT(*)
----------
      1074

1 row selected.

SQL&gt; select count(*) from user_objects where object_type = 'INDEX';

  COUNT(*)
----------
       917

1 row selected.

We seem to have 157 indexes that aren’t indexes – what could they be. Perhaps a small enhancement to our query on user_indexes (the one that has the larger result) will help:

SQL&gt; select index_type, count(*) from user_indexes group by index_type order by count(*) desc;

INDEX_TYPE                    COUNT(*)
--------------------------- ----------
NORMAL                             854
LOB                                157
IOT - TOP                           40
FUNCTION-BASED NORMAL               13
CLUSTER                             10

5 rows selected.

Spot the missing (or excess, depending on your viewpoint) 157 indexes. LOB indexes (the system generated index that maps a LOB segment) aren’t reported as being of type index, and the following predicate in the definition of the view user_objects tells us why not:

  and (o.type# not in (1  /* INDEX - handled below */,
                      10 /* NON-EXISTENT */)
       or
       (o.type# = 1 and 1 = (select 1
                              from sys.ind$ i
                             where i.obj# = o.obj#
                               and i.type# in (1, 2, 3, 4, 6, 7, 9))))

Compare this with the definition of view user_indexes, and we can see that user_objects is explicitly excluding LOB and “IOT – NESTED” indexes.

        decode(i.type#, 1, 'NORMAL'||
                          decode(bitand(i.property, 4), 0, '', 4, '/REV'),
                      2, 'BITMAP', 3, 'CLUSTER', 4, 'IOT - TOP',
                      5, 'IOT - NESTED', 6, 'SECONDARY', 7, 'ANSI', 8, 'LOB',
                      9, 'DOMAIN'),

This changes in 11.2.0.3, where type 8 (LOB) is explicitly included in the user_objects view.

Does anyone have an example of an ANSI index, or an “IOT – NESTED” index ?

October 23, 2012

Skip Scan

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 5:55 pm UTC Oct 23,2012

A recent question on OTN asked how you could model a case where Oracle had the choice between a “perfect” index for a range scan and an index that could be used for an index skip scan and choose the latter path even though it was clearly (to the human eye) the less sensible choice. There have been a number of wierd and wonderful anomalies with the index skip scan and bad choice over the years, and this particular case is just one of many oddities I have seen in the past – so I didn’t think it would be hard to model one (in fact, I thought I already had at least two examples somewhere in my library – but I couldn’t find them).

Take a data set with two columns, call them id1 and id2, and create indexes on (id1), and (id2, id1). Generate the id1 column as a wide range of cyclic values, generate the id2 set with a small number of repetitive values so that a large number of physically adjacent rows hold the same value. The clustering_factor on the (id1) index will be very large, the clustering_factor on the (id2, id1) index will be relatively small because it will be controlled largely by the repetitive id2 value. Here’s the data set:
(more…)

October 4, 2012

Indexing 12c

Filed under: 12c,Indexing,Oracle — Jonathan Lewis @ 2:43 pm UTC Oct 4,2012

Another little detail that Hermann Baer mentioned in his presentation yesterday was the ability to create multiple indexes with the same column definition – something which currently gets you Oracle error “ORA-01408: such column list already indexed.” 

No details, and there’s always the “safe harbour” slide of course – the one which says seomthing about the presentation being only an indication of current thinking and nothing is guaranteed to appear.

Having said that, this looks like an interesting option for those (possibly rare) occasions when you want to change a unique index into a non-unique index (for example, to change a unique constraint to deferrable). Rather than having to drop the index and create a new one – leaving the table unindexed while the index builds, you appear to have the option to: “create new index online”, “drop old index”. Moving a primary key constraint from one index to the other might not be so easy, of course, especially if there are foreign keys in place – but this certainly looks like a helpful step.

Details to follow when 12c becomes available.

September 18, 2012

Minimum stats

Filed under: Indexing,Oracle,Statistics — Jonathan Lewis @ 5:06 pm UTC Sep 18,2012

Occasionally I come across complaints that dbms_stats is not obeying the estimate_percent when sampling data and is therefore taking more time than it “should” when gathering stats. The complaint, when I have seen it, always seems to be about the sample size Oracle chose for indexes.

There is a simple but (I believe) undocumented reason for this: because indexes are designed to collate similar data values they are capable of accentuating any skew in the data distribution, which means a sample taken from a small number of leaf blocks can be highly misleading as a guide to the whole index – so Oracle aims for a minimum sample size for gathering index stats.

I’ve found remnants of a note I wrote on comp.databases.oracle.server in December 2004 which claims that this limit (as of Oracle 9.2) was 919 leaf blocks – and I have a faint memory of discovering this figure in an official Metalink (MOS) note. I can’t find the note any more, but it’s easy enough to set up a test to see if the requirement still exists and if the limit is still the same. Here’s a test I ran recently on 11.2.0.3 using an 8KB block size:
(more…)

September 11, 2012

FBI Delete

Filed under: Bugs,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 5:56 pm UTC Sep 11,2012

A recent post on Oracle-l complained about an oddity when deleting through a function-based index.

I have a function based index but the CBO is not using it. The DML that I expect to have a plan with index range scan is doing a FTS. Its a simple DML that deletes 1000 rows at a time in a loop and is based on the column on which the FBI is created.

Although execution plans are mentioned, we don’t get to see the statement or the plan – and it’s always possible that there will be some clue in the (full) plan that tells us something about the code that the OP has forgotten to mention. However, function-based indexes have a little history of not doing quite what you expect, so I thought I’d take a quick look at the problem, starting with the simplest possible step – do function-based indexes and “normal” b-tree indexes behave differently on a delete. Here’s the data set I created for my test:
(more…)

September 4, 2012

Online Rebuild

Filed under: Index Rebuilds,Indexing,Oracle — Jonathan Lewis @ 5:46 pm UTC Sep 4,2012

I’ve commented in the past about the strange stories you can find on the internet about how Oracle works and how sometimes, no matter how daft those stories seem, there might be something behind them. Here’s one such remark I came across a little while ago – published in two or three places this year:

“An index that enforces referential integrity cannot be rebuilt online.”

There are a couple of problems with this statement – first, of course, indexes don’t enforce referential integrity, though they may help to enforce uniqueness, and the so-called “foreign key” index may avoid a locking issue related to referential integrity: that’s splitting hairs a little bit, though, and we can probably guess what the author means by “indexes enforcing referential integrity”.  (An example demonstrating the problem would have been useful, though – it would have saved me from writing this note, and it might save other people from jumping to the wrong conclusion and taking unsuitable action as a consequence.)

So here’s a simple test (run under 11.2.0.3):
(more…)

August 19, 2012

Compression Units – 5

Filed under: CBO,Exadata,Indexing,Oracle — Jonathan Lewis @ 6:02 pm UTC Aug 19,2012

The Enkitec Extreme Exadata Expo (E4) event is over, but I still have plenty to say about the technology. The event was a great success, with plenty of interesting speakers and presentations. As I said in a previous note, I was particularly keen to hear  Frits Hoogland’s comments  on Exadata and OLTP, Richard Foote on Indexes, and Maria Colgan’s comments on how Oracle is making changes to the optimizer to understand Exadata a little better.

All three presentations were interesting – but Maria’s was possiby the most important (and entertaining). In particular she told us about two patches for 11.2.0.3, one current and one that is yet to be released (unfortunately I forgot to take  note of the patch numbers – ed: but they’ve been supplied by readers’ comment below).
(more…)

July 23, 2012

Compression Units – 2

Filed under: CBO,Exadata,Indexing,Oracle — Jonathan Lewis @ 4:41 pm UTC Jul 23,2012

When I arrived in Edinburgh for the UKOUG Scotland conference a little while ago Thomas Presslie, one of the organisers and chairman of the committee, asked me if I’d sign up on the “unconference” timetable to give a ten-minute talk on something. So I decided to use Hybrid Columnar Compression to make a general point about choosing and testing features. For those of you who missed this excellent conference, here’s a brief note of what I said.
(more…)

May 28, 2012

Ch-ch-ch-ch-changes

Filed under: Index Rebuilds,Indexing,Infrastructure,Oracle — Jonathan Lewis @ 5:31 pm UTC May 28,2012

For those not familiar with Richard Foote’s extensive blog about indexes (and if you’re not you should be) – the title of this note is a blatant hi-jacking of his preferred naming mechanism.

It’s just a short note to remind myself (and my readers) that anything you know about Oracle, and anything published on the Internet – even by Oracle Corp. and its employees – is subject to change without notice (and sometimes without being noticed). I came across one such change today while reading the Expert Oracle Exadata book by Kerry Osborne, Randy Johnson and Tanel Poder. It was just a little throwaway comment on page 429 to the effect that:

In NOARCHIVELOG mode all bulk operations (such as INSERT, APPEND, index REBUILD and ALTER TABLE MOVE) are automatically nologging.

(more…)

May 17, 2012

Index Sizing

Filed under: Indexing,Infrastructure,Oracle — Jonathan Lewis @ 8:53 am UTC May 17,2012

I was in a discussion recently about how to estimate the size of a bitmap index before you build it, and why it’s much harder to do this for bitmap indexes than it is for B-tree indexes. Here’s what I wrote in “Practical Oracle 8i”:
(more…)

Next Page »

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 1,395 other followers