Oracle Scratchpad

July 13, 2012

Recursive SQL

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 4:41 pm BST Jul 13,2012

A question came up on the OTN database forum a few days ago asking why a very simple query against a hash partitioned table was doing so much physical I/O to get a very small amount of data from a single partition:

I have a table hash-partitioned on an indexed column “id”, which is non-unique and part of my primary key. Inside each partition, rows with the same id are located close to each other – this was done by a dbms_redefinition reorganisation using orderby_cols. The intention is to reduce the amount of physical reads, since there are no queries that don’t filter on the id column.

What I’m seeing is a lot of physical reads, though. The first partition has roughly 80K rows, an average row length of 347, a block size of 8K, and compression … resulting in 821 blocks. And when (after flushing the buffer cache and shared pool) I submit a query that filters on “id” only and returns 106 rows, I see roughly 1,400 physical reads.

The poster has clearly applied some careful thought to minimising the work required – so what’s gone wrong? To help us analyse the problem he posted the results from running a query through autotrace:

------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |       |   106 | 36782 |     3   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE             |       |   106 | 36782 |     3   (0)| 00:00:01 |     1 |     1 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| XXX   |   106 | 36782 |     3   (0)| 00:00:01 |     1 |     1 |
|*  3 |    INDEX RANGE SCAN                | XXXXX |     1 |       |     1   (0)| 00:00:01 |     1 |     1 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID"=49743)

Statistiken
----------------------------------------------------------
      22243  recursive calls
          0  db block gets
      66651  consistent gets
       1404  physical reads
          0  redo size
      10933  bytes sent via SQL*Net to client
        299  bytes received via SQL*Net from client
          9  SQL*Net roundtrips to/from client
        150  sorts (memory)
          0  sorts (disk)
        106  rows processed

As you can see from the statistics (or Statistiken), it took 1,404 physical reads to get the data, even though the description of the data suggests that we should only need to visit a very small number of blocks, the predicate information suggests a reasonably precise query, and the plan shows access to a single, known partition by local index.

But the figure for “physical reads” is not the most important figure in this example – the important one is the “recursive calls”. Something unusual is happening if a very simple query – with an (accurate) estimate of 106 rows returned at very low cost – results in more than 22,000 recursive calls, which probably account for almost all the 66,000 consistent gets and most of the 1,404 physical I/Os.
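If you want to see those counters directly, rather than through autotrace, you can query the session’s own statistics before and after the execution – here’s a sketch, assuming you have select privileges on the relevant v$ views:

select
	sn.name, ms.value
from
	v$statname	sn,
	v$mystat	ms
where
	ms.statistic# = sn.statistic#
and	sn.name in ('recursive calls', 'consistent gets', 'physical reads')
;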

There are many reasons why recursive calls may appear. Running the query with sql_trace enabled and running tkprof on the resulting trace file would probably tell us all we need to know but, before trying that, a simple re-execution of the query might be very revealing. If all the excess work disappears on the second execution then it’s probably due to the work done in parsing and optimising.
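A minimal version of that check might look like the following sketch – the table and predicates here are the ones from the demonstration further down, so substitute your own query:

set autotrace traceonly statistics

-- first execution: hard parse, so the dictionary cache has to be loaded
select * from pt_hash_1 where id = 98 and grp = 98;

-- second execution: if the recursive calls (and most of the consistent
-- gets and physical reads) disappear here, the excess was parse-time work
select * from pt_hash_1 where id = 98 and grp = 98;

set autotrace off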

A week later we got the answer – the user had flushed the shared pool before running the query. Here’s a simple demonstration of the effect:

create table pt_hash_1 (
	id,
	grp,
	small_vc,
	padding
)
partition by hash(grp)
partitions 512
as
select
	rownum				id,
	trunc(rownum/10)		grp,
	to_char(trunc(rownum/20))	small_vc,
	rpad('x',100)			padding
from
	all_objects
where
	rownum <= 30000
;

execute dbms_stats.gather_table_stats(user,'pt_hash_1',method_opt=> 'for all columns size 1')

create index h1_i1 on pt_hash_1(id) local;
create index h1_i2 on pt_hash_1(small_vc) local;

alter system flush buffer_cache;
alter system flush shared_pool;

alter session set sql_trace true;

select *
from pt_hash_1
where
	id  = 98
and	grp = 98
;

alter session set sql_trace false;
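To see the recursive statements in the tkprof output you have to keep the SQL executed by SYS, so a suitable tkprof call might look like this (the trace file and output file names are, of course, illustrative):

tkprof {your_trace_file}.trc pt_hash_demo.prf sys=yes sort=execnt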

So with 512 partitions (always use a power of 2 when dealing with hash partitions) and two local indexes, and with statistics collected (but no histograms), how many recursive calls are we likely to see for the simple query that will be picking at most a handful of rows from a single known partition?
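Before looking at the trace file we could simply count the segments involved – a quick sanity check (just a sketch; on versions with deferred segment creation the figure could be slightly lower):

select
	segment_name, count(*)	segment_count
from
	user_segments
where
	segment_name in ('PT_HASH_1', 'H1_I1', 'H1_I2')
group by
	segment_name
;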

The answer – or at least a pointer to roughly the right answer – is 1,536: one for each segment of the table and its two indexes (512 partitions × 3 segments). Here’s the tkprof summary of the most significant parts of the trace file:

select type#,blocks,extents,minexts,maxexts,extsize,extpct,user#,iniexts,
  NVL(lists,65535),NVL(groups,65535),cachehint,hwmincr, NVL(spare1,0),
  NVL(scanhint,0),NVL(bitmapranges,0)
from
 seg$ where ts#=:1 and file#=:2 and block#=:3

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute   1540      0.03       0.02          0          0          0           0
Fetch     1540      0.01       0.10         62       4620          0        1540
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     3080      0.04       0.12         62       4620          0        1540

This is Oracle loading the dc_segments area of the dictionary cache for every segment associated with the table and its indexes. There are calls to other dictionary tables in the trace file, of course, but the number is relatively small.
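You can also see the effect on the dictionary cache itself by checking the dc_segments statistics before and after the test – another sketch; the count column reports the number of entries currently in the cache, so you would expect it to grow by (roughly) the 1,536 segments just loaded:

select
	parameter, count, gets, getmisses
from
	v$rowcache
where
	parameter = 'dc_segments'
;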

Given the 22,243 calls shown by the original query we might guess that the user has 4,096 partitions and four local indexes (4,096 partitions × 5 segments each), or 2,048 partitions and nine local indexes (2,048 × 10); either guess would account for 20,480 of the recursive calls, which seems to be about the right number to leave a few spare for the rest of the work the optimizer would have had to do.
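If the OP wanted to check that guess, the partition and index counts are easy to confirm – a sketch, to be run as the owning user:

select
	table_name, partition_count
from
	user_part_tables
;

select
	index_name, partition_count
from
	user_part_indexes
;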

The moral of the story – if you’re creating objects with a large number of (sub)partitions and indexes, make sure you look after your shared pool.

6 Comments »


  2. Jonathan, is there any obvious reason for Oracle to load all the dc_segments entries related to the partitioned object (partition segments) when only a single partition is likely to be accessed? We are using tables with hundreds of thousands of partitions, and accessing only one partition results in a huge number of these recursive statements – possibly related to the many ORA-04031 errors I have to troubleshoot. Thanks. Olivier.
    PS: this is occurring on 11.2.0.3, where I also had to disable multiple subpools because of a known memory leak in the shared pool when managing lots of partitions, only fixed in the 12 beta.

    Comment by Olivier Bernhard — July 21, 2012 @ 12:06 pm BST Jul 21,2012

    • Olivier,

      There’s no obvious reason, although it might simply be a question of simplicity of code. At this point, of course, Oracle doesn’t know which partition it’s going to use because that information is embedded in the partition definition (tabpart$) rather than the segment definition – but I think Oracle gets the various bits of object partition information before it gets the segment information anyway.

      It’s interesting to note two things. First, the calls to tabpart$ and indpart$ don’t follow the same pattern – it’s one execution for the table and one for each index, with a large array fetch to get the data for all the partitions in one go. Secondly, the results above are from 11.2, where you don’t see any parse calls reported for the query against seg$; in 11.1 there is a parse call for every execution.

      I suspect that there’s some history attached to this code – in the good old days partitioned objects tended to have only a few partitions, so this single-row processing didn’t seem to introduce much overhead, and the strategy probably made it easy to write generic code that had to handle heap tables, IOTs, and any other possible partitioned object. I suspect that what we’re seeing is just the effect of code written with implicit cursor loops to avoid the more complex code that would deal with several variants of joins between seg$ and other data dictionary tables.

      Comment by Jonathan Lewis — July 21, 2012 @ 3:18 pm BST Jul 21,2012

  3. Many thanks for your very clear inputs !
    Olivier

    Comment by Olivier Bernhard — July 22, 2012 @ 5:36 pm BST Jul 22,2012

  4. Jonathan, I’ve always wondered why you have to have dc_segments when the seg$ table blocks are also cached in the buffer cache.
    It seems like double buffering to me.
    Could you please elaborate on that?
    Regards
    GregG

    Comment by goryszewskig — December 13, 2012 @ 10:49 am BST Dec 13,2012

    • Greg,

      I’m always allowed to say: “I don’t know, I didn’t write the spec or the code.”

      I can think of a few arguments for why the row cache is a good idea, though; here’s one. Oracle developers ought to assume that people will design bad applications and find themselves constantly optimising SQL statements. To optimise a statement you have to check whether or not the objects referenced exist and aren’t being created, changed, or destroyed, so you need a mechanism to lock the object definitions temporarily.

      If you kept all the information in the data buffer, how would you lock the rows in the data buffer in a way that didn’t modify the data blocks (i.e. didn’t generate undo and redo) but ensured that no-one could change the row? Apart from using a whole data buffer to hold one row in memory, you’d have to write a layer of code that crossed the normal buffer handling code and introduced a level of complexity that would make it much harder to modify either set of code (i.e. the data handling code and the optimisation code).

      Comment by Jonathan Lewis — December 13, 2012 @ 11:39 am BST Dec 13,2012

