Oracle Scratchpad

February 19, 2015

255 columns

Filed under: Infrastructure,Oracle,Performance — Jonathan Lewis @ 12:45 am GMT Feb 19,2015

You all know that having more than 255 columns in a table is a Bad Thing ™ – and surprisingly you don’t even have to get to 255 to hit the first bad thing about wide tables. If you’ve ever wondered what sorts of problems you can have, here are a few:

  • If you’re still running 10g and gather stats on a table with more than roughly 165 columns then the query Oracle uses to collect the stats will only handle about 165 of them at a time; so you end up doing multiple (possibly sampled) tablescans to gather the stats. The reason why I can’t give you an exact figure for the number of columns is that it depends on the type and nullity of the columns – Oracle knows that some column types are fixed length (e.g. date types, char() types) and if any columns are declared not null then Oracle doesn’t have to worry about counting nulls – so for some of the table columns Oracle will be able to eliminate one or two of the related columns it normally includes in the stats-gathering SQL statement – which means it can gather stats on a few more table columns.  The 165-ish limit doesn’t apply in 11g – though I haven’t checked to see if there’s a larger limit before the same thing happens.
  • If you have more than 255 columns in a row Oracle will split it into multiple row pieces of 255 columns each plus one row piece for “the rest”; but the split counts from the end, so if you have a table with 256 columns the first row-piece has one column the second row-piece has 255 columns. This is bad news for all sorts of operations because Oracle will have to expend extra CPU chasing the the row pieces to make use of any column not in the first row piece. The optimists among you might have expected “the rest” to be in the last row piece. If you want to be reminded how bad row-chaining can get for wide tables, just have a look at an earlier blog note of mine (starting at this comment).
  • A particularly nasty side effect of the row split comes with direct path tablescans – and that’s what Oracle does automatically when the table is large. In many cases all the row pieces for a row will be in the same block; but they might not be, and if a continuation row-piece is in a different block Oracle will do a “db file sequential read” to read that block into the buffer cache and it won’t be cached (see 1st comment below).  As an indication of how badly this can affect performance, the results I got at a recent client site showed “select count(col1) from wide_table” taking 10  minutes while “select count(column40) from wide_table” took 22 minutes because roughly one row in a hundred required a single block read to follow the chain.
  • An important side effect of the split point is that you really need to put the columns you’re going to index near the start of the table to minimise the risk of this row chaining overhead when you create or rebuild an index.
  • On top of everything else, of course, it takes a surprisingly large amount of extra CPU to load a large table if the rows are chained. Another client test reported 140 CPU seconds to load 5M rows of 256 columns, but only 20 CPU seconds to load 255.

If you are going to have tables with more than 255 columns, think very carefully about column order – if you can get all the columns that are almost always null at the end of the row you may get lucky and find that you never need to create a secondary row piece. A recent client had about 290 columns in one table of 16M rows, and 150 columns were null for all 16M rows – unfortunately they had a mandatory “date_inserted” column at the end of the row, but with a little column re-arrangement they eliminated row chaining and saved (more than) 150 bytes storage per row.  Of course, if they have to add and back-fill a non-null column to the table they’re going to have to rebuild the table to insert the column “in the middle”, otherwise all new data will be chained and wasting 150 bytes per row, and any old data that gets updated will suffer a row migration/chain catastrophe.

February 12, 2015

Parallel rownum

Filed under: Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 7:27 am GMT Feb 12,2015

It’s easy to make mistakes, or overlook defects, when constructing parallel queries – especially if you’re a developer who hasn’t been given the right tools to make it easy to test your code. Here’s a little trap I came across recently that’s probably documented somewhere, which could be spotted easily if you had access to the OEM SQL Monitoring screen, but would be very easy to miss if you didn’t check the execution plan very carefully. I’ll start with a little script to generate some data:


create table t1 nologging
as
select * from all_objects where rownum <= 50000
;

insert /*+ append */ into t1 select * from t1;
commit;
insert /*+ append */ into t1 select * from t1;
commit;
insert /*+ append */ into t1 select * from t1;
commit;
insert /*+ append */ into t1 select * from t1;
commit;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create table t2 as select * from t1;
alter table t2 add id number(10,0);

All I’ve done is create some data – 800,000 rows – and then create a table to copy it to; and while I copy it I’m going to add a temporary id to the rows, which I’ll do with a call to rownum; and since there’s a lot of data I’ll use parallel execution:


alter session enable parallel dml;

insert /*+ parallel(t2 3) */ into t2
select /*+ parallel(t1 4) */ t1.* , rownum from t1;

For the purposes of experiment and entertainment I’ve done something a little odd by supplying two possible degrees of parallelism, but this lets me ask the query: will this statement run parallel 3, parallel 4, both of the above, or neither ? (You may assume that I have parallel execution slaves available when the statement runs.)

The answer is both – because that rownum does something nasty to the execution plan (I didn’t include the 50,000 limit in my first test, which is why the plan reports 993K rows instead of 800,000):


--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT           |          |   993K|    92M|  1076   (1)| 00:00:13 |        |      |            |
|   1 |  PX COORDINATOR            |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)      | :TQ20001 |   993K|    92M|  1076   (1)| 00:00:13 |  Q2,01 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT          | T2       |       |       |            |          |  Q2,01 | PCWP |            |
|   4 |     PX RECEIVE             |          |   993K|    92M|  1076   (1)| 00:00:13 |  Q2,01 | PCWP |            |
|   5 |      PX SEND ROUND-ROBIN   | :TQ20000 |   993K|    92M|  1076   (1)| 00:00:13 |        | S->P | RND-ROBIN  |
|   6 |       COUNT                |          |       |       |            |          |        |      |            |
|   7 |        PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   8 |         PX SEND QC (RANDOM)| :TQ10000 |   993K|    92M|  1076   (1)| 00:00:13 |  Q1,00 | P->S | QC (RAND)  |
|   9 |          PX BLOCK ITERATOR |          |   993K|    92M|  1076   (1)| 00:00:13 |  Q1,00 | PCWC |            |
|  10 |           TABLE ACCESS FULL| T1       |   993K|    92M|  1076   (1)| 00:00:13 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------

See that “P->S” (parallel to serial) at operation 8. The select statement runs in parallel (degree 4) to scan the data, and then sends it all to the query co-ordinator to supply the rownum; then the query co-ordinator re-distributes the data (including rownum) to another set of slave (S->P) to do the parallel (degree 3) insert. The P->S at line 2 shows the parallel execution slaves passing details to the query co-ordinator of the private segments that they have created so that the query co-ordinator can stitch the segments together into a single data segment for the table. (If you watch closely you’ll see the query co-ordinator doing a few local writes as it tidies up the header blocks in those segment blocks.)

There are two threats to this rownum detail. The first, of course, is that the operation essentially serialises through the query co-ordinator so it’s going to take longer than you might expect; secondly an accident of this type is typically going to allocate twice as many parallel execution slaves as you might have expected – the select and the insert are two separate data flow operations (note how the Name column shows TQ1xxxx and TQ2xxxx), each gets its own slave sets, and both sets of slaves are held for the duration of the statement. If this statement is demanding twice the slaves it should be using, then you may find that other statements that start running at the same time get their degree of parallelism downgraded because you’ve run out of PX slaves. Although the rownum solution is nice and clean – it require no further infrastructure – you probably need to introduce a sequence (with a large cache) to get the same effect without losing parallelism.

If you look at v$pq_tqstat after running this statement the results are a little disappointing – there are a few problems connecting lines from the plan with rows in the view – here’s my original output (and you’ll now see why I chose to have two different degrees of parallelism):


DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- ---------- ---------- -----------
         1          0 Consumer               1 P000                331330   39834186         74         71           0
                                             1 P001                331331   39844094         75         72           0
                                             1 P002                330653   39749806         74         71           0

                    1 Producer               1 P000                     1        131       2263        396           0
                                             1 P001                     1        131       2238        417           0
                                             1 P002                     1        131       2182        463           0

         2          0 Producer               1 P003                247652   28380762         13          0           0
                                             1 P004                228857   26200574         13          1           0
                                             1 P005                267348   30496182         14          0           0
                                             1 P006                249457   28401982         13          0           0
                                             1 QC                  993314  119428086 4294967269 4294967286           0
                      Consumer               1 QC                  993314  113479500        125         65           0

                    1 Consumer               1 QC                       3        393          2          1           0

The first problem is that the DFO_number reported in the view doesn’t match with the :TQ1xxxx and :TQ2xxxx reported in the plan – the parallel 4 bit is the select, which is covered by :TQ1000, but it’s listed under DFO_Number = 2 in the view, and the insert is the parallel 3 bit, which is covered by :TQ2000 and :TQ20001 but listed under DFO_Number = 1.

More confusingly, potentially, is that the all appearances of the query coordinator have been assigned to DFO_Number = 2. Ignoring the fact that the DFO_Number column switches the 1 and 2 from the plan, what we should see is as follows:

  • The consumer at line 16 is consuming from the 4 producers at lines 11 – 14.
  • The producer at line 15 is producing FOR the 3 consumers at lines 3 – 5
  • The consumer at line 18 is consuming from the producers at lines 7 – 9

Ideally (including the correction for the DFO_Number) I think the view content should be as follows:


DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- ---------- ---------- -----------
         1          0 Producer               1 P003                247652   28380762         13          0           0
                                             1 P004                228857   26200574         13          1           0
                                             1 P005                267348   30496182         14          0           0
                                             1 P006                249457   28401982         13          0           0
                      Consumer               1 QC                  993314  113479500        125         65           0

         2          0 Producer               1 QC                  993314  119428086 4294967269 4294967286           0
                      Consumer               1 P000                331330   39834186         74         71           0
                                             1 P001                331331   39844094         75         72           0
                                             1 P002                330653   39749806         74         71           0

                    1 Producer               1 P000                     1        131       2263        396           0
                                             1 P001                     1        131       2238        417           0
                                             1 P002                     1        131       2182        463           0
                      Consumer               1 QC                       3        393          2          1           0

Just don’t ask me why the waits and timeouts for the QC as producer seem to be counting backwards from 2^32.

February 8, 2015

Functions & Subqueries

Filed under: Oracle,Performance,Subquery Factoring,Tuning — Jonathan Lewis @ 4:12 am GMT Feb 8,2015

I think the “mini-series” is a really nice blogging concept – it can pull together a number of short articles to offer a much better learning experience for the reader than they could get from the random collection of sound-bites that so often typifies an internet search; so here’s my recommendation for this week’s mini-series: a set of articles by Sayan Malakshinov a couple of years ago comparing the behaviour of Deterministic Functions and Scalar Subquery Caching.

http://orasql.org/2013/02/10/deterministic-function-vs-scalar-subquery-caching-part-1/

http://orasql.org/2013/02/11/deterministic-function-vs-scalar-subquery-caching-part-2/

http://orasql.org/2013/03/13/deterministic-function-vs-scalar-subquery-caching-part-3/

Footnote:
Although I’ve labelled it as “this week’s” series, I wouldn’t want you to assume that I’ll be trying to find a new mini-series every week.

Footnote 2:
I had obviously expected to publish this note a long time ago – but must have forgotten about it. I was prompted to search my blog for “deterministic” very recently thanks to a recent note on the OTN database forum and discovered both this note and an incomplete note about improving the speed of creating function-based indexes by tweaking hidden parameters – which I might yet publish, although if you read all of Sayan’s articles you’ll find the solution anyway.

 

January 26, 2015

In-memory DB

Filed under: 12c,in-memory,Infrastructure,Oracle — Jonathan Lewis @ 8:18 am GMT Jan 26,2015

A recent thread on the OTN database forum supplied some code that seemed to show that In-memory DB made no difference to performance when compared with the traditional row-store mechanism and asked why not.  (It looked as if the answer was that almost all the time for the tests was spent returning the 3M row result set to the SQL*Plus client 15 rows at a time.)

The responses on the thread led to the question:  Why would the in-memory (column-store) database be faster than simply having the (row-store) data fully cached in the buffer cache ?

Maria Colgan has addressed this question in part 3 of her series on In-Memory Database (see catalogue below), but I thought I’d repeat the basic ideas with a few imaginative numbers thrown in to give a better flavour of what’s going on. So imagine you have a table sized at 100GB, with 100 columns of data where every column holds data of a similar size and pattern; and want to execute a query of the form: select {list of columns} from big_table where colX >=  {some constant}.

Traditional Tablescan (approximation) with table fully cached

For each block of the 100GB, Oracle has to acquire the “cache buffers chains” latch, pin the block, drop the latch, and scan the block testing each row, then acquire the latch, unpin the block, and drop the latch.  Scanning the block requires a walk through the row directory and, for each row pointer, jumping to the correct location in the block for the row, stepping along the row one column at a time to get to the correct row, and then checking the column  value. If the column matches the predicate extract, format and return the required columns from that row.

It’s a lot of memory to scan, in a large number of small steps, involving a lot of latching and pinning – which translates into a lot of CPU. On the plus side, although it’s very expensive to identify the required rows, it’s very cheap to construct and return a row once you’ve identified it.

In-memory scan (approximation)

  1. Given the way I’ve described the table (100GB, 100 similar columns), Oracle can recreate it in memory as 100 lists of 1GB each; so we can identify the rows we want by scanning one of those lists and applying the predicate – so only 1GB of (fairly contigious) memory to scan, rather than 100GB, and virtually no latching and pinning to find that memory, and no jumping around following pointers and counting along rows.
  2. But it’s probably NOT 1GB of memory to scan, because Oracle has some compression/deduplication methods that it can apply to the data that often reduces the memory requirement by a factor of four of five – so perhaps it’s only 250 MB of memory to scan.
  3. But Oracle breaks large lists into chunks, so rather than 250MB of contiguous memory, it’s likely to be 250 chunks of 1MB; and as part of the description of each chunk Oracle records the lowest and highest value in the chunk; [ed:  Christian Antognini says that the metadata list every distinct value for the chunk] so it can check the predicate against the boundary values on the chunk and be in a position to discard entire chunks without looking at their content. So, depending on the actual content and pattern of the data, we may examine only a handful of chunks in detail, dropping the scan from 250MB to, perhaps, 10MB.
  4. And we still haven’t finished there, because the code that handles the column-store is able to take advantage of the SIMD (Single Instruction,  Multiple Data) operations in the CPU to check the predicate against 4, or possibly even 8, values simultaneously at a speed matching a single column comparison that has to be used for the traditional cached row-store. (Given the way that Oracle  (probably) handles the compression, I suspect that this final advantage is only significant for range-based predicates – but that’s purely conjectural).

So the huge benefit you get from In-Memory column store, compared to “fully cached row-store”  is that you are likely to scan far less memory to identify the rows that match your predicate, and do it with far less “infrastructure” activity like latching and pinning. The potential saving in CPU usage is huge.

There is, of course, a penalty to pay. As you identify the rows of interest you can (in effect) construct a bitmap representing the position of those rows in the table (and if you have predicates on more than 1 column you can use bitmap operations on the individual column bitmaps to identify the rows you want in the final result) but then you have to construct the row that goes into the result set. If your query is interested in just 5 columns that means using the bitmap to locate the correct entry from each of 5 separate column lists; if your query is interested in 99 column that means extracting the correct entry from each of 99 separate column lists. Identifying the rows you want can be very  quick, building the final result may be relatively slow.

Soundbite summary

  • Using the In-memory Database, you can identify the rows you want very quickly but it’s relatively slow to reconstruct them.
  • Using a fully cached traditional row-store, it’s relatively slow to identify the rows you want, but once you’ve found them you spend no time reconstructing them.

Bear in mind that this is an extremely simplified analysis and ignores all sorts of details about read-consistency, the probability of physical reads, the probability of scanning blocks instead of scanning chunks, and so on; my focus is only on the type of activity that differentiates row-store handling from column-store handling when all the data is in memory so that you can have some appreciation of why the benefits available from In-memory DB can vary with the pattern of the data and the way you use it.

Catalogue of blog posts by Maria Colgan and Andy Rivenes:

Other articles on In-memory DB:

 

January 21, 2015

LOB Space

Filed under: ASSM,Infrastructure,LOBs,Oracle — Jonathan Lewis @ 1:26 pm GMT Jan 21,2015

Following on from a recent “check the space” posting, here’s another case of the code not reporting what you thought it would, prompted by a question on the OTN database forum about a huge space discrepancy in LOBs.

There’s a fairly well-known package called dbms_space that can give you a fairly good idea of the space used by a segment stored in a tablespace that’s using automatic segment space management. But what can you think when a piece of code (written by Tom Kyte, no less) reports the following stats about your biggest LOB segment:


Unformatted Blocks .....................             107
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................         859,438
Total Blocks............................       1,746,304
Total Bytes.............................  14,305,722,368

Of the available 1.7M blocks approximately 890,000 seem to have gone missing!

I tend to think that the first thing to do when puzzled by unexpected numbers is to check for patterns in the arithmetic. First (though not particularly interesting) the LOB segment seems to be using the standard 8KB blocksize: 1,746,304 * 8192 = 14,305,722,368; more interestingly, although only approximately true, the number of full blocks is pretty close to half the total blocks – does this give you a hint about doing a little test.


create table test_lobs (
        id              number(5),
        text_content    clob
)
lob (text_content) store as text_lob(
        disable storage in row
        chunk 32K
        tablespace test_8k_1m_assm
)
;

begin
        for i in 1..1000 loop
                insert into test_lobs values(
                        i, 'x'
                );
                commit;
        end loop;
end;
/

I’ve created a table with a LOB segment, storing LOBs out of row with a chunk size of 32KB in a tablespace which (using my naming convention) is locally managed, 8KB blocksize, uniform 1MB extents, using ASSM. So what do I see if I try to check the space usage through calls to the dbms_space package ? (There’s some sample code to do this in the comments of the blog I linked to earlier, but I’ve used some code of my own for the following – the first figure reported is the blocks, the second the bytes):


====
ASSM
====

Unformatted                   :      119 /    3,899,392
Freespace 1 (  0 -  25% free) :        0 /            0
Freespace 2 ( 25 -  50% free) :        0 /            0
Freespace 3 ( 50 -  75% free) :        0 /            0
Freespace 4 ( 75 - 100% free) :        0 /            0
Full                          :    1,000 /   32,768,000

=======
Generic
=======
Segment Total blocks: 4224
Object Unused blocks: 0

Apparently the segment has allocated 4,224 blocks, but we’ve only used 1,000 of them, with 119 unformatted and 3,105 “missing”; strangely, though, the 1,000 “Full blocks” are simultaneously reported as 32,768,000 bytes … and suddenly the light dawns. The dbms_space package is NOT counting blocks, it’s counting chunks; more specifically it’s counting “bits” in the bitmap space management blocks for the LOB segment and (I think I’ve written this somewhere, possibly as far back as Practical Oracle 8i) the bits in a LOB segment represent chunks, not blocks.

Conclusion:

The OP has set a 16KB chunksize with a 8KB block size. His numbers look fairly self-consistent: 1,746,304 –  (2 * 859,438) – 107 = 27,321; the difference is about 1.6% of the total allocation, which is in the right ballpark for the space management blocks, especially if the segment is in a tablespace using with 1MB uniform extents.

 

January 19, 2015

Bitmap Counts

Filed under: bitmaps,Indexing,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 12:15 pm GMT Jan 19,2015

In an earlier (not very serious) post about count(*) I pointed out how the optimizer sometimes does a redundant “bitmap conversion to rowid” when counting. In the basic count(*) example I showed this wasn’t a realistic issue unless you had set cursor_sharing to “force” (or the now-deprecated “similar”). There are, however, some cases where the optimizer can do this in more realistic circumstances and this posting models a scenario I came across a few years ago. The exact execution path has changed over time (i.e. version) but the anomaly persists, even in 12.1.0.2.

First we create a “fact” table and a dimension table, with a bitmap index on the fact table and a corresponding primary key on the dimension table:


create table area_sales (
	area		varchar2(10)	not null,
	dated		date		not null,
	category	number(3)	not null,
	quantity	number(8,0),
	value		number(9,2),
	constraint as_pk primary key (dated, area),
	constraint as_area_ck check (area in ('England','Ireland','Scotland','Wales'))
)
;

insert into area_sales
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	decode(mod(rownum,4),
		0,'England',
		1,'Ireland',
		2,'Scotland',
		3,'Wales'
	),
	sysdate + 0.0001 * rownum,
	mod(rownum-1,300),
	rownum,
	rownum
from
	generator,
	generator
where
	rownum <= 1e6
;

create bitmap index as_bi on area_sales(category) pctfree 0;

create table dim (
	id	number(3) not null,
	padding	varchar2(40)
)
;

alter table dim add constraint dim_pk primary key(id);

insert into dim
select
	distinct category, lpad(category,40,category)
from	area_sales
;

commit;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'AREA_SALES',
		method_opt 	 => 'for all columns size 1',
		cascade		 => true
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'DIM',
		method_opt 	 => 'for all columns size 1',
		cascade		 => true
	);
end;
/

Now we run few queries and show their execution plans with rowsource execution statistics. First a query to count the number of distinct categories used in the area_sales tables, then a query to list the IDs from the dim table that appear in the area_sales table, then the same query hinted to run efficiently.


set trimspool on
set linesize 156
set pagesize 60
set serveroutput off

alter session set statistics_level = all;

select
	distinct category
from
	area_sales
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

==========================================
select  distinct category from  area_sales
==========================================
---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |    300 |00:00:00.01 |     306 |       |       |          |
|   1 |  HASH UNIQUE                 |       |      1 |    300 |    300 |00:00:00.01 |     306 |  2294K|  2294K| 1403K (0)|
|   2 |   BITMAP INDEX FAST FULL SCAN| AS_BI |      1 |   1000K|    600 |00:00:00.01 |     306 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

As you can see, Oracle is able to check the number of distinct categories very quickly by scanning the bitmap index and extracting ONLY the key values from each of the 600 index entries that make up the whole index (the E-rows figure effectively reports the number of rowids identified by the index, but Oracle doesn’t evaluate them to answer the query).


=======================================================================
select  /*+   qb_name(main)  */  dim.* from dim where  id in (   select
   /*+     qb_name(subq)    */    distinct category   from
area_sales  )
========================================================================

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    300 |00:00:10.45 |     341 |       |       |          |
|*  1 |  HASH JOIN SEMI               |       |      1 |    300 |    300 |00:00:10.45 |     341 |  1040K|  1040K| 1260K (0)|
|   2 |   TABLE ACCESS FULL           | DIM   |      1 |    300 |    300 |00:00:00.01 |      23 |       |       |          |
|   3 |   BITMAP CONVERSION TO ROWIDS |       |      1 |   1000K|    996K|00:00:02.64 |     318 |       |       |          |
|   4 |    BITMAP INDEX FAST FULL SCAN| AS_BI |      1 |        |    599 |00:00:00.01 |     318 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------

What we see here is that (unhinted) oracle has converted the IN subquery to an EXISTS subquery then to a semi-join which it has chosen to operate as a HASH semi-join. But in the process of generating the probe (sescond) table Oracle has converted the bitmap index entries into a set of rowids – all 1,000,000 of them in my case – introducing a lot of redundant work. In the original customer query (version 9 or 10, I forget which) the optimizer unnested the subquery and converted it into an inline view with a distinct – but still performed a redundant bitmap conversion to rowids. In the case of the client, with rather more than 1M rows, this wasted a lot of CPU.


=====================================================================
select  /*+   qb_name(main)  */  dim.* from (  select   /*+
qb_name(inline)    no_merge    no_push_pred   */   distinct category
from   area_sales  ) sttv,  dim where  dim.id = sttv.category
=====================================================================

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |       |      1 |        |    300 |00:00:00.02 |     341 |       |       |          |
|*  1 |  HASH JOIN                     |       |      1 |    300 |    300 |00:00:00.02 |     341 |  1969K|  1969K| 1521K (0)|
|   2 |   VIEW                         |       |      1 |    300 |    300 |00:00:00.01 |     306 |       |       |          |
|   3 |    HASH UNIQUE                 |       |      1 |    300 |    300 |00:00:00.01 |     306 |  2294K|  2294K| 2484K (0)|
|   4 |     BITMAP INDEX FAST FULL SCAN| AS_BI |      1 |   1000K|    600 |00:00:00.01 |     306 |       |       |          |
|   5 |   TABLE ACCESS FULL            | DIM   |      1 |    300 |    300 |00:00:00.01 |      35 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------

By introducing a manual unnest in the original client code I avoided the bitmap conversion to rowid, and the query executed much more efficiently. As you can see the optimizer has predicted the 1M rowids in the inline view, but used only the key values from the 600 index entries. In the case of the client it really was a case of manually unnesting a subquery that the optimizer was automatically unnesting – but without introducing the redundant rowids.

In my recent 12c test I had to include the no_merge and no_push_pred hints. In the absence of the no_merge hint Oracle did a join then group by, doing the rowid expansion in the process; if I added the no_merge hint without the no_push_pred hint then Oracle did a very efficient nested loop semi-join into the inline view. Although this still did the rowid expansion (predicting 3,333 rowids per key) it “stops early” thanks to the “semi” nature of the join so ran very quickly:


=========================================================================
select  /*+   qb_name(main)  */  dim.* from (  select   /*+
qb_name(inline)    no_merge   */   distinct category  from   area_sales
 ) sttv,  dim where  dim.id = sttv.category
=========================================================================

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    300 |00:00:00.02 |     348 |
|   1 |  NESTED LOOPS SEMI            |       |      1 |    300 |    300 |00:00:00.02 |     348 |
|   2 |   TABLE ACCESS FULL           | DIM   |      1 |    300 |    300 |00:00:00.01 |      35 |
|   3 |   VIEW PUSHED PREDICATE       |       |    300 |   3333 |    300 |00:00:00.01 |     313 |
|   4 |    BITMAP CONVERSION TO ROWIDS|       |    300 |   3333 |    300 |00:00:00.01 |     313 |
|*  5 |     BITMAP INDEX SINGLE VALUE | AS_BI |    300 |        |    300 |00:00:00.01 |     313 |
-------------------------------------------------------------------------------------------------

Bottom line on all this – check your execution plans that use bitmap indexes – if you see a “bitmap conversion to rowids” in cases where you don’t then visit the table it may be a redundant conversion, and it may be expensive. If you suspect that this is happening then dbms_xplan.display_cursor() may confirm that you are doing a lot of CPU-intensive work to produce a very large number of rowids that you don’t need.

January 16, 2015

Spatial space

Filed under: Infrastructure,Oracle,Statistics,Troubleshooting — Jonathan Lewis @ 1:00 pm GMT Jan 16,2015

One thing you (ought to) learn very early on in an Oracle career is that there are always cases you haven’t previously considered. It’s a feature that is frequently the downfall of “I found it on the internet” SQL.  Here’s one (heavily paraphrased) example that appeared on the OTN database forum a few days ago:

select table_name,round((blocks*8),2)||’kb’ “size” from user_tables where table_name = ‘MYTABLE';

select table_name,round((num_rows*avg_row_len/1024),2)||’kb’ “size” from user_tables where table_name = ‘MYTABLE';

The result from the first query is 704 kb,  the result from the second is 25.4 kb … fragmentation, rebuild, CTAS etc. etc.

The two queries are perfectly reasonable approximations (for an 8KB block size, with pctfree of zero) for the allocated space and actual data size for a basic heap table – and since the two values here don’t come close to matching it’s perfectly reasonable to consider doing something like a rebuild or shrink space to reclaim space and (perhaps) to improve performance.

In this case it doesn’t look as if the space reclaimed is likely to be huge (less than 1MB), on the other hand it’s probably not going to take much time to rebuild such a tiny table; it doesn’t seem likely that the rebuild could make a significant difference to performance (though apparently it did), but the act of rebuilding might cause execution plans to change for the better because new statistics might appear as the rebuild took place. The figures came from a test system, though, so maybe the table on the production system was much larger and the impact would be greater.

Being cautious about wasting time and introducing risk, I made a few comments about the question –  and learned that one of the columns was of type SDO_GEOMETRY. This makes a big difference about what to do next, because dbms_stats.gather_table_stats() doesn’t process such columns correctly, which results in a massive under-estimate for the avg_row_len (which is basically the sum of avg_col_len for the table). Here’s an example (run on 12c, based on some code taken from the 10gR2 manuals):


drop table cola_markets purge;

CREATE TABLE cola_markets (
  mkt_id NUMBER,
  name VARCHAR2(32),
  shape SDO_GEOMETRY);

INSERT INTO cola_markets VALUES(
  1,
  'cola_a',
  SDO_GEOMETRY(
    2003,  -- two-dimensional polygon
    NULL,
    NULL,
    SDO_ELEM_INFO_ARRAY(1,1003,3), -- one rectangle (1003 = exterior)
    SDO_ORDINATE_ARRAY(1,1, 5,7) -- only 2 points needed to
          -- define rectangle (lower left and upper right) with
          -- Cartesian-coordinate data
  )
);

insert into cola_markets select * from cola_markets;
/
/
/
/
/
/
/
/
/

execute dbms_stats.gather_table_stats(user,'cola_markets')
select
	avg_row_len, num_rows, blocks,
	round(avg_row_len * num_rows / 7200,0) expected_blocks
from user_tables where table_name = 'COLA_MARKETS';

analyze table cola_markets compute statistics;
select
	avg_row_len, num_rows, blocks,
	round(avg_row_len * num_rows / 7200,0) expected_blocks
from user_tables where table_name = 'COLA_MARKETS';

If you care to count the number of times I execute the “insert as select” it’s 10, so the table ends up with 2^10 = 1024 rows. The 7,200 in the calculated column converts bytes to approximate blocks on the assumption of 8KB blocks and pctfree = 10. Here are the results following the two different methods for generating object statistics:


PL/SQL procedure successfully completed.

AVG_ROW_LEN   NUM_ROWS     BLOCKS EXPECTED_BLOCKS
----------- ---------- ---------- ---------------
         14       1024        124               2

Table analyzed.

AVG_ROW_LEN   NUM_ROWS     BLOCKS EXPECTED_BLOCKS
----------- ---------- ---------- ---------------
        109       1024        124              16

Where does the difference in Expected_blocks come from ? (The Blocks figures is 124 because I’ve used 1MB uniform extents – 128 block – under ASSM (which means 4 space management blocks at the start of the first extent.)

Here are the column lengths after the call to dbms_stats: as you can see the avg_row_len is the sum of avg_col_len.


select column_name, data_type, avg_col_len
from   user_tab_cols
where  table_name = 'COLA_MARKETS'
order by
        column_id
;

COLUMN_NAME          DATA_TYPE                AVG_COL_LEN
-------------------- ------------------------ -----------
MKT_ID               NUMBER                             3
NAME                 VARCHAR2                           7
SYS_NC00010$         SDO_ORDINATE_ARRAY
SHAPE                SDO_GEOMETRY
SYS_NC00008$         NUMBER                             0
SYS_NC00004$         NUMBER                             4
SYS_NC00005$         NUMBER                             0
SYS_NC00006$         NUMBER                             0
SYS_NC00007$         NUMBER                             0
SYS_NC00009$         SDO_ELEM_INFO_ARRAY

The figures from the analyze command are only slightly different, but fortunately the analyze command uses the row directory pointers to calculate the actual row allocation, so picks up information about the impact of inline varrays, LOBs, etc. that the dbms_stats call might not be able to handle.


COLUMN_NAME          DATA_TYPE                AVG_COL_LEN
-------------------- ------------------------ -----------
MKT_ID               NUMBER                             2
NAME                 VARCHAR2                           6
SYS_NC00010$         SDO_ORDINATE_ARRAY
SHAPE                SDO_GEOMETRY
SYS_NC00008$         NUMBER                             1
SYS_NC00004$         NUMBER                             3
SYS_NC00005$         NUMBER                             1
SYS_NC00006$         NUMBER                             1
SYS_NC00007$         NUMBER                             1
SYS_NC00009$         SDO_ELEM_INFO_ARRAY

As a basic reminder – whenever you do anything slightly non-trivial (e.g. something you couldn’t have done in v5, say) then remember that all those dinky little script things you find on the Internet might not actually cover your particular case.

January 15, 2015

Execution Plans

Filed under: Execution plans,Oracle — Jonathan Lewis @ 11:00 am GMT Jan 15,2015

This is the index to a series of articles I’ve been writing for redgate, published on their AllThingsOracle site, about generating and interpreting execution plans.

When I started I didn’t really know how many parts it was going to end up as, I had thought maybe 5 or 6 but that was a wildly inaccurate estimate. I am now fairly confident that it’s going to stop at 14 – but the last two articles are still under construction so there’s still a little room for error.

Update 4th March: Parts 13 and 14 should be published some time in the next couple of weeks.

January 14, 2015

Bind Effects

Filed under: Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 1:24 pm GMT Jan 14,2015

A couple of days ago I highlighted an optimizer anomaly caused by the presence of an index with a descending column. This was a minor (unrelated) detail that appeared in a problem on OTN where the optimizer was using an index FULL scan when someone was expecting to see an index RANGE scan. My earlier posting supplies the SQL to create the table and indexes I used to model the problem – and in this posting I’ll explain the problem and answer the central question.

Here’s the query and execution plan (from 11.2.0.x) as supplied by the OP – the odd appearance of the sys_op_descend() function calls is the minor detail that I explained in the previous post, but that’s not really relevant to the question of why Oracle is using an index full scan rather than an index range scan. The /*+ first_rows */ hint isn’t something you should be using but it was in the OP’s query, so I’ve included it in my model:


select /*+ FIRST_ROWS gather_plan_statistics scanned */ count(1)      FROM  XXX
where  (((((COL1 = '003' and COL2 >= '20150120') and COL3 >= '00000000') and COL4>= '000000000000' )
or ((COL1 = '003' and COL2 >= '20150120') and COL3> '00000000' )) or (COL1= '003' and COL2> '20150120'))
order by COL1,COL2,COL3,COL4  

Plan hash value: 919851669  

---------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |      1 |        |  18533 |00:01:47.04 |    156K |  70286 |
|   1 |  TABLE ACCESS BY INDEX ROWID| XXX    |      1 |  7886K |  18533 |00:01:47.04 |    156K |  70286 |
|*  2 |   INDEX FULL SCAN           | XXXXPK |      1 |  7886K |  18533 |00:01:30.36 |    131K |  61153 |
---------------------------------------------------------------------------------------------------------  

Predicate Information (identified by operation id):
---------------------------------------------------
  2 - filter((("COL2">:B2 AND "COL1"=:B1 AND
              SYS_OP_DESCEND("COL2")<SYS_OP_DESCEND(:B2)) OR ("COL1"=:B1 AND "COL2">=:B2
              AND "COL3">:B3 AND SYS_OP_DESCEND("COL2")<=SYS_OP_DESCEND(:B2)) OR
              ("COL1"=:B1 AND "COL2">=:B2 AND "COL3">=:B3 AND "COL4">=:B4 AND
              SYS_OP_DESCEND("COL2")<=SYS_OP_DESCEND(:B2))))  

If you look closely you’ll see that the OP has NOT supplied the output from a call to dbms_xplan.display_cursor() – the column and table names are highly suspect (but that’s allowable cosmetic change for confidentiality reasons) the giveaway is that the SQL statement uses literals but the execution plan is using bind variables (which are of the form B{number}, suggesting that the real SQL is embedded in PL/SQL with PL/SQL variables being used to supply values): the bind variables make a difference.

Let’s go back to my model to demonstrate the problem. Here’s a query with the same predicate structure as the problem query (with several pairs of brackets eliminated to improve readability) showing the actual run-time plan (from 11.2.0.4) when using literals:


select
        /*+ first_rows */
        *
from t1
where
        (C1 = 'DE' and C2 >  'AB')
or      (C1 = 'DE' and C2 >= 'AB' and C3 > 'AA' )
or      (C1 = 'DE' and C2 >= 'AB' and C3 >= 'AA' and C4 >= 'BB')
order by
        C1, C2, C3, C4
;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |    21 |  2478 |     4  (25)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IASC |    21 |       |     3  (34)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"='DE')
       filter(((SYS_OP_DESCEND("C2")<SYS_OP_DESCEND('AB') AND "C2">'AB') OR
              (SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C3">'AA' AND "C2">='AB') OR
              (SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C4">='BB' AND "C2">='AB' AND
              "C3">='AA')))

As you can see, the optimizer has managed to “factor out” the predicate C1 = ‘DE’ from the three disjuncts and has then used it as an access() predicate for an index range scan. Now let’s see what the code and plan look like if we replace the four values by four bind variables:


variable B1 char(2)
variable B2 char(2)
variable B3 char(2)
variable B4 char(2)

begin
        :b1 := 'DE';
        :b2 := 'AB';
        :b3 := 'AA';
        :b4 := 'BB';
end;
/

select
        /*+ first_rows */
        *
from t1
where
        (C1 = :B1  and C2 >  :B2 )
or      (C1 = :B1  and C2 >= :B2 and C3 >  :B3 )
or      (C1 = :B1  and C2 >= :B2 and C3 >= :B3 and C4 >= :B4)
order by C1, C2, C3, C4
;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |    31 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |   437 | 51566 |    31   (4)| 00:00:01 |
|*  2 |   INDEX FULL SCAN           | T1_IASC |   437 |       |    27   (4)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((("C1"=:B1 AND "C2">:B2) OR ("C1"=:B1 AND "C3">:B3 AND
              "C2">=:B2) OR ("C1"=:B1 AND "C4">=:B4 AND "C2">=:B2 AND "C3">=:B3)))

The optimizer hasn’t factored out the common expression C1 = :B1. The reason for this, I think, is that though WE know that :B1 is supposed to be the same thing in all three occurrences the optimizer isn’t able to assume that that’s the case; in principle :B1 could be the place holder for 3 different values – so the optimizer plays safe and optimizes for that case. This leaves it with three options: Full tablescan with filter predicates, index full scan with filter predicates, three-part concatenation with index range scans in all three parts. The combination of the /*+ first_rows */ hint and the “order by” clause which matches the t1_1asc index has left the optimizer choosing the index full scan path – presumably to avoid the need to collect all the rows and sort them before returning the first row.

Given our understanding of the cause of the problem we now have a clue about how we might make the query more efficient – we have to eliminate the repetition of (at least) the :B1 bind variable. In fact we can get some extra mileage by modifying the repetition of the :B2 bind variable. Here’s a rewrite that may help:


select
        /*+ first_rows */
        *
from t1
where
        (C1 = :B1 and C2 >= :B2)
and     (
             C2 > :B2
         or  C3 > :B3
         or (C3 >= :B3 and C4 > :B4)
        )
order by C1, C2, C3, C4
;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |   148 | 17464 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IASC |   148 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"=:B1 AND "C2">=:B2 AND "C2" IS NOT NULL)
       filter(("C2">:B2 OR "C3">:B3 OR ("C4">:B4 AND "C3">=:B3)))

I’ve factored out as much of the C1 and C2 predicates as I can – and the optimizer has used the resulting conditions as the access() predicate on the index (adding in a “not null” predicate on C2 that looks redundant to me – in fact the index was on the primary key in the original, but I hadn’t included that constraint in my model). You’ll notice, by the way, that the cardinality is now 148; compare this with the previous cardinality of 437 and you might (without bothering to look closely as the 10053 trace) do some hand-waving around the fact that 437 = (approximately) 148 * 3, which fits the idea that the optimizer was treating the three :B1 appearances as if they were three different possible values accessing three sets of data.

Miscellaneous.

This isn’t the end of the story; there are always more complications and surprises in store as you look further into the detail. For example, on the upgrade to 12c the execution plan for the query with bind variables was the same (ignoring the sys_op_descend() functions) as the query using literals – the optimizer managed to factor out the C1 predicate: does this mean SQL*Plus got smarter about telling the optimizer about the bind variables, or does it mean the optimizer got smarter about something that SQL*Plus has been doing all along ?

This change might make you think that the optimizer is supposed to assume that bind variables of the same name represent the same thing – but that’s not correct, and it’s easy to show; here’s a trivial example (accessing the same table with a query that, for my data, identifies the first row):


declare
        m_id number := 1;
        m_c1 char(2) := 'BV';
        m_c2 char(2) := 'GF';
        m_n number := 0;
begin
        execute immediate
                'SELECT /*+ FIND THIS */ COUNT(*) FROM T1 WHERE ID = :B1 AND C1 = :B1 AND C2 = :B1'
                into m_n
                using m_id, m_c1, m_c2
        ;
end;
/

select sql_id, sql_text from V$sql where sql_text like 'SELECT%FIND THIS%';

SQL_ID        SQL_TEXT
------------- ----------------------------------------------------------------------------------
9px3nuv54maxp SELECT /*+ FIND THIS */ COUNT(*) FROM T1 WHERE ID = :B1 AND C1 = :B1 AND C2 = :B1

If you were looking at the contents of v$sql, or a trace file, or an AWR report, you might easily be fooled into thinking that this was a query where the same value had been used three times – when we know that it wasn’t.

So, as we upgrade from 11g to 12c my model of the original problem suggests that the problem is going to go away – but, actually, I don’t really know why that’s the case (yet). On the other hand, I have at least recognised a pattern that the 11g optimizer currently has a problem with, and I have a method for helping the optimizer to be a little more efficient.

 

January 12, 2015

FBI Bug reprise

Filed under: Execution plans,Oracle,Troubleshooting,Uncategorized — Jonathan Lewis @ 12:52 pm GMT Jan 12,2015

I’ve just had cause to resurrect a blog note I wrote three years ago. The note says that an anomaly I discovered in 9.2.0.8 wasfixed in 10.2.0.3 – and this is true for the simple example in the posting; but a recent question on the OTN database forum has shown that the bug still appears in more complex cases.  Here’s some code to create a table and two indexes:


create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	cast(dbms_random.string('U',2) as char(2))	c1,
	cast(dbms_random.string('U',2) as char(2))	c2,
	cast(dbms_random.string('U',2) as char(2))	c3,
	cast(dbms_random.string('U',2) as char(2))	c4,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create index t1_iasc  on t1(c1, c2,      c3, c4) nologging;
create index t1_idesc on t1(c1, c2 desc, c3, c4) nologging;

I’ve designed the table to model the way a problem was presented on OTN, it’s possible that the anomaly would appear in simpler circumstance; note that I have two indexes on the same four columns, but the second column of one of the indexes is declared as descending. To identify the indexes easily in execution plans the latter index has the text “desc” in its name. So here’s a query, with execution plan, that should use one of these indexes:


select
        *
from t1
where
        (C1 = 'DE' and C2 >  'AB')
or      (C1 = 'DE' and C2 >= 'AB' and C3 > 'AA' )
or      (C1 = 'DE' and C2 >= 'AB' and C3 >= 'AA' and C4 >= 'BB')
order by
        C1, C2, C3, C4
;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |    21 |  2478 |     4  (25)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IASC |    21 |       |     3  (34)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"='DE')
       filter(((SYS_OP_DESCEND("C2")<SYS_OP_DESCEND('AB') AND "C2">'AB') OR
              (SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C3">'AA' AND "C2">='AB') OR
              (SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C4">='BB' AND "C2">='AB' AND
              "C3">='AA')))

Sure enough – the query has used the t1_iasc index – but why has the optimizer introduced all those predicates with the sys_op_descend() function calls in them when we’re not using an index with a descending column ? Somewhere in the code path the optimizer has picked up the other index, and decided to use some information from it that is totally redundant. One side effect of this is that the cardinality prediction is 21 – if I drop the index t1_idesc the sys_op_descend() calls disappear and the cardinality changes to 148.

Oracle 12c behaves differently – it uses concatenation to turn the query into three separate index probes unless I add the hint /*+ first_rows */ (which I shouldn’t be using, but it’s the hint supplied by the OP on OTN). With the hint in place we get an example of the actual execution plan differing from the prediction made through explain plan:


12c execution plan unhinted (concatenation, and lots of sys_op_descend()):

--------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |          |       |       |    12 (100)|          |
|   1 |  SORT ORDER BY                        |          |   149 | 17582 |    12   (9)| 00:00:01 |
|   2 |   CONCATENATION                       |          |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1       |   142 | 16756 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | T1_IDESC |     1 |       |     2   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1       |     6 |   708 |     4   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN                  | T1_IDESC |     1 |       |     2   (0)| 00:00:01 |
|   7 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1       |     1 |   118 |     3   (0)| 00:00:01 |
|*  8 |     INDEX RANGE SCAN                  | T1_IASC  |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("C1"='DE' AND "C3">='AA' AND "C4">='BB' AND
              SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB'))
       filter("C4">='BB' AND SYS_OP_UNDESCEND(SYS_OP_DESCEND("C2"))>='AB' AND "C3">='AA')
   6 - access("C1"='DE' AND "C3">'AA' AND SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB'))
       filter("C3">'AA' AND SYS_OP_UNDESCEND(SYS_OP_DESCEND("C2"))>='AB' AND
              (LNNVL("C4">='BB') OR LNNVL(SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB')) OR
              LNNVL(SYS_OP_UNDESCEND(SYS_OP_DESCEND("C2"))>='AB') OR LNNVL("C3">='AA')))
   8 - access("C1"='DE' AND "C2">'AB' AND "C2" IS NOT NULL)
       filter((LNNVL("C3">'AA') OR LNNVL(SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB')) OR
              LNNVL("C2">='AB')) AND (LNNVL("C4">='BB') OR
              LNNVL(SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB')) OR LNNVL("C2">='AB') OR
              LNNVL("C3">='AA')))

12c Execution plan with first_rows hint (and the sys_op_descend have gone)
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |   150 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |   148 | 17464 |   150   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IASC |   148 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"='DE')
       filter(("C2">'AB' OR ("C3">'AA' AND "C2">='AB') OR ("C4">='BB' AND
              "C2">='AB' AND "C3">='AA')))

12c Execution plan with first_rows according to Explain Plan (and they're back again)
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |   148 | 17464 |   150   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |   148 | 17464 |   150   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IASC |   148 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"='DE')
       filter("C2">'AB' AND SYS_OP_DESCEND("C2")<SYS_OP_DESCEND('AB') OR
              "C3">'AA' AND SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C2">='AB' OR
              "C4">='BB' AND SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C2">='AB' AND
              "C3">='AA')

On the plus side the upgrade to 12c has removed some of the sys_op_descend() appearances and seems to have fixed the cardinality anomaly when the sys_op_descend() calls do appear – but things are still going wrong in odd ways. (And this looks like another case where you could drop an index that you’re not using and seeing an execution plan change.)

Footnote:

When playing around with 12c, I had to be very careful to avoid confusing the issue as SQL Plan Directives were appearing from time to time, and giving me unexpected results as I repeated some of the experiments.

Footnote 2:

I haven’t actually addressed the question originally posed on OTN, but that can wait for another day.

January 9, 2015

count(*) – again !

Filed under: bitmaps,humour,Indexing,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 12:56 pm GMT Jan 9,2015

Because you can never have enough of a good thing.

Here’s a thought – The optimizer doesn’t treat all constants equally.  No explanations, just read the code – execution plans at the end:


SQL> drop table t1 purge;
SQL> create table t1 nologging as select * from all_objects;
SQL> create bitmap index t1_b1 on t1(owner);

SQL> alter session set statistics_level = all;

SQL> set serveroutput off
SQL> select count(*) from t1;
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

SQL> select count(1) from t1;
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

SQL> select count(-1) from t1;
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

SQL> alter session set cursor_sharing = force;
SQL> alter system flush shared_pool;

SQL> select count(1) from t1;
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

So, are you expecting to see the same results and performance from every single one of those queries ?


select count(*) from t1
----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |      1 |00:00:00.01 |       9 |      5 |
|   1 |  SORT AGGREGATE               |       |      1 |      1 |      1 |00:00:00.01 |       9 |      5 |
|   2 |   BITMAP CONVERSION COUNT     |       |      1 |  84499 |     31 |00:00:00.01 |       9 |      5 |
|   3 |    BITMAP INDEX FAST FULL SCAN| T1_B1 |      1 |        |     31 |00:00:00.01 |       9 |      5 |
----------------------------------------------------------------------------------------------------------

select count(1) from t1
-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |      1 |00:00:00.01 |       9 |
|   1 |  SORT AGGREGATE               |       |      1 |      1 |      1 |00:00:00.01 |       9 |
|   2 |   BITMAP CONVERSION COUNT     |       |      1 |  84499 |     31 |00:00:00.01 |       9 |
|   3 |    BITMAP INDEX FAST FULL SCAN| T1_B1 |      1 |        |     31 |00:00:00.01 |       9 |
-------------------------------------------------------------------------------------------------

select count(-1) from t1
-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |      1 |00:00:00.43 |       9 |
|   1 |  SORT AGGREGATE               |       |      1 |      1 |      1 |00:00:00.43 |       9 |
|   2 |   BITMAP CONVERSION TO ROWIDS |       |      1 |  84499 |  84499 |00:00:00.22 |       9 |
|   3 |    BITMAP INDEX FAST FULL SCAN| T1_B1 |      1 |        |     31 |00:00:00.01 |       9 |
-------------------------------------------------------------------------------------------------

SQL> alter session set cursor_sharing = force;
SQL> alter system flush shared_pool;

select count(1) from t1
select count(:"SYS_B_0") from t1    -- effect of cursor-sharing
-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |      1 |00:00:00.46 |       9 |
|   1 |  SORT AGGREGATE               |       |      1 |      1 |      1 |00:00:00.46 |       9 |
|   2 |   BITMAP CONVERSION TO ROWIDS |       |      1 |  84499 |  84499 |00:00:00.23 |       9 |
|   3 |    BITMAP INDEX FAST FULL SCAN| T1_B1 |      1 |        |     31 |00:00:00.01 |       9 |
-------------------------------------------------------------------------------------------------

Check operation 2 in each plan – with the bitmap index in place there are two possible ways to count the rows referenced in the index – and one of them converts to rowids and does a lot more work.

The only “real” threat in this set of examples, of course, is the bind variable one – there are times when count(*) WILL be faster than count(1). Having said that, there is a case where a redundant “conversion to rowids” IS a threat – and I’ll write that up some time in the near future.

Trick question: when is 1+1 != 2 ?
Silly answer: compare the plan for: “select count (2) from t1″ with the plan for “select count(1+1) from t1″

Note: All tests above run on 12.1.0.2

January 7, 2015

Most Recent

Filed under: Execution plans,Oracle,Performance,subqueries — Jonathan Lewis @ 6:21 pm GMT Jan 7,2015

There’s a thread on the OTN database forum at present asking for advice on optimising a query that’s trying to find “the most recent price” for a transaction given that each transaction is for a stock item on a given date, and each item has a history of prices where each historic price has an effective start date. This means the price for a transaction is the price as at the most recent date prior to the transaction date.

There is an absolutely standard way of expressing “the most recent occurrence” in SQL. Assume we have a table of (item_code, effective_date, price) with the obvious primary key of (item_code, effective_date), then a requirement to find “the most recent price for item XXXX as at 25th Dec 2014″ case would give us code like the following (note – all the examples in this note were run against Oracle 11.2.0.4):


select  *
from    prices  pri1
where   item_code = 'XXXX'
and     effective_date = (
                select  max(effective_date)
                from    prices  pri2
                where   pri2.item_code = 'XXXX'
                and     pri2.effective_date <= date'2014-12-25'
        )
/

The ideal execution plan that we should expect to see for this query is as follows (with a small variation if you had created the prices table as an index-organized table – which would probably be sensible in many cases):


-----------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |     1 |    52 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID   | PRICES |     1 |    52 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN            | PRI_PK |     1 |       |     1   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE              |        |     1 |    32 |            |          |
|   4 |     FIRST ROW                  |        |     1 |    32 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN (MIN/MAX)| PRI_PK |     1 |    32 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ITEM_CODE"='XXXX' AND "EFFECTIVE_DATE"= (SELECT
              MAX("EFFECTIVE_DATE") FROM "PRICES" "PRI2" WHERE
              "PRI2"."EFFECTIVE_DATE"<=TO_DATE(' 2014-12-25 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "PRI2"."ITEM_CODE"='XXXX'))

   5 - access("PRI2"."ITEM_CODE"='XXXX' AND "PRI2"."EFFECTIVE_DATE"<=
             TO_DATE('2014-12-25 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

As you can see, this plan is using the “driving subquery” approach – the order of operation is 5, 4, 3, 2, 1, 0: we do an index min/max range scan in line 5 to find the maximum effective date for the item, then pass that up through the (essentially redundant) First Row and Sort Aggregate operations to use as an input to the index unique scan at operation 2 which passes the rowid up to operation 1 to find the specific row. In my case this was 2 consistent gets for the range scan, 2 more for the unique scan, and one for the table access.

You might point out that my example uses the item_code ‘XXXX’ twice, once in the main query, once in the subquery; and you might decide that this was in very poor taste since we should clearly be using a correlated subquery – the correlating predicate ought to be: pri2.item_code = pri1.item_code. Here’s the execution plan I got when I made that change:


----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     1 |    78 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |         |     1 |    78 |     3   (0)| 00:00:01 |
|   2 |   NESTED LOOPS               |         |     1 |    78 |     3   (0)| 00:00:01 |
|   3 |    VIEW                      | VW_SQ_1 |     1 |    26 |     2   (0)| 00:00:01 |
|*  4 |     FILTER                   |         |       |       |            |          |
|   5 |      HASH GROUP BY           |         |     1 |    32 |     2   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN       | PRI_PK  |     1 |    32 |     2   (0)| 00:00:01 |
|*  7 |    INDEX UNIQUE SCAN         | PRI_PK  |     1 |       |     0   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID| PRICES  |     1 |    52 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("PRI2"."ITEM_CODE"='XXXX')
   6 - access("PRI2"."ITEM_CODE"='XXXX' AND "PRI2"."EFFECTIVE_DATE"<=
              TO_DATE('2014-12-25 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("ITEM_CODE"='XXXX' AND "EFFECTIVE_DATE"="MAX(EFFECTIVE_DATE)")

The plan changes dramatically, the optimizer has unnested the subquery. In my case this didn’t make any difference to the overall performance as my data set was small, I only had one or two prices per item code, and the query was very basic; but in most other cases the change could be catastrophic.

The Problem Query

The requirement on OTN had a stock transactions (xo_stock_trans) table and a prices (xo_prices) table, and the OP had supplied some code to create and populate these tables with 6.4 million and 4.5 million rows respectively. Unfortunately the xo_prices table didn’t have a suitable unique constraint on it and ended up with lots of items having multiple prices for the same date.  The OP had created a function to return a price for an item given a driving date and price_type, and had a query that called that function three times per row (once for each of three price types); but this did not perform very well and the OP wanted to know if there was a way of addressing the requirement efficiently using pure SQL; (s)he had already tried the following:


select tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from xo_stock_trans tr
join xo_prices pr on pr.item = tr.item
                and pr.price_date = (select max(pr2.price_date)
                                     from xo_prices pr2
                                     where pr2.item = pr.item
                                       and pr2.price_date <= tr.trans_date
                                     )
where tr.trans_date between '01-AUG-2014' and '31-AUG-2014';  

That was SO close – it’s clearly implementing the right sort of strategy: but it didn’t perform well, so let’s check the execution plan:

------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name               | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                    |     1 |    70 |       |   168M(100)|234:06:13 |
|   1 |  NESTED LOOPS                 |                    |     1 |    70 |       |   168M(100)|234:06:13 |
|   2 |   NESTED LOOPS                |                    |     9 |    70 |       |   168M(100)|234:06:13 |
|   3 |    NESTED LOOPS               |                    |     9 |   450 |       |   168M(100)|234:06:13 |
|   4 |     VIEW                      | VW_SQ_1            |   286 | 10010 |       |   168M(100)|234:06:11 |
|   5 |      HASH GROUP BY            |                    |   286 |  7722 |       |   168M(100)|234:06:11 |
|   6 |       MERGE JOIN              |                    |   456G|    11T|       |  9153K(100)| 12:42:50 |
|   7 |        SORT JOIN              |                    |   202K|  2960K|       |   548   (2)| 00:00:03 |
|*  8 |         INDEX RANGE SCAN      | XO_STOCK_TRANS_IX2 |   202K|  2960K|       |   548   (2)| 00:00:03 |
|*  9 |        SORT JOIN              |                    |  4045K|    46M|   154M| 19043   (6)| 00:01:36 |
|* 10 |         INDEX FAST FULL SCAN  | XO_PRICES_IX1      |  4045K|    46M|       |  1936  (10)| 00:00:10 |
|* 11 |     TABLE ACCESS BY USER ROWID| XO_STOCK_TRANS     |     1 |    15 |       |     1   (0)| 00:00:01 |
|* 12 |    INDEX RANGE SCAN           | XO_PRICES_IX1      |     1 |       |       |     2   (0)| 00:00:01 |
|  13 |   TABLE ACCESS BY INDEX ROWID | XO_PRICES          |     1 |    20 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   9 - access(INTERNAL_FUNCTION("PR2"."PRICE_DATE")<=INTERNAL_FUNCTION("TR"."TRANS_DATE"))
       filter(INTERNAL_FUNCTION("PR2"."PRICE_DATE")<=INTERNAL_FUNCTION("TR"."TRANS_DATE"))
  10 - filter("PR2"."PRICE_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  11 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  12 - access("ITEM_1"="PR"."ITEM" AND "PR"."PRICE_DATE"="MAX(PR2.PRICE_DATE)")
       filter("PR"."ITEM"="TR"."ITEM")

The query was limited to August 2014, which was about 198,000 rows in my table, so we might expect some signs of a brute-force approach (tablescans and hash joins rather than indexes and nested loops) – but what we get ends up with a high-precision approach with a very bad cardinality estimate after a brute-force unnesting of the “max(price_date)” subquery. The unnesting has done a range scan over 200,000 stock_trans rows, and an index fast full scan on 4.5 million prices to do a merge join and hash aggregation to find the maximum price_date for each target row in the xo_stock_trans table. (See my earlier posting on table duplication for a variation and explanation of what Oracle has done here). This step is a lot of work, but the optimizer thinks it’s going to produce only 286 rows in the aggregated result, so the next steps in the plan are indexed nested loops – which actually operate 198,000 times.

With the clue from my initial description, we need to aim for a strategy where Oracle doesn’t unnest that subquery – so let’s experiment with a basic /*+ no_unnest */ hint in the subquery and see what happens. Here’s the resulting execution plan:


--------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |   527 | 18445 |       |  6602M  (1)|999:59:59 |
|*  1 |  FILTER                       |                |       |       |       |            |          |
|*  2 |   HASH JOIN                   |                |  3423M|   111G|  5336K| 76973  (90)| 00:06:25 |
|*  3 |    TABLE ACCESS FULL          | XO_STOCK_TRANS |   202K|  2960K|       |  2531  (13)| 00:00:13 |
|   4 |    TABLE ACCESS FULL          | XO_PRICES      |  4571K|    87M|       |  2275  (11)| 00:00:12 |
|   5 |   SORT AGGREGATE              |                |     1 |    12 |       |            |          |
|   6 |    FIRST ROW                  |                |     1 |    12 |       |     3   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN (MIN/MAX)| XO_PRICES_IX1  |     1 |    12 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("PR"."PRICE_DATE"= (SELECT /*+ NO_UNNEST */ MAX("PR2"."PRICE_DATE") FROM
              "XO_PRICES" "PR2" WHERE "PR2"."PRICE_DATE"<=:B1 AND "PR2"."ITEM"=:B2))
   2 - access("PR"."ITEM"="TR"."ITEM")
   3 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("PR2"."ITEM"=:B1 AND "PR2"."PRICE_DATE"<=:B2)

The subquery now survives, and we can see a min/max range scan in the plan – but the subquery is a filter() subquery and is applied to the result of joining the 200,000 transactions to every price that applies for the item in each transaction. The optimizer thinks that this join will produce roughly 3.4 million rows but in fact with the sample data set (which had many prices per item) the join resulted in 4.4 Billion rows. The min/max subquery is as efficient as it can be, but it’s running far too often; ideally we would like it to run at most once per transaction, so why is it running late ? We could try adding the /*+ push_subq */ hint to the subquery but if we do the plan doesn’t change.

Our rapid “most recent occurrence” revolved around accessing the prices table by index while “pre-querying” for the date using a min/max subquery that knew the relevant item code already. In this case, though, we’re doing a full tablescan of the xo_prices table so the method doesn’t apply. So let’s manipulate the query to force an indexed access path for the join to the xo_prices table by adding the hints /*+ leading(tr pr) use_nl(pr) index(pr) */ to the main body of the query. This is the resulting plan:


--------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                |   527 | 18445 |  6614M  (1)|999:59:59 |
|   1 |  NESTED LOOPS                   |                |  3413K|   113M|    11M  (2)| 16:29:13 |
|   2 |   NESTED LOOPS                  |                |  3413K|   113M|    11M  (2)| 16:29:13 |
|*  3 |    TABLE ACCESS FULL            | XO_STOCK_TRANS |   202K|  2960K|  2531  (13)| 00:00:13 |
|*  4 |    INDEX RANGE SCAN             | XO_PRICES_IX1  |    16 |       |    52   (2)| 00:00:01 |
|   5 |     SORT AGGREGATE              |                |     1 |    12 |            |          |
|   6 |      FIRST ROW                  |                |     1 |    12 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN (MIN/MAX)| XO_PRICES_IX1  |     1 |    12 |     3   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID   | XO_PRICES      |    17 |   340 |    59   (2)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   4 - access("PR"."ITEM"="TR"."ITEM")
       filter("PR"."PRICE_DATE"= (SELECT /*+ NO_UNNEST */ MAX("PR2"."PRICE_DATE") FROM
              "XO_PRICES" "PR2" WHERE "PR2"."PRICE_DATE"<=:B1 AND "PR2"."ITEM"=:B2))
   7 - access("PR2"."ITEM"=:B1 AND "PR2"."PRICE_DATE"<=:B2)

We’re nearly there, the shape of the execution plan – lines 4 to 7, at any rate – matches the shape of the very simple example at the start of this article, we seem to be driving from the min/max subquery at line 7; unfortunately when we look at the predicate section of line 4 of the plan we can see that the subquery is still a filter() subquery not an access() subquery – it’s (nominally) being performed for every index entry in the range scan of the xo_prices index that we do for each xo_stock_trans row. What we want to see is an access() subquery – and checking the SQL we can see how to get there: the subquery currently correlates the item back to the xo_prices table, not to the xo_stock_trans table,  so let’s correct that correlation. Here’s our final query (though not formatted to my preference) with execution plan:


select /*+ leading(tr pr) use_nl(pr) index(pr) */  -- hint added
       tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from xo_stock_trans tr
join xo_prices pr on pr.item = tr.item
                and pr.price_date = (select /*+ no_unnest */  -- hint added
                                         max(pr2.price_date)
                                     from xo_prices pr2
                                     where pr2.item = tr.item  -- correlate to tr, not pr
                                       and pr2.price_date <= tr.trans_date
                                     )
where tr.trans_date between '01-AUG-2014' and '31-AUG-2014'
;

--------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                |  3423M|   111G|  1824K  (1)| 02:32:02 |
|   1 |  NESTED LOOPS                   |                |  3413K|   113M|  1824K  (1)| 02:32:02 |
|   2 |   NESTED LOOPS                  |                |  3413K|   113M|  1824K  (1)| 02:32:02 |
|*  3 |    TABLE ACCESS FULL            | XO_STOCK_TRANS |   202K|  2960K|  2531  (13)| 00:00:13 |
|*  4 |    INDEX RANGE SCAN             | XO_PRICES_IX1  |    16 |       |     2   (0)| 00:00:01 |
|   5 |     SORT AGGREGATE              |                |     1 |    12 |            |          |
|   6 |      FIRST ROW                  |                |     1 |    12 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN (MIN/MAX)| XO_PRICES_IX1  |     1 |    12 |     3   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID   | XO_PRICES      |    17 |   340 |     9   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   4 - access("PR"."ITEM"="TR"."ITEM" AND "PR"."PRICE_DATE"= (SELECT /*+ NO_UNNEST */
              MAX("PR2"."PRICE_DATE") FROM "XO_PRICES" "PR2" WHERE "PR2"."PRICE_DATE"<=:B1 AND
              "PR2"."ITEM"=:B2))
   7 - access("PR2"."ITEM"=:B1 AND "PR2"."PRICE_DATE"<=:B2)

Finally we can see (from the predicate for line 4) the we run the subquery at most once for each row from xo_stock_trans and we use the result of each subquery execution to drive the index range scan to pick up the matching rows from xo_prices with no further filtering. The order of operation is: 3, 7, 6, 5, 4, 2, 8, 1, 0

The only thing we can do now is decide whether the strategy for indexing into the xo_prices table 200,000 times (for our 30 day requirement) is better than a brute force approach that does a massive join and sort, or a data duplication approach that puts a “price end date” on each xo_prices row to avoid the need to check all prices for an item to find the appropriate one. Ultimately the choice may depend on trading off the human development resources against the machine run-time resources, with an eye on the number of times the query runs and the size of the date range typically involved.

Footnote:

There’s plenty more I could say about this query and how to handle it – but there are too many questions about the correctness of the data definition and content to make it worth pursuing in detail.  You will note, however, that the various execution plans which logically should be returning the same data report dramatically different cardinalities for the final row source; if nothing else this should warn you that maybe the optimizer is going to have trouble producing a good plan because it’s model produced a bad cardinality estimate at some point in a series of transformations.

In fact, when I first saw this query I converted to traditional Oracle syntax (anticipating, incorrectly, a need to do something messy with hints), corrected the subquery correlation to the “obvious” choice, and put in a cardinality hint /*+ cardinality(tr 100) */ for the xo_stock_trans table, and got the execution plan that I’ve managed to produce as the final plan above.

Tactically the correlation column is the really important bit – if that can be set up suitably we just have to work around the optimizer’s arithmetic assumptions.

 

 

January 6, 2015

Count (*)

Filed under: Oracle,Performance — Jonathan Lewis @ 6:04 pm GMT Jan 6,2015

The old chestnut about comparing speeds of count(*), count(1), count(non_null_column) and count(pk_column) has come up in the OTN database forum (at least) twice in the last couple of months. The standard answer is to point out that they will all execute the same code, and that the corroborating evidence for that claim is that, for a long time, the 10053 trace files have had a rubric reporting: CNT – count(col) to count(*) transformation or, for an even longer time, that the error message file (oraus.msg for the English Language version) has had an error code 10122 which produced (from at least Oracle 8i, if not 7.3):


SQL> execute dbms_output.put_line(sqlerrm(-10122))
ORA-10122: Disable transformation of count(col) to count(*)

But the latest repetition of the question prompted me to check whether a more recent version of Oracle had an even more compelling demonstration, and it does. I extracted the following lines from a 10053 trace file generated by 11.2.0.4 (and I know 10gR2 is similar) in response to selecting count(*), count(1) and count({non-null column}) respectively:


Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "TEST_USER"."SAVED_ASH" "SAVED_ASH"

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(1)" FROM "TEST_USER"."SAVED_ASH" "SAVED_ASH"

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(SAMPLE_ID)" FROM "TEST_USER"."SAVED_ASH" "SAVED_ASH"

As you can see, Oracle has transformed all three select lists into count(*), hiding the transformation behind the original column alias. As an outsider’s proof of what’s going on, I don’t think you could get a more positive indicator than that.

 

January 5, 2015

Re-optimization

Filed under: 12c,Oracle,Performance — Jonathan Lewis @ 12:54 pm GMT Jan 5,2015

The spelling is with a Z rather than an S because it’s an Oracle thing.

Tim Hall has just published a set of notes on Adaptive Query Optimization, so I thought I’d throw in one extra little detail.

When the optimizer decides that a query execution plan involves some guesswork the run-time engine can monitor the execution of the query and collect some information that may allow the optimizer to produce a better execution plan. The interaction between all the re-optimization mechanisms can get very messy, so I’m not going to try to cover all the possibilities – read Tim’s notes for that – but one of the ways in which this type of information can be kept is now visible in a dynamic performance view.


SQL> select hash_value, sql_id, child_number, hint_text from V$sql_reoptimization_hints;

HASH_VALUE SQL_ID        CHILD_NUMBER HINT_TEXT
---------- ------------- ------------ ----------------------------------------------------------------
3680288808 d5k20s7dpth18            0 OPT_ESTIMATE (@"SEL$6E65FD6A" GROUP_BY ROWS=28.000000 )

1 row selected.

SQL> select sql_id, child_number, is_reoptimizable, sql_text from v$sql where sql_id = 'd5k20s7dpth18';

SQL_ID        CHILD_NUMBER I SQL_TEXT
------------- ------------ - --------------------------------------------------
d5k20s7dpth18            0 Y select * from V$OPTIMIZER_PROCESSING_RATE
d5k20s7dpth18            1 N select * from V$OPTIMIZER_PROCESSING_RATE

2 rows selected.

To get the results above I had flushed the shared pool and then (using the SYS schema) executed the indicated query twice. Pulling the execution plans from memory I got (with a little cosmetic cleaning) these results:


SQL> select * from table(dbms_xplan.display_cursor('d5k20s7dpth18',null));

SQL_ID  d5k20s7dpth18, child number 0
-----------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                              |       |       |     1 (100)|          |
|   1 |  VIEW                | GV$OPTIMIZER_PROCESSING_RATE |    28 |  2072 |     1 (100)| 00:00:01 |
|   2 |   HASH GROUP BY PIVOT|                              |    28 |  1568 |     1 (100)| 00:00:01 |
|*  3 |    FIXED TABLE FULL  | X$OPTIM_CALIB_STATS          |    28 |  1568 |     0   (0)|          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("INST_ID"=USERENV('INSTANCE'))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SQL_ID  d5k20s7dpth18, child number 1
-----------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                              |       |       |     1 (100)|          |
|   1 |  VIEW                | GV$OPTIMIZER_PROCESSING_RATE |    28 |  2072 |     1 (100)| 00:00:01 |
|   2 |   HASH GROUP BY PIVOT|                              |    28 |  1568 |     1 (100)| 00:00:01 |
|*  3 |    FIXED TABLE FULL  | X$OPTIM_CALIB_STATS          |    28 |  1568 |     0   (0)|          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("INST_ID"=USERENV('INSTANCE'))

Note
-----
   - statistics feedback used for this statement

The Note section for child zero tells us that we used “dynamic statistics” (and the other_xml column in v$sql_plan still calls it dynamic sampling) as part of the statement optimization; but the Note section for child one tells us that we used “statistics feedback” (and the other_xml column in v$sql_plan still calls it cardinality feedback) – in other words we didn’t do dynamic sampling, we re-used the sample (saved as that in-memory opt_estimate hint) from the previous child.

There’s no great significance in the SQL statement I chose to demonstrate this point – I just happened to be doing some investigation into the Optimizer Processing Rates when I paused to read Tim’s notes, so I based this post on one of the statements I had run a few minutes before.

January 3, 2015

Table Duplication

Filed under: Execution plans,Oracle,subqueries — Jonathan Lewis @ 11:54 am GMT Jan 3,2015

I’ve probably seen a transformation like the following before and I may even have written about it (though if I have I can’t the article), but since it surprised me when I was experimenting with a little problem a few days ago I thought I’d pass it on as an example of how sophisticated the optimizer can be with query transformation.  I’ll be talking about the actual problem that I was working on in a later post so I won’t give you the table and data definitions in this post, I’ll just show some SQL and its plan:


select
        tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from
        xo_stock_trans tr,
        xo_prices pr
where
        tr.trans_date between '01-AUG-2014' and '3-AUG-2014'
and     pr.item = tr.item
and     pr.price_date = (
                select
                        max(pr2.price_date)
                from
                        xo_prices pr2
                where   pr2.item = tr.item
                and     pr2.price_date <= tr.trans_date
        )
;

The code is a fairly standard expression of “find me the most recent price available for each stock item as at the stock date of that item”. As you can see I’ve referenced the stock table once and the pricing table twice – the second appearance being in a “max()” correlated subquery. Oracle has decided to unnest the subquery – but spot the interesting detail in the plan:


-----------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                |    14M|   748M|       | 77955  (84)| 00:06:30 |
|*  1 |  HASH JOIN                 |                |    14M|   748M|    37M| 77955  (84)| 00:06:30 |
|*  2 |   HASH JOIN                |                |   829K|    28M|       | 70867  (92)| 00:05:55 |
|   3 |    JOIN FILTER CREATE      | :BF0000        | 25274 |   370K|       |  2530  (13)| 00:00:13 |
|*  4 |     TABLE ACCESS FULL      | XO_STOCK_TRANS | 25274 |   370K|       |  2530  (13)| 00:00:13 |
|   5 |    VIEW                    | VW_SQ_1        |   210M|  4206M|       | 64135  (94)| 00:05:21 |
|   6 |     HASH GROUP BY          |                |   210M|  5408M|       | 64135  (94)| 00:05:21 |
|   7 |      JOIN FILTER USE       | :BF0000        |   210M|  5408M|       | 11807  (67)| 00:01:00 |
|*  8 |       HASH JOIN            |                |   210M|  5408M|       | 11807  (67)| 00:01:00 |
|*  9 |        TABLE ACCESS FULL   | XO_STOCK_TRANS | 25274 |   370K|       |  2530  (13)| 00:00:13 |
|* 10 |        INDEX FAST FULL SCAN| XO_PRICES_IX1  |  3918K|    44M|       |  1936  (10)| 00:00:10 |
|  11 |   TABLE ACCESS FULL        | XO_PRICES      |  4571K|    87M|       |  2275  (11)| 00:00:12 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("PR"."ITEM"="TR"."ITEM" AND "PR"."PRICE_DATE"="MAX(PR2.PRICE_DATE)")
   2 - access("ITEM_1"=ROWID)
   4 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   8 - access("PR2"."ITEM"="TR"."ITEM")
       filter("PR2"."PRICE_DATE"<="TR"."TRANS_DATE")
   9 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  10 - filter("PR2"."PRICE_DATE"<=TO_DATE(' 2014-08-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

I was running 11.2.0.4 at the time, which is why we can get a serial Bloom filter on the hash join – and it’s interesting to see that the filter has been pushed inside the view operator; but the really interesting part of the plan is the second appearance of the XO_STOCK_TRANS table.

My correlated subquery returns a value that is used in a comparison with a column in the XO_PRICES table, but the correlation predicates referred back to the XO_STOCK_TRANS table, so the optimizer has added the XO_STOCK_TRANS to the subquery as it unnested it.

I’ve written several examples of how we can optimise SQL manually by rewriting it to introduce extra copies of some of the tables (typically in a fashion analogous to the optimizer’s mechanism for star transformations), so it’s nice to see another variation on the theme of the optimizer using table duplication to optimise a statement.

Footnote:

The execution plan in 10.2.0.5 is slightly different, but it still unnests the subquery, introducing a second occurrence of XO_STOCK_TRANS as it does so.

Next Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 4,750 other followers