Oracle Scratchpad

September 13, 2013

Quiz Night

Filed under: Exadata,Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 6:32 pm BST Sep 13,2013

Here’s a little quiz about Bloom filtering. There seem to be at least three different classes of query where Bloom filters can come into play – all involving hash joins: partition elimination, aggregate reduction on non-mergeable aggregate views, and parallelism.

This quiz is about parallel queries – and all you have to do is work out how many Bloom filters were used in the following two execution plans (produced by 11.2.0.2), and where they were used.

(more…)

September 9, 2013

Parallel Execution

Filed under: Oracle,Parallel Execution,Performance — Jonathan Lewis @ 6:51 am BST Sep 9,2013

While checking out potential scalability threats recently on a client system, I was directed to a time-critical task that was currently executing the same PL/SQL procedure 16 times (with different parameters) between 6:00 and 7:00 am; as the system went through its next phase of expansion the number of executions of this procedure was likely to grow. An interesting detail, though, was that nothing else was going on while the task was running so the machine (which had 6 cores) was running at 16% CPU.

An obvious strategy for handling the required growth target was to make sure that four (possibly 5) copies of the procedure were allowed to run concurrently. Fortunately the different executions were completely independent of each other and didn’t interfere with each other’s data, so the solution simply required a mechanism to control the parallelism. Conveniently 11gR2 gave us one.
(more…)

July 25, 2013

Parallel to Serial

Filed under: Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 5:21 pm BST Jul 25,2013

Here’s a little problem that came up on the Oracle-L listserver today:

I’m trying to write a query which reads the corresponding partition of the fact, extracts the list of join keys, materialises this result set, and finally joins the necessary dimensions. The key thing I’m trying to do is to run the initial query on the fact in parallel and then the rest of the query serially.

The full requirement, if you follow the link, may seem a little puzzling but there’s no point in second-guessing every question that people ask – there’s usually a reason for doing something in a particular way – so I just rattled off the first thing that came to mind, which was this:  when you include rownum in a parallel query Oracle has to serialise to generate the rownum – so create an inline view which does the parallel but adds a rownum to the select list, then join to the inline view. The plan should include a VIEW operator holding the parallel bit, and then you can hint as necessary to make the subsequent activity serial.

(more…)

July 11, 2013

12c Top N (px)

Filed under: 12c,Oracle,Parallel Execution — Jonathan Lewis @ 9:04 am BST Jul 11,2013

A comment from Greg Rahn in response to my posting yesterday prompted me to do a quick follow-up (test time ca. 3 minutes, write-up time, ca. 50 minutes – thanks for the temptation, Greg ;). Greg asked if the “Top N” would push down for a parallel query, so all I had to do was re-run my script with a parallel hint in place.  (Such is the joy of constructing test cases – when you get a new idea you may not need to do much work to test it.)

(more…)

July 3, 2013

maxthr – 3

Filed under: CBO,Oracle,Parallel Execution,Statistics,System Stats — Jonathan Lewis @ 6:29 pm BST Jul 3,2013

In part 1 of this mini-series we looked at the effects of costing a tablescan serially and then parallel when the maxthr and slavethr statistics had not been set.

In part 2 we looked at the effect of setting just the maxthr – and this can happen if you don’t happen to do any parallel execution while the stats collection is going on.

In part 3 we’re going to look at the two variations the optimizer displays when both statistics have been set. So here are the starting system stats:

begin
	dbms_stats.delete_system_stats;
	dbms_stats.set_system_stats('MBRC',        64);
	dbms_stats.set_system_stats('MREADTIM',    10);
	dbms_stats.set_system_stats('SREADTIM',     5);
	dbms_stats.set_system_stats('CPUSPEED',  2000);
	dbms_stats.set_system_stats('MAXTHR',  262144);
	dbms_stats.set_system_stats('SLAVETHR', 65536);
	dbms_stats.set_system_stats('SLAVETHR', 47000);
	dbms_stats.set_system_stats('SLAVETHR', 16384);
end;
/

You’ll notice that I’ve shown three options for slavethr so, when running the tests, I will be commenting out two of them. The middle value is the important one as I’ve set it just below a critical breakpoint. You’ll recall that the optimizer is programmed to behave as if a parallel slave will operate at 90% of the speed of a serial process. If we take the 64 block read, at 8KB per block, completed in 10 ms, this represents 52428.8 bytes per ms. 90% of that is 47,186 bytes per ms – hence the choice for slavethr in the second of the tests.

Remember that a serial tablescan of my data had an I/O cost of 1,251 (or 1,250 is you ignore the “tablescan cost plus 1″ effect) and that we could investigate the parallel costs by reference to the original serial cost compared to the degree of parallelism. We’re going to do that again, but in this case I’m going to run my tablescan just once (at parallel degree 5) for each of the three values of slavethr (lowest to highest) in turn.

Here are the resulting execution plans:

slavethr=16384
----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |     5 |   800   (0)| 00:00:05 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     5 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     5 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     5 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          | 40000 |   195K|   800   (0)| 00:00:05 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| T1       | 40000 |   195K|   800   (0)| 00:00:05 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------

   IO_COST   CPU_COST       COST
---------- ---------- ----------
       800    1333333        800

slavethr=47000
----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |     5 |   279   (0)| 00:00:02 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     5 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     5 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     5 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          | 40000 |   195K|   279   (0)| 00:00:02 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| T1       | 40000 |   195K|   279   (0)| 00:00:02 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------

   IO_COST   CPU_COST       COST
---------- ---------- ----------
       279    1333333        279

slavethr=65536
----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |     5 |   278   (0)| 00:00:02 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     5 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     5 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     5 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          | 40000 |   195K|   278   (0)| 00:00:02 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| T1       | 40000 |   195K|   278   (0)| 00:00:02 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------

13 rows selected.

   IO_COST   CPU_COST       COST
---------- ---------- ----------
       278    1333333        278

As a starting point, we can say that the modified cost is always going to be: 1250 * serial throughput rate / parallel throughput rate where, in this test suite, the serial throughput rate in bytes per ms is 64 * 8K / 10 = 52428.8

Working from the top down:
When slavethr = 16384 the aggregate throughput rate is 5 * 16384 = 81920, so the I/O cost should be 1250 * 52428.8/81920 = 800 (Q.E.D)

When slavethr = 41000 the aggregate throughput rate is 5 * 47000 = 235,000 so the I/O cost should be 1250 * 52428.8/205000 = 279 (Q.E.D) You’ll notice that this is very close to the figure I had from the first test when I didn’t have maxthr or slavethr set and the optimizer used its “90% of serial” trick.

When slavethr = 65536, something odd has happened – instead of a significant change in I/O cost, the result actually matches the figure we got slavethr wasn’t set. The rule is simple – if slavethr is larger than the throughput implied by mbrc (etc.) the optimizer ignores it and falls back to the “90% of serial” model.

Reminder.

I’ve been showing you how Oracle does the arithmetic with the statistics it has. It’s very important to remember that this is just arithmetic – it’s Oracle trying to work out the best (likely) execution plan given some assumptions about what ought to be the limiting factors when the query runs. In effect the arithmetic can have the effect of saying: “if we assume (based on the statistics) that we can’t do better than parallel 6 then the best plan is P” – but if the hint actually says /*+ parallel(t1 42) */ then at run time Oracle will take the plan that’s appropriate for running parallel 6 and try to run it at parallel 42 – and that may be a big mistake.

Warning: The manuals say that maxthr and slavethr are stored as bytes per second; it seems that they’re really bytes per millisecond in (at least) 10g and 11g, but change to bytes per second in 12c. If you upgrade to 12c, make sure you check your system statistics before and after the upgrade to make sure that you have allowed for this change otherwise you may find that Oracle becomes very unenthusiastic about running parallel queries.

June 27, 2013

maxthr – 2

Filed under: CBO,Oracle,Parallel Execution,Statistics,System Stats — Jonathan Lewis @ 5:08 pm BST Jun 27,2013

Actually, there hasn’t been a “maxthr – 1″, I called the first part of this series“System Stats”. If you look back at it you’ll see that I set up some system statistics, excluding the maxthr and slavethr values, and described how the optimizer would calculate the cost of a serial tablescan, then I followed this up with a brief description of how the calculations changed if I hinted the optimizer into a parallel tablescan.

(more…)

June 25, 2013

System Stats

Filed under: CBO,Oracle,Parallel Execution,Statistics,System Stats — Jonathan Lewis @ 5:27 pm BST Jun 25,2013

Several years ago I wrote the following in “Cost Based Oracle – Fundamentals” (p.47):

The maxthr and slavethr figures relate to throughput for parallel execution slaves. I believe that the figures somehow control the maximum degree of parallelism that any given query may operate at by recording the maximum rate at which slaves have historically been able to operate—but I have not been able to verify this.

Browsing the internet recently, I discovered that that no-one else seems to have published anything to verify my comment, so I decided it was about time I did so myself.  I’m going to work up to it in two blog notes , so if you do happen to know of any document that describes the impact of maxthr and slavethr on the optimizer’s costing algorithms please give me a reference in the comments – that way I might not have to write the second note.

(more…)

June 6, 2013

Parallel DML

Filed under: Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 8:06 am BST Jun 6,2013

Parallel DML is not enabled by default, and it’s easy to forget this and think you’ve made an operation “as parallel as possible” by declaring objects parallel or putting in hints for parallelism.

A recent question on OTN asked about speeding up a  materialized view refresh that seemed to be spending a lot of its time waiting on “PX Deq Credit: send blkd”. The manuals describe this as an “idle event”; but that’s not always true. The OP had supplied the output from tkprof for one of the sessions showing the “insert as select” that was the (complete) refresh and it was clear that the select was running in parallel, but the insert wasn’t – and that’s one case in which the “PX Deq Credit: send blkd” is arguably an “idle” wait (with a timeout of 2 seconds). It’s possible that the refresh could go faster if the OP enabled parallel DML.

(more…)

June 14, 2012

PX and system allocation

Filed under: fragmentation,Infrastructure,Oracle,Parallel Execution — Jonathan Lewis @ 5:00 pm BST Jun 14,2012

A few years ago (2007) I wrote about a problem that could appear when you mixed parallel execution with system managed extent allocation. A couple of years later I added a note that Christian Antognini had observed a patch in 11.1.0.7 that addressed the specific issue I had raised. Today, thanks to an email exchange with Christo Kutrovsky of Pythian, I can report that there is a variation of this issue still available even in 11.2.0.3.

The basic problem is that you can end up with a very large number of very small extents, leading to poor performance in parallel queries and a significant waste of space in a data segment. Here’s a simple, though not particularly realistic, way to demonstrate the problem.
(more…)

June 30, 2011

Virtual bug

Filed under: Bugs,Function based indexes,Indexing,Oracle,Parallel Execution,Statistics,Troubleshooting — Jonathan Lewis @ 5:37 pm BST Jun 30,2011

I’ve said in the past that one of the best new features, in my view, in 11g was the appearance of proper virtual columns; and I’ve also been very keen on the new “approximate NDV” that makes it viable to collect stats with the “auto_sample_size”.

Who’d have guessed that if you put them both together, then ran a parallel stats collection it would break :(

The bug number Karen quotes (10013177.8) doesn’t (appear to) mention extended stats – but since virtual columns, function-based indexes, and extended stats share a number of implementation details I’d guess that they might be affected as well.

January 3, 2010

Pseudo-parallel

Filed under: Infrastructure,Parallel Execution — Jonathan Lewis @ 11:32 am BST Jan 3,2010

There’s a posting on the OTN database forum asking for suggestions about restricting a query to data from one datafile of a tablespace.  It seems to be a slightly odd thing to do, but it reminded me of a “pseudo-parallel” trick I once used (and described) more than 10 years ago on a system running Oracle version 7. It was a long time ago, but I think I needed to do this because I wanted to execute a massive update with as many parallel threads as possible in the days when parallel DML wasn’t possible.

(more…)

January 19, 2009

Space Checks

Filed under: Infrastructure,Parallel Execution,Troubleshooting — Jonathan Lewis @ 7:20 pm BST Jan 19,2009

At least twice in the last few weeks, there’s been a question on OTN about a mismatch between dba_extents and dba_segments  after rebuilding some indexes.

Here’s one of the threads, where Randolf Geist supplies the explanation for the anomaly.

December 13, 2008

Direct Path

Filed under: Infrastructure,Parallel Execution,Performance — Jonathan Lewis @ 10:31 am BST Dec 13,2008

Doug Burn’s sent me an email a couple of days ago about a question that had appeared on his blog in response to a comment I made several months ago.

I’ve followed up with an answer, of course, but reading the blog reminded me that some interesting stuff had come up about direct path reads and writes, as well as some comments on parallel query. So I thought I’d publish a link to the item so you could see the whole discussion.

November 5, 2008

PX Buffer

Filed under: Execution plans,Parallel Execution,Performance,trace files,Troubleshooting — Jonathan Lewis @ 7:11 pm BST Nov 5,2008

Here’s a surprising anomaly that showed up in a question on the OTN forums a little while ago. Consider a simple query that uses a hash join between two tables.

select
	/*+
		ordered
		use_hash(t2)
		parallel(t1 2)
		parallel(t2 2)
		pq_distribute(t2 hash hash)
	*/
	t1.padding,
	t2.padding
from 	t1, t2
where	t2.n1 = t1.n1
and	t2.small_vc = t1.small_vc
;

When it runs serially the join completes in memory and the only I/O you see comes from the two tablescans. When the query runs parallel something causes a spill to the temporary tablespace.
(more…)

June 25, 2007

qb_name

Filed under: Execution plans,Hints,Parallel Execution,Tuning — Jonathan Lewis @ 8:15 pm BST Jun 25,2007

When Oracle tries to optimize a complex SQL statement one of the first steps it takes is to transform it into something simpler. The ideal target (from the optimizer’s perspective) is to eliminate any subqueries and in-line views so that your SQL statement ends up as a list of tables in a single from clause. Every in-line view and subquery is a separate query block, and the optimizer really wants to turn the whole thing into a single query block. Unfortunately this isn’t always possible, and sometimes it isn’t even a good idea – which is why 10g introduced ‘cost base query transformation’.

(more…)

« Previous PageNext Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 5,218 other followers