Oracle Scratchpad

September 7, 2009

Analytic Agony

Filed under: Infrastructure,Oracle,Performance,sorting,trace files,Troubleshooting — Jonathan Lewis @ 5:30 pm BST Sep 7,2009

When I wrote Practical Oracle 8i, version 8.1.5 was the most recent version of Oracle, but version 8.1.6 came out just before I finished writing – and the only thing in 8.1.6 I thought important enough to add to the book was a section on Analytic Functions, because they were the best new coding feature in the product.

Since then I’ve always warned people to be a little careful about how they use analytic functions because of the amount of sorting they can introduce. My suggestion has always been to crunch “large” volumes of data down to “small” volumes of data before applying any analytic functions to add “intelligence” to the intermediate result.

This advice was based on my awareness of how much sorting analytic functions can introduce  – but I’ve only just discovered how badly Oracle handles the sort operation if the data spills to disc.

Here’s a simple example to demonstrate the problem. This is a test case I’ve run on 10g (10.2.0.3) and 11g (11.1.0.6) with workarea_size_policy set to auto and the pga_aggregate_target set to 128MB (which means an individual workarea operation can acquire 25.6MB of memory).
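For reference, here’s a minimal sketch of the instance settings I’m assuming for this test – these are not the exact commands from my setup script, so treat the syntax and the scope (system vs. session) as illustrative:

alter system  set pga_aggregate_target  = 128M;
alter session set workarea_size_policy  = auto;

rem     25.6MB is 20% of the 128MB target - the limit that applied to a single
rem     serial workarea operation on this system (the way the limit is derived
rem     varies with version - see the comments at the end of this note).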

First we set the environment – I’ve disabled system statistics (a.k.a. CPU costing) for the purposes of the demonstration just to get a stable environment:

rem
rem     Script:         analytic_problem.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sept 2009
rem

begin
	begin		execute immediate 'begin dbms_stats.delete_system_stats; end;';
	exception	when others then null;
	end;

	begin		execute immediate 'alter session set "_optimizer_cost_model"=io';
	exception	when others then null;
	end;
end;
/

Then I create a simple data set with a little help from the random number generator – which I’ve seeded so that every run on every version of Oracle gets the same random data set. I create 1,000,000 rows and collect basic statistics on the table.

execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 1000
)
select
	rownum					id,
	trunc(dbms_random.value(1,1000))	n1,
	trunc(dbms_random.value(1,1000))	n2,
	lpad(rownum,30,'0')			small_vc
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/

Then I’ve turned on some tracing and enabled autotrace. Apart from the extended sql_trace (event 10046 at level 8) I’ve also enabled a trace on sort statistics (event 10032) and sort I/O operations (event 10033).

The call to autotrace is there so that I can run the query, pulling the data into the SQL*Plus session and throwing it away as cheaply as possible. I’ve also set the arraysize to 1,000 to make sure that each SQL*Net roundtrip pulls a lot of data from the database in case the data fetch size affects the I/O pattern.

alter session set events '10032 trace name context forever';
alter session set events '10033 trace name context forever';
alter session set events '10046 trace name context forever, level 8';

set arraysize 1000
set autotrace traceonly statistics

Finally, here’s the query. It’s a very simple example of using an analytic function.

The inline view partitions the data by n1, sorting each partition by n2, picking up the id value in the last row of the partition. By wrapping this query inside a view we can add a predicate to identify just those rows where the value of the id is the same as the last id of the partition – and then report the n1, n2, and small_vc values from those rows.

To execute this query Oracle has to sort the data into (n1, n2) order then walk through it to find the value for id for the last row in each n1 partition in turn. So we are going to need quite a lot of memory – and we’re not going to have enough because of my settings for the pga_aggregate_target.

select
	n1,
	n2,
	small_vc
from
	(
	select
		n1,
		n2,
		id,
		small_vc,
		last_value(id) over (
			partition by n1
			order by n2
		)	last_id
	from
		t1
	)
where
	id = last_id
;

Here’s the history of how Oracle handles this query – as reported by the 10032 and 10033 traces.
The first step is to sort 1 million rows of (n1, n2, id, small_vc) which – with sort-related overheads – totals about 50MB of data. Since the session is limited to 25.6MB for a single, serial, sort operation the sort spills to disc and the 10033 trace file shows the following runs of data being written (for more information about sorting and Oracle’s sort traces, see Cost Based Oracle – Fundamentals, chapter 13):


Recording run at 400a89 for 2264 blocks
Recording run at 401ce2 for 419 blocks
Recording run at 401b85 for 531 blocks
Recording run at 401898 for 701 blocks
Recording run at 401655 for 839 blocks
Recording run at 40129c for 1023 blocks
Recording run at 400e9b for 637 blocks

The first run is written as the workarea hits the limit, at which point Oracle falls back from attempting an “optimal” operation to a “one-pass” operation – which is why the session releases some PGA memory and the sizes of the subsequent runs are smaller (the final 637 block run corresponded to a workarea size of about 8MB).
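Incidentally, the 50MB estimate above is easy to sanity-check from the column statistics once they have been gathered – a back-of-envelope sketch (the extra 10 bytes per row for sort-related overhead is just my guess at a round figure):

select
        round(1000000 * (sum(avg_col_len) + 10) / 1048576)     estimated_sort_mb
from
        user_tab_columns
where
        table_name  = 'T1'
and     column_name in ('N1', 'N2', 'ID', 'SMALL_VC')
;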

At this point a small disaster happens. Having completed the first phase of the sort Oracle releases more memory before starting the merge phase, and at this point we see the following report from the 10032 trace:


---- Sort Parameters ------------------------------
sort_area_size                    3555328
sort_area_retained_size           3555328
sort_multiblock_read_count        31
max intermediate merge width      5

Oracle has reduced the workarea to 3.5MB – which it believes is enough to merge five runs simultaneously using multiblock reads (of the temporary tablespace) of 31 blocks.  Given that the 10033 trace has just reported that it has written seven runs it is a pity that Oracle hasn’t decided to use smaller reads to be able to read all seven runs simultaneously – our sort is (apparently) going to turn into a multipass (specifically two-pass) workarea operation.
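To put some rough numbers on that (my arithmetic, not Oracle’s – assuming an 8KB block size and that the read buffers are the dominant consumers of the merge workarea):

select
        round(5 * 31 * 8192 / 1048576, 2)       five_runs_at_31_blocks_mb,      -- reads of 31 blocks: ~1.21MB of buffers
        round(7 * 20 * 8192 / 1048576, 2)       seven_runs_at_20_blocks_mb      -- reads of ~20 blocks: ~1.09MB for all seven runs
from
        dual
;

Dropping to reads of roughly 20 blocks would have cost virtually nothing in buffer space, would still have been well inside the 3.5MB workarea, and would have allowed all seven runs to be merged in a single pass.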

But the actual behaviour is worse than the promise – look what the 10033 trace reports over the next couple of minutes:


Merging run at 401ce2 for 419 blocks
Merging run at 401b85 for 531 blocks
Merging run at 400e9b for 637 blocks
Merging run at 401898 for 701 blocks
Total number of blocks to read: 2288 blocks

Recording run at 400c18 for 2286 blocks

Merging run at 401655 for 839 blocks
Merging run at 40129c for 1023 blocks
Total number of blocks to read: 1862 blocks

Recording run at 400cc6 for 1861 blocks

Merging run at 400cc6 for 1861 blocks
Merging run at 400a89 for 2264 blocks
Total number of blocks to read: 4125 blocks

Recording run at 400f0b for 4125 blocks

Merging run at 400c18 for 2286 blocks
Merging run at 400f0b for 4125 blocks
Total number of blocks to read: 6411 blocks

Recording run at 401cc8 for 6411 blocks

Merging run at 401cc8 for 6411 blocks
Total number of blocks to read: 6411 blocks

On its first pass Oracle didn’t even manage to handle the predicted five runs: it only managed to merge four of them and then write out a single run of 2,288 blocks. Then it merged two more runs and wrote out a single run of 1,862 blocks. Then it merged another two runs and wrote out 4,125 blocks. Then it merged the last two remaining runs to get the entire result set of 6,411 blocks sorted – and wrote out the whole sorted set. Finally it started to read back the 6,411 blocks to generate the last_value() results.

So it’s done a four-pass workarea operation – when it should have been possible to do a one-pass operation; and it seems to have written the entire result set to disk after the final merge when it really shouldn’t have needed to. (In earlier versions of Oracle an extra write of the whole sorted data set used to take place when Oracle spilled to disc during an index build – I don’t remember when the redundant write was eliminated, but it seems to be back for analytic functions).
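If you don’t want to wade through raw trace files you can see the same story in outline by checking your session’s workarea statistics before and after running the query – a sketch, assuming you have access to the relevant v$ views:

select
        sn.name, ms.value
from
        v$statname      sn,
        v$mystat        ms
where
        ms.statistic# = sn.statistic#
and     sn.name in (
                'workarea executions - optimal',
                'workarea executions - onepass',
                'workarea executions - multipass'
        )
;

Run before and after the test query, the change in the multipass figure tells you how badly the workarea operations have behaved.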

So the analytic sort is pretty unhealthy so far – but there’s still scope for things to get worse. Here are some results from the 10046 trace showing what’s going on with the “direct path reads” as Oracle reads back data from the temporary segment in the merge passes. Remember how the 10032 trace said that the sort_multiblock_read_count was going to be 31 blocks – look what really happened – this sample appears about halfway through the merging phase:

WAIT #33: nam='direct path read temp' ela= 250 file number=201 first dba=4842 block cnt=1 obj#=101556 tim=761555607096
WAIT #33: nam='direct path read temp' ela= 473 file number=201 first dba=4843 block cnt=7 obj#=101556 tim=761555607915
WAIT #33: nam='direct path read temp' ela= 217 file number=201 first dba=5529 block cnt=1 obj#=101556 tim=761555608880
WAIT #33: nam='direct path read temp' ela= 452 file number=201 first dba=5530 block cnt=7 obj#=101556 tim=761555609602
WAIT #33: nam='direct path read temp' ela= 239 file number=201 first dba=4850 block cnt=1 obj#=101556 tim=761555610227
WAIT #33: nam='direct path read temp' ela= 480 file number=201 first dba=4851 block cnt=7 obj#=101556 tim=761555610954
WAIT #33: nam='direct path read temp' ela= 1122 file number=201 first dba=5537 block cnt=1 obj#=101556 tim=761555613237
WAIT #33: nam='direct path read temp' ela= 1500 file number=201 first dba=4858 block cnt=1 obj#=101556 tim=761555614889
WAIT #33: nam='direct path read temp' ela= 426 file number=201 first dba=5538 block cnt=7 obj#=101556 tim=761555615530
WAIT #33: nam='direct path read temp' ela= 557 file number=201 first dba=4859 block cnt=7 obj#=101556 tim=761555616151
WAIT #33: nam='direct path read temp' ela= 228 file number=201 first dba=4866 block cnt=1 obj#=101556 tim=761555617379
WAIT #33: nam='direct path read temp' ela= 29213 file number=201 first dba=5545 block cnt=1 obj#=101556 tim=761555646867
WAIT #33: nam='direct path read temp' ela= 422 file number=201 first dba=5546 block cnt=7 obj#=101556 tim=761555647595

Notice how we are probably merging two runs at this point. By stepping through the list of “first dba” values and adding the “block cnt”, you can see (picking out just one of the two runs) the following pattern:

    • 4842 + 1 = 4843
    • 4843 + 7 = 4850  (then skip 2 lines)
    • 4850 + 1 = 4851
    • 4851 + 7 = 4858  (then skip 1 line)
    • 4858 + 1 = 4859  (then skip 1 line)
    • 4859 + 7 = 4866
    • 4866 + 1 = 4867

We’re reading great long ranges of adjacent blocks – in little pieces.

And it gets still worse. By the time we get to the end of the merge operation and start reading back the final (and redundantly written) stream, look what the read pattern shows – the examples come from two separate parts of the final read back, one near the start and one near the end:

WAIT #33: nam='direct path read temp' ela= 325 file number=201 first dba=7368 block cnt=1 obj#=101556 tim=761573681919
WAIT #33: nam='direct path read temp' ela= 327 file number=201 first dba=4937 block cnt=1 obj#=101556 tim=761573685178
WAIT #33: nam='direct path read temp' ela= 494 file number=201 first dba=7368 block cnt=1 obj#=101556 tim=761573685888
WAIT #33: nam='direct path read temp' ela= 264 file number=201 first dba=4938 block cnt=1 obj#=101556 tim=761573687691
WAIT #33: nam='direct path read temp' ela= 272 file number=201 first dba=7368 block cnt=1 obj#=101556 tim=761573688149
WAIT #33: nam='direct path read temp' ela= 260 file number=201 first dba=4939 block cnt=1 obj#=101556 tim=761573689642
WAIT #33: nam='direct path read temp' ela= 245 file number=201 first dba=7368 block cnt=1 obj#=101556 tim=761573690062
WAIT #33: nam='direct path read temp' ela= 245 file number=201 first dba=4940 block cnt=1 obj#=101556 tim=761573691804
WAIT #33: nam='direct path read temp' ela= 254 file number=201 first dba=7368 block cnt=1 obj#=101556 tim=761573692211
WAIT #33: nam='direct path read temp' ela= 270 file number=201 first dba=4941 block cnt=1 obj#=101556 tim=761573693620
WAIT #33: nam='direct path read temp' ela= 273 file number=201 first dba=7368 block cnt=1 obj#=101556 tim=761573694030
WAIT #33: nam='direct path read temp' ela= 312 file number=201 first dba=4942 block cnt=1 obj#=101556 tim=761573695493
WAIT #33: nam='direct path read temp' ela= 295 file number=201 first dba=7368 block cnt=1 obj#=101556 tim=761573696101
WAIT #33: nam='direct path read temp' ela= 671 file number=201 first dba=4943 block cnt=1 obj#=101556 tim=761573697864
WAIT #33: nam='direct path read temp' ela= 288 file number=201 first dba=4944 block cnt=1 obj#=101556 tim=761573699601
WAIT #33: nam='direct path read temp' ela= 395 file number=201 first dba=4945 block cnt=1 obj#=101556 tim=761573702930
WAIT #33: nam='direct path read temp' ela= 289 file number=201 first dba=4946 block cnt=1 obj#=101556 tim=761573711065
WAIT #33: nam='direct path read temp' ela= 270 file number=201 first dba=4942 block cnt=1 obj#=101556 tim=761573711462
WAIT #33: nam='direct path read temp' ela= 271 file number=201 first dba=4947 block cnt=1 obj#=101556 tim=761573712867
WAIT #33: nam='direct path read temp' ela= 274 file number=201 first dba=4942 block cnt=1 obj#=101556 tim=761573713269
WAIT #33: nam='direct path read temp' ela= 287 file number=201 first dba=4948 block cnt=1 obj#=101556 tim=761573714687
WAIT #33: nam='direct path read temp' ela= 238 file number=201 first dba=4942 block cnt=1 obj#=101556 tim=761573715052
WAIT #33: nam='direct path read temp' ela= 335 file number=201 first dba=4949 block cnt=1 obj#=101556 tim=761573716733
WAIT #33: nam='direct path read temp' ela= 235 file number=201 first dba=4950 block cnt=1 obj#=101556 tim=761573718257
WAIT #33: nam='direct path read temp' ela= 522 file number=201 first dba=4951 block cnt=1 obj#=101556 tim=761573719886
WAIT #33: nam='direct path read temp' ela= 270 file number=201 first dba=4952 block cnt=1 obj#=101556 tim=761573721269
WAIT #33: nam='direct path read temp' ela= 322 file number=201 first dba=4953 block cnt=1 obj#=101556 tim=761573722709
WAIT #33: nam='direct path read temp' ela= 307 file number=201 first dba=4948 block cnt=1 obj#=101556 tim=761573723237

...

WAIT #33: nam='direct path read temp' ela= 223 file number=201 first dba=2793 block cnt=1 obj#=101556 tim=761574694654
WAIT #33: nam='direct path read temp' ela= 226 file number=201 first dba=2794 block cnt=1 obj#=101556 tim=761574695481
WAIT #33: nam='direct path read temp' ela= 203 file number=201 first dba=2795 block cnt=1 obj#=101556 tim=761574696263
WAIT #33: nam='direct path read temp' ela= 207 file number=201 first dba=2796 block cnt=1 obj#=101556 tim=761574697067
WAIT #33: nam='direct path read temp' ela= 225 file number=201 first dba=2797 block cnt=1 obj#=101556 tim=761574697880
WAIT #33: nam='direct path read temp' ela= 239 file number=201 first dba=2799 block cnt=1 obj#=101556 tim=761574703204
WAIT #33: nam='direct path read temp' ela= 222 file number=201 first dba=2800 block cnt=1 obj#=101556 tim=761574704035
WAIT #33: nam='direct path read temp' ela= 223 file number=201 first dba=2801 block cnt=1 obj#=101556 tim=761574704855
WAIT #33: nam='direct path read temp' ela= 225 file number=201 first dba=2802 block cnt=1 obj#=101556 tim=761574705674
WAIT #33: nam='direct path read temp' ela= 210 file number=201 first dba=2803 block cnt=1 obj#=101556 tim=761574706447
WAIT #33: nam='direct path read temp' ela= 226 file number=201 first dba=2804 block cnt=1 obj#=101556 tim=761574707252
WAIT #33: nam='direct path read temp' ela= 211 file number=201 first dba=2805 block cnt=1 obj#=101556 tim=761574708047
WAIT #33: nam='direct path read temp' ela= 213 file number=201 first dba=2806 block cnt=1 obj#=101556 tim=761574708852
WAIT #33: nam='direct path read temp' ela= 225 file number=201 first dba=2807 block cnt=1 obj#=101556 tim=761574709663
WAIT #33: nam='direct path read temp' ela= 232 file number=201 first dba=2808 block cnt=1 obj#=101556 tim=761574710748
WAIT #33: nam='direct path read temp' ela= 206 file number=201 first dba=2809 block cnt=1 obj#=101556 tim=761574714850
WAIT #33: nam='direct path read temp' ela= 208 file number=201 first dba=2810 block cnt=1 obj#=101556 tim=761574715640

We start off doing something quite surprising – jumping back and forth when we should be reading a continuous stream of blocks, and even re-reading a few blocks multiple times (possibly this is a small set of “map” blocks) – before settling down and reading great long continuous chains of adjacent blocks: one block at a time!

Summary:

This is a small demonstration – it is repeatable, and I built it to raise an SR for a problem where a client query spilled an 8GB data set to the temporary tablespace, did a nine-pass workarea operation (that should have been a one-pass operation), and ended up doing 500,000 single block reads (16KB block size) to return the final result set. The I/O to the temporary tablespace was the largest time component of the entire query by a significant margin – probably taking about four times as long as it should have.

The code to handle analytic sorts (the WINDOW SORT operation) has some serious defects which show up if the sort spills to disc – and a sort of more than 100MB will spill to disc if you set workarea_size_policy to auto, no matter how large your pga_aggregate_target (and the derived _smm_max_size) may be.

On the plus side, by adjusting a couple of hidden parameters I was able to bypass the 100MB limit for the “smaller” analytic sorts and do some damage limitation on the merge passes for larger analytic sorts; but I couldn’t find any way of avoiding the single block reads after the last (apparently redundant) write.

Workarounds:

As ever, you need to get approval from Oracle support for using hidden parameters, but if you need to keep large analytic sorts in memory the limit is set (surprisingly) by the hidden parameter _smm_isort_cap, which defaults to 100MB – not by _smm_max_size.
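If you did get approval, a sketch of the change might look something like the following – but I’d treat the units of the value and whether it can be changed without a restart as things to confirm with Oracle support first:

alter system set "_smm_isort_cap" = 1048576 scope = spfile;

rem     nominally 1GB if the value is expressed in KB - check the current
rem     value (e.g. via x$ksppi / x$ksppcv) before changing anything.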

If you have very large analytic sorts that really have to spill to disc then you may need to switch a session to manual workarea_size_policy so that the memory doesn’t shrink in the merge phase, and then set the hidden parameter _sort_multiblock_read_count to something reasonably large (the equivalent of 256KB to 1MB perhaps – the value is the number of Oracle blocks for a multiblock read).
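A sketch of that second workaround at the session level (the sizes are purely illustrative, and whether the hidden parameter is still session-modifiable in your version is something to check – and get approved – with Oracle support):

alter session set workarea_size_policy = manual;
alter session set sort_area_size = 209715200;                   -- e.g. 200MB - size to suit the sort
alter session set "_sort_multiblock_read_count" = 128;          -- 128 x 8KB blocks = 1MB reads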

Finally, if you find a way to get a large multiblock read on the final read – please let me know.

Footnote: While waiting for permission from the client to publish this note, I passed my test case to the OakTable Network for comment, and Charles Hooper ran it on his copy of 11.2. The problem with the memory management and multi-pass sort seems to be fixed in that version – but the (apparently) redundant final write is still there, and the final read is still done with single block reads. I’m still trying to think  of a good reason why they could be necessary.

Update: It’s not just the WINDOW SORT operation that suffers from the sharp reduction in the workarea size and the consequential multipass operation. We’re getting it all over the place in very large sort operations. The analytics become very visible, though, because they are limited by _smm_isort_cap and start to spill to disc at 100MB, whereas the simple sorts use the _smm_max_size limit and rarely spill to disc. (Aggregates have the same problem – they also follow the _smm_isort_cap limit, but that’s a story for another day.)

Update 2: The problem of the multi-pass sort is bug 6817844 – the bug note states that this is fixed in 11.2, but it’s actually in the list of bugs fixed in 11.1.0.7 (Metalink note 601739.1). A workaround suggested in the bug note is to set hidden parameter _newsort_enabled to false – and that suggestion is worth a little discussion in a separate blog entry, if I can find the time.

Update 3: 1st Oct 2009: We’ve now got a backport of the patch to 10.2.0.4 (for AIX) and installed it. It seems to be working, the example I had where a query turned into a 9-pass sort operation now runs as a one-pass operation. I still have an issue with the sizes of the direct path reads and writes – especially the single block read on the final pass – but the worst part of the problem has been fixed.

Update 4: 3rd Oct 2009: In a comment on one of my postings about bugs, Timur Akhmadeev has referenced an OTN thread that references Metalink document id 330818.1 and bug 4047116, with the suggestion that in similar circumstances to the above, you can improve the I/O behaviour by setting two hidden parameters (_smm_auto_min_io_size and _smm_auto_max_io_size) to matching values – with 248 or 1008 being the two suggested sizes.

This had some effect on the I/O sizes in my case, but only while writing and merging intermediate sort runs. However, I still saw alternating single block reads during the merges, and nothing but single block reads for the final read back. Ultimately, this didn’t change the performance of the example I was running on production, so I didn’t pursue it further – but you may want to look at it.
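For completeness, a session-level sketch of that suggestion (again assuming the parameters are session-modifiable in your version – if not they would have to go into the spfile, as Christo shows in comment 9 below; the values appear to be in KB, but check before relying on that):

alter session set "_smm_auto_min_io_size" = 1008;
alter session set "_smm_auto_max_io_size" = 1008;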

Update Feb 2010

In comment 13, Greg Rahn has identified this as bug 9041800 with a 10.2.0.4 backport already available.

 

30 Comments »

  1. Jonathan.

    Thanks for the concrete test case and kind explanation.
    Our customers have reported similar cases to me, which also left them disappointed. Metalink also has similar bug reports – 4417415, 8261915 – whose status is identified as “NOT A BUG”. :(

    Comment by Dion Cho — September 8, 2009 @ 1:00 am BST Sep 8,2009 | Reply

  2. Hi Jonathan,
    how did you get 25.6MB as the max value of a single workarea? Does the value always default to 20% of the pga, or does it vary?

    Thanks

    Comment by josh — September 8, 2009 @ 2:51 am BST Sep 8,2009 | Reply

  3. Jonathan,

    thank you for the clean explanation. One question is not mentioned/answered though. Disk sorts usually do not show themselves as IO-bound when asynchronous IO is in place. Since the elapsed times for the sort IO suggest that these were async IO calls (with one exception, which lasted 29213 us), how would you confirm that it is the IO, and not something else that Oracle is doing in the meantime, that accounts for the execution response time in such cases?

    Comment by Timur Akhmadeev — September 8, 2009 @ 2:24 pm BST Sep 8,2009 | Reply

  4. Dion,
    I’d seen 8261915, when researching this, but hadn’t found 4417415 – so thanks for that. The visible information on both notes leaves much to be desired – and my impression from the one I saw was that the poster hadn’t quite worked out which parameter meant what, but I think both notes are probably talking about the same symptoms; all I’ve done is prepare a reproducible test case.

    Josh,
    From 10gR2 (possibly even 10.1) the limit for a single serial operation depends on the absolute value of the pga_aggregate_target and allows for variation in percentage and an absolute limit.
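    If you want to see the limits your own instance has derived, a query like the following (run from a suitably privileged account – the parameter names and their meanings shift a little between versions, so treat it as a sketch) will show the current values:

    select
            ksppinm         parameter_name,
            ksppstvl        session_value
    from
            x$ksppi         i,
            x$ksppcv        v
    where
            v.indx = i.indx
    and     ksppinm in ('_pga_max_size', '_smm_max_size', '_smm_px_max_size')
    ;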

    Timur,
    This was just a small test case running on my laptop with lots of memory acting as a local file system cache. The production system dumped 8GB to temp and did a 9-pass merge when a single pass should have been sufficient (we had 16GB for the pga_aggregate_target, of which only 3GB was in use at the time). The excess I/O time contributed a reasonable fraction of the total response time.

    I’ve added a footnote to the page: it’s not just analytic sorts that have problems with sharp reductions in memory used, it’s all over the place.

    Comment by Jonathan Lewis — September 9, 2009 @ 6:55 pm BST Sep 9,2009 | Reply

  5. Jonathan,

    thanks for the test case. I have to say I have witnessed similar behaviour on several large warehouses, where the analytical queries produced an excessive number of reads from temp. Since we used analytics to ensure uniqueness of data sets during the load, this happened quite often. But somehow it never occurred to me to suspect Oracle – I had always thought our ETL had some performance issues in it.

    Comment by Jakub Illner — September 10, 2009 @ 2:28 pm BST Sep 10,2009 | Reply

  6. […] functions can be a pain in all databases. The detailed article from Johnatan Lewis‘ Analytic Agony looks into ways to alleviate that […]

    Pingback by Log Buffer #161: a Carnival of the Vanities for DBAs | Pythian Group Blog — September 11, 2009 @ 5:42 pm BST Sep 11,2009 | Reply

  7. […] 1-Dangerous sorting behaviour of Oracle for Analytic functions Jonathan Lewis – Analytic Agony […]

    Pingback by Blogroll Report 04/09/2009 – 11/09/2009 « Coskan’s Approach to Oracle — September 19, 2009 @ 6:03 pm BST Sep 19,2009 | Reply

  8. […] 6:01 pm UTC Oct 1,2009 I mentioned the hidden parameter _smm_isort_cap recently in my solution to a problem with analytic functions applied to large volumes of data; but in that note I didn’t give you much detail about what it […]

    Pingback by _smm_isort_cap « Oracle Scratchpad — October 1, 2009 @ 6:01 pm BST Oct 1,2009 | Reply

  9. Jonathan,

    I’ve also noticed the 31/1 read aspect, and one thing that seems to help is setting the min and max io sizes:

    *._smm_auto_max_io_size=2048
    *._smm_auto_min_io_size=256

    When I was testing these, it made a very significant difference in performance.

    The small IO sizes are an even bigger problem with JBOD disks and ASM, where there is no SAN to spot the pattern and apply read-ahead.

    Comment by Christo Kutrovsky — October 2, 2009 @ 5:19 pm BST Oct 2,2009 | Reply

  10. […] “Analytic Agony” – by Jonathan Lewis […]

    Pingback by Pure DBA Science – some good postings « ocpdba oracle weblog — October 6, 2009 @ 5:18 pm BST Oct 6,2009 | Reply

  11. Hello Jonathan,

    while reading the last chapter on sorting in your book “Cost Based Optimizer”, you mention
    that when workarea_size_policy is set to auto Oracle allocates 5% of pga_aggregate_target.

    I don’t understand why you show 25.6 MB in your test case.

    Was the example in your book run on 10.1?

    Thanks

    Comment by Henish — October 15, 2009 @ 3:25 pm BST Oct 15,2009 | Reply

    • Henish,

      The book is based on 9.2 with some comments on differences between 8i and 10g.

      The details of hard and soft limits based on the pga_aggregate_target changed quite significantly across versions. The most comprehensive description of 10g behaviour comes from a paper by Joze Senegacnik.

      Comment by Jonathan Lewis — October 17, 2009 @ 10:33 am BST Oct 17,2009 | Reply

  12. […] of uses, and the lag/lead functions are among my favourites. I always like to remind people of the sorting overheads involved with analytic functions – but if you are prepared to accept the overhead, the […]

    Pingback by Log Switching « Oracle Scratchpad — January 1, 2010 @ 3:08 pm GMT Jan 1,2010 | Reply

  13. This related to bug 9041800 and there is a 10.2.0.4 backport available as of 01/29/10. For other versions/platforms, request a backport from Oracle Support.

    Comment by Greg Rahn — February 7, 2010 @ 4:27 pm GMT Feb 7,2010 | Reply

  14. Hi Jonathan,

    I think we can correlate the “one-block-at-a-time” reads with phases when v$session_longops.OPNAME=’Sort Output’ and V$SESSION_WAIT.event=’direct path read temp’. The bad news is that currently there is no way to inject the v$session_longops.OPNAME values into the 10046 traces, so we can’t get a more objective breakdown. The good news is I managed to convince a few people at Oracle (Carlos Sierra and Hector Pujol) to open a new enhancement request #9381412 for the following:
    “For diagnostic purposes it helps to correlate the value
    v$session_longops.opname with waits or CPU related to the OPName. This
    request is for additional info to be placed into the 10046 trace file that
    would allow TKProf or other tools to report how much cpu, waits, and elapsed
    time occured for an OPName as well as track specific waits that occurred
    under that OPName. To illustrate, this would appear as:

    LONGOP #2 opname=”Sort Output” cpu=12345 elapsed=4567
    WAIT #18: nam=’direct path read temp’ ela= 132 tim=1232507925946426
    WAIT #18: nam=’direct path read temp’ ela= 141 tim=1232507925947219
    WAIT #18: nam=’direct path read temp’ ela= 132 tim=1232507925947762

    TKprof could then report how much time occured under each longop.opname ”

    If you think that this is a good idea please let me know if/how we can make a stronger case for such enhancement.

    Rgds,
    Alex

    Comment by Alex Haralampiev — February 16, 2010 @ 4:35 am GMT Feb 16,2010 | Reply

    • Alex.
      That sounds like a nice idea – but there are lots of little changes like that that would be really nice so it’s hard to make a very strong case for any one of them. For example, it would be nice if the wait states reported the line of the execution plan that they were operating on when they started to wait.

      Since trace files tend to be after the event, linking a trace file with a dynamic performance view might be a little awkward for the DBA.

      Comment by Jonathan Lewis — February 16, 2010 @ 6:55 pm GMT Feb 16,2010 | Reply

      • Jonathan,

        > it would be nice if the wait states reported the line of the execution plan that they were operating on when they started to wait

        I know it’s an old thread and comment but just a couple of thoughts on this.

        As you will know, ASH gives the ability to do some of that. Using the v$active_session_history buffer, I know it’s only sampled data every second but

        1. it reports waits with execution lines via SQL_PLAN_LINE_ID / SQL_PLAN_OPERATION / SQL_PLAN_OPTIONS and EVENT.

        2. real-time SQL Monitoring can report similar information (also based on the ASH data, but also tying in with longops data, which is useful for the operation progress % when you use it in real time – which, despite the title, it doesn’t have to be).
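        For example, a rough sketch of the sort of ASH query the first point describes (sampled data, so the counts are only approximate, and you would substitute the sql_id you are interested in):

        select
                sql_plan_line_id,
                sql_plan_operation,
                sql_plan_options,
                event,
                count(*)        samples
        from
                v$active_session_history
        where
                sql_id = :sql_id
        group by
                sql_plan_line_id, sql_plan_operation, sql_plan_options, event
        order by
                samples desc
        ;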

        Cheers,
        Dominic

        Comment by Dom Brooks — March 9, 2011 @ 5:15 pm GMT Mar 9,2011 | Reply

    I have noticed that the WINDOW SORT operation is running serially (I mean only one slave is processing the data); do you have any suggestions to improve this?

    Comment by Sudheer — July 26, 2011 @ 12:47 pm BST Jul 26,2011 | Reply

    • I can’t think of any definite reason why not – as far as I know there is nothing that forces analytic functions to run serially. I suggest you post the query and the execution plan to the OTN database forum and ask for ideas.

      Comment by Jonathan Lewis — August 1, 2011 @ 11:03 am BST Aug 1,2011 | Reply

  16. […] it; it’s a window sort of the entire sampled data set which could be quite large and, as we’ve seen elsewhere, could be very […]

    Pingback by Histogram Generation « Oracle Scratchpad — September 13, 2012 @ 5:21 pm BST Sep 13,2012 | Reply

  17. […] before reading it again and applying it to the index (for a couple of points on tracing sorts, see this posting). I don’t know why Oracle chose to use a much smaller read slot size in the second case, […]

    Pingback by Flashback logging | Oracle Scratchpad — March 9, 2015 @ 7:02 pm GMT Mar 9,2015 | Reply

  18. […] was being spent we observed the same type of problem documented by Jonathan Lewis in his blog post Analytic Agony. Tracing the sort operations performed by the queries we noticed the following which is determined […]

    Pingback by 11.2.0.2 Analytic Sort Performance — November 18, 2017 @ 3:33 pm GMT Nov 18,2017 | Reply

  19. Have you investigated using an inline view with an order by to change the “window sort” into a “sort order by” + “window buffer”?

    select  sum(b) over (partition by a order by b range between 10 preceding and current row)
    from    ( 
            select /*+ no_eliminate_oby */ * 
            from t1
            order by a,b
            )
    ;
    

    Comment by Larry — December 3, 2017 @ 1:55 am GMT Dec 3,2017 | Reply

    • Larry,

      I hadn’t investigated this idea, but I’ve just done a quick test on 12.1.0.2 with the data set and query above. I have a note that the problem of very small reads was fixed by 11.2.0.3, so I wasn’t too worried about checking that aspect of the problem (I assume you had suggested this strategy to work around the small reads problem by changing the type of sort mechanism Oracle used). There were a couple of details to consider though.

      a) I got the wrong “order by” clause the first time around (though it looked like an appropriate order by), so my plan had a “sort order by” and a “window sort”
      b) I did get the “sort order by” followed by “window buffer” with the right order by clause.

      Observations:
      The sort order by in 12.1.0.2 was less efficient than the original window sort performed by 12.1.0.2 – possibly because the optimizer limited the workarea size for sorting because it anticipated two sort operations being live in the same cursor. As a result the two-pass approach used a smaller slot size for I/O.
      The only real effect of the “window buffer” two-pass approach was to increase the CPU usage from the original code.

      The SQL I used was as follows:

      
      select
              n1,
              n2,
              small_vc
      from
              (
              select
                      n1,
                      n2,
                      id,
                      small_vc,
                      last_value (id) over (
                              partition by n1
                              order by n2
                      )       last_id
              from
                      (
                      select /*+ no_eliminate_oby */
                              n1, n2, id, small_vc
                      from    t1
                      order by
                              n1, n2, id
                      )
              )
      where
              id = last_id
      ;
      
      

      I had initially assumed that I would get a buffer sort if the order by clause was simply “order by n1, n2”, but that produced a window sort until I added the “id” column to the list.

      Comment by Jonathan Lewis — December 10, 2017 @ 11:39 am GMT Dec 10,2017 | Reply

  20. Hi

    I have a small issue trying to get this to work

    SELECT gurwapp_release, GURWAPP_APPLICATION_NAME, GURWAPP_PRODUCT, GURWAPP_STAGE_DATE, GURWAPP_BUILD_NO,
           RANK() OVER (PARTITION BY gurwapp_application_name ORDER BY gurwapp_release) RANK
      FROM general.gurwapp 
      WHERE gurwapp_application_name = 'FacultySelfService'
      ORDER BY RANK ,  gurwapp_release 
    

    the output is as follows

    9.10	FacultySelfService	STUDENT	21-JAN-20	24877	1
    9.10.0.1	FacultySelfService	STUDENT	21-JAN-20	25027	2
    9.6	FacultySelfService	STUDENT	22-AUG-19	19997	3
    9.7	FacultySelfService	STUDENT	22-AUG-19	21319	4
    9.8	FacultySelfService	STUDENT	22-AUG-19	21941	5
    9.9	FacultySelfService	STUDENT	06-NOV-19	24172	6
    9.9.0.1	FacultySelfService	STUDENT	21-JAN-20	25028	7
    

    Comment by Charan LARQUIER — February 27, 2020 @ 5:45 pm GMT Feb 27,2020 | Reply

  21. oh dear … I hit the button too quickly. Basically I can’t get any SQL (and I have tried several really ugly ones) to show me the highest release, which is not 9.10 but 9.10.0.1 – and in the case above that one is ranked 2!

    the data in this field is varchar2 and I tried converting it with to_number so that I can apply a max function … forget that! … Any insights into this will help … thanks

    Comment by Charan LARQUIER — February 27, 2020 @ 5:47 pm GMT Feb 27,2020 | Reply

    • Charan,

      Thanks for the comment.
      I guess what you’re after is:

      9.6 FacultySelfService  STUDENT 22-AUG-19   19997   3
      9.7 FacultySelfService  STUDENT 22-AUG-19   21319   4
      9.8 FacultySelfService  STUDENT 22-AUG-19   21941   5
      9.9 FacultySelfService  STUDENT 06-NOV-19   24172   6
      9.9.0.1 FacultySelfService  STUDENT 21-JAN-20   25028   7
      9.10    FacultySelfService  STUDENT 21-JAN-20   24877   1
      9.10.0.1    FacultySelfService  STUDENT 21-JAN-20   25027   2
      

      Which means Oracle has to do a type of descent through the levels of the character strings, treating each component as a separate ordering key.
      It’s a generic problem, and I think I’ve seen a very elegant solution on the ODC forum ( https://community.oracle.com/community/groundbreakers/database/developer-tools/sql_and_pl_sql/content ) so you might want to register there and ask for suggestions.

      Off the top of my head I’d probably write a simple pl/sql function (possibly using the 12c “with function” mechanism to embed the function in the SQL, or using pragma UDF in a stored pl/sql function). My target would be to take an input string, cut it into substrings at the dots; left pad each to a fixed length with zeros, then stitch the pieces back together, and order on that (but report the original gurwapp_release).
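      Something along these lines, perhaps – an untested sketch (pad_version is just a name I’ve invented, the padding width of 6 is arbitrary, and pragma UDF needs 12c – drop that line for earlier versions):

      create or replace function pad_version(p_version in varchar2)
      return varchar2
      as
              pragma udf;                     -- 12c: cheaper to call from SQL
              l_padded        varchar2(200);
      begin
              -- cut the string at the dots, left-pad each piece with zeros,
              -- and stitch the pieces back together for ordering
              for i in 1 .. regexp_count(p_version, '[^.]+') loop
                      l_padded := l_padded ||
                              lpad(regexp_substr(p_version, '[^.]+', 1, i), 6, '0');
              end loop;
              return l_padded;
      end;
      /

      select  gurwapp_release
      from    general.gurwapp
      where   gurwapp_application_name = 'FacultySelfService'
      order by
              pad_version(gurwapp_release)
      ;

      If you only want the single highest release you could then take max(gurwapp_release) keep (dense_rank last order by pad_version(gurwapp_release)) rather than ordering the whole list.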

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — February 27, 2020 @ 6:11 pm GMT Feb 27,2020 | Reply

  22. my first rough solution was to use regexp_substr with the (\d) pattern for the digits, then COALESCE, convert the whole thing with to_number, and then apply a max function to get the highest value for each part of the string .. once the max was found .. then concatenate the entire string to get to 9.10 … which worked … except that when I started going for 9.10.0.1 … it all went south …

    select distinct(SELECT MAX( TO_NUMBER( REGEXP_SUBSTR( gurwapp_release, '(\d)' ), '0' ) )
    FROM general.gurwapp where GURWAPP_APPLICATION_NAME = 'FacultySelfService')||'.'
    ||(SELECT MAX( TO_NUMBER( COALESCE( REGEXP_SUBSTR( gurwapp_release, '(\d)(\d+)' ), '0' ) ) )
    FROM general.gurwapp where GURWAPP_APPLICATION_NAME = 'FacultySelfService')
    from general.gurwapp g

    Comment by Charan LARQUIER — February 27, 2020 @ 6:58 pm GMT Feb 27,2020 | Reply

  23. But I’ll try what you said ….thank you …

    Comment by Charan LARQUIER — February 27, 2020 @ 6:59 pm GMT Feb 27,2020 | Reply

