Buffer Sorts

December 17, 2006

Buffer Sorts

Filed under: CBO,Execution plans,Performance,sorting,trace files — Jonathan Lewis @ 9:48 pm GMT Dec 17,2006

In an earlier article I mentioned the buffer sort in a footnote; I thought I would expand a little more on what I think it does and why it appears as a buffer sort in an execution plan rather than the more traditional sort (join).

Consider the trivial script:

execute dbms_random.seed(0)          

create table t1
as
select	trunc(dbms_random.value(1,4))	n1
from	all_objects
where	rownum <= 10
;         

--  gather table stats at this point         

alter session set events '10032 trace name context forever';         

select
	ta.n1, tb.n1
from
	t1	ta,
	t1	tb
;         

alter session set events '10032 trace name context off'         

--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |   100 |   600 |    22 |
|   1 |  MERGE JOIN CARTESIAN|             |   100 |   600 |    22 |
|   2 |   TABLE ACCESS FULL  | T1          |    10 |    30 |     2 |
|   3 |   BUFFER SORT        |             |    10 |    30 |    20 |
|   4 |    TABLE ACCESS FULL | T1          |    10 |    30 |     2 |
--------------------------------------------------------------------

If you run the query you will find that the generated data is a random collection of 1s, 2s, and 3s – and there is no obvious sorted order in the output. Despite an apparent “sort” in the execution plan, the 100 rows generated by the Cartesian (everything to everything) join have not been sorted in any way.

I believe the operation at line 3 is labelled a buffer sort to give you a clue that it’s using the buffering mechanism of a traditional sort but not really doing a sort. And, of course, you don’t actually need a sort as you are simply buffering the data to avoid multiple tablescans against real data blocks.

You can confirm that there is no real sort by using the 10032 trace (one of the sort traces) as I have done. The output is shown below:

---- Sort Statistics ------------------------------
Input records                             10
Output records                            10
Total number of comparisons performed     9
  Comparisons performed by in-memory sort 9
Total amount of memory used               8192

The indications from the 10032 trace for this example are that Oracle is “comparing” rows as it reads them. But when you examine the output it isn’t doing anything with those comparison – the data doesn’t get ordered. Moreover, if you simply select the 10 rows from t1, you can see that it would take more than 9 comparisons to sort the rows into order.

So buffer sort is using the buffering mechanism of a sort but not doing the entire sort itself.

Footnote: if you run this test on 8i, you would see a sort (join) in the execution plan – but the 10032 shows that even in 8i, the sort isn’t really sorting.

Addendum (Aug 2008)

The execution plan shown above is from 9i, and the cost of 20 in the buffer sort line comes from multiplying the cost of doing the sort once by 10 – because of the cardinality of 10 reported for the first table. This is an arithmetic error that has been fixed in 10g – which means you are much more likely to see plans in 10g that make use of this buffer sort / merge strategy.

[More on Buffer Sorts]

Comments (28)

28 Comments »

Run it in 10.2.0.2, same results.

I’ve run also a variant: I’ve joined t1 to
create table t2
as
select rownum n1 from dual connect by level < 20000
order by dbms_random.random;

------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 180K| 1230K| 79 (4)| | 1 | MERGE JOIN CARTESIAN| | 180K| 1230K| 79 (4)| | 2 | TABLE ACCESS FULL | T1 | 9 | 27 | 3 (0)| | 3 | BUFFER SORT | | 20000 | 80000 | 76 (4)| | 4 | TABLE ACCESS FULL | T2 | 20000 | 80000 | 8 (0)| ------------------------------------------------------------------

Data comes out unsorted as well – but what’s curious is that the CBO chooses to buffer the biggest table, I was expecting the opposite … especially since it starts to “sort” to disk (unnecessarily IMHO) if the sort workarea is too small (eg 60000 bytes). I wonder whether you know some reason for this behaviour I’m missing.

Comment by Alberto Dell'Era — December 17, 2006 @ 11:26 pm GMT Dec 17,2006 | Reply
Alberto, the only thought I have is that the CPU costing made Oracle decide that sweeping the 20,000 rows in the buffer 9 times was cheaper than sweeping a 9 row buffer 20,000 times. Note that there is no temp_space column reported, so the optimizer didn’t see the dump to disk comig.

Comment by Jonathan Lewis — December 17, 2006 @ 11:35 pm GMT Dec 17,2006 | Reply
“Footnote: if you run this test on 8i, ”

…ahhh…I envy folks that still have 8i running…stability rules!

Comment by kevinclosson — December 18, 2006 @ 12:39 am GMT Dec 18,2006 | Reply
Alberto, I suddenly realised this morning that the reason Oracle ‘sorts’ the larger table is because it doesn’t cost for the merge join.

If you check the 10053, it costs a nested loop, then operates a merge. So it doesn’t even check if the second table will “sort” in memory. The comparison is simply. In your example, the cost of 79 is: 3 + 76, which (allowing for rounding etc) is 3 + 9 * 8. The cost the other way around would roughly: 8 + 20,000 * 3.

Comment by Jonathan Lewis — December 18, 2006 @ 9:24 pm GMT Dec 18,2006 | Reply
The interesting part is, anyway, that the whole “buffer sort” mechanism is re-used, eg including the swap to disk when not enough sort workarea memory is available. So it really looks like, as you described, that the very same code is used – just running in “no sort mode”.

Comment by Alberto Dell'Era — December 18, 2006 @ 9:36 pm GMT Dec 18,2006 | Reply
I wonder if you can somehow request a specific part of a plan to be executed via buffer sort. That would be quite usefull in my opinion.

It’s like a in memory on the fly temp table.

Comment by Christo Kutrovsky — December 21, 2006 @ 6:06 pm GMT Dec 21,2006 | Reply
Thinking a bit more about this.

What’s the difference between a temporary table and a buffer sort ?

temporary table – in SGA
buffer sort – in PGA

Both of these are “swapable” in the temporary files.

Comment by Christo Kutrovsky — December 21, 2006 @ 6:39 pm GMT Dec 21,2006 | Reply
Chris, I guess the biggest difference is that the temporary table has to go through the db cache, which introduces latching effects that don’t appear for buffer sorts. I can’t think of any hint (or parameter) which would induce buffer sorts – though there is a ‘cache_temp_table’ hint that I haven’t investigated yet.

Comment by Jonathan Lewis — December 21, 2006 @ 8:22 pm GMT Dec 21,2006 | Reply
- Jonathan,
  
  From http://www.hellodba.com/Download/OracleSQLHints.pdf :
  
  “CACHE_TEMP_TABLE instructs Oracle to put the read temp block to the end of MRU in LRU chain. This hint is normally be used in internal statements.”
  
  Best regards,
  
  Nenad
  
  Comment by Nenad Noveljic — October 2, 2020 @ 2:48 pm BST Oct 2,2020 | Reply
Hi Lewis
when you say
“you are simply buffering the data to avoid multiple tablescans against real data blocks”
You mean it is buffering in PGA.
Thank you very much
CT

Comment by CT — January 10, 2007 @ 4:38 pm GMT Jan 10,2007 | Reply
CT, that’s correct (although technically it might be the UGA – I haven’t tried to check the exact location).

Comment by Jonathan Lewis — January 10, 2007 @ 7:34 pm GMT Jan 10,2007 | Reply
[…] — Jonathan Lewis @ 7:26 pm UTC Jan 12,2007 Just a little follow-up on my earlier note on buffer sorts. The following is an extract from a a tkprof output showing the rowsource operation for a query. […]

Pingback by Buffer Sorts - 2 « Oracle Scratchpad — January 12, 2007 @ 7:26 pm GMT Jan 12,2007 | Reply
As in the past it is a treat for me to read what you tell in your typical mathematician style. Thanks a lot..

Comment by Vikas Atrey — April 2, 2007 @ 11:27 am BST Apr 2,2007 | Reply
Nice article!

Comment by Madbo — July 28, 2007 @ 12:04 pm BST Jul 28,2007 | Reply
Thank you very much,clarification.

Comment by puika — July 29, 2007 @ 4:38 pm BST Jul 29,2007 | Reply
Why is the buffer sort cost so high then? If it only buffers the data and no sorting is done?
I have a situation where table access full costs around 10.000 and step after is buffer sort with 23.000.000.000. Seems insane.

Comment by HeSh — October 4, 2007 @ 10:39 am BST Oct 4,2007 | Reply
Hi,

I also have a situation where the buffer is returning 284395648 rows , and in the TKPROF output it is the first step. Any idea how it can be avoided as it takes around 20 minutes for the query to execute.

thanks

Comment by Himanshu — August 15, 2008 @ 8:42 pm BST Aug 15,2008 | Reply
HeSh,

Sorry I missed your question back in 2007. I’ve just added a note to the end of the article which probably answers your question. In 9i there is a multiplication involved that should not be there.

Himanshu,
There isn’t enough information in your question for me to give a sensible answer. I don’t think you are doing a merge join cartesian, though, if your first step is a buffer sort.

Comment by Jonathan Lewis — August 19, 2008 @ 6:44 pm BST Aug 19,2008 | Reply
Jonathan,

thanks for the time. I am sorry , I should have send you the complete execution plan, the plan was showing a full table acess as the first step , then a buffer sort and then a Merge join cartesian.
After trying different things it was corrected by creating a index so that the full table scan was avoided which removed the buffer sort and the also removed the Merge join cartesian.

thanks

Comment by Himanshu — August 26, 2008 @ 12:14 pm BST Aug 26,2008 | Reply
I was able to improve the performance of the query when a index was created on one of the tables, but when the changes were moved to production the query was taking lot of time to execute and when the plan was checked it showed the buffer sort and then a Merge join cartesian.
I tried to put a hint and force the query to use the index but that also did not help as it was still doing the Merge join. The QA and DEV enviroment are same. Any idea how to improve the perfermonce. Please let me know if you need more info.
thanks

Comment by Himanshu — September 4, 2008 @ 7:18 pm BST Sep 4,2008 | Reply
Himanshu,

I’m sorry, I don’t do query tuning on the blog. It’s not a very good use of the limited amount of time I have for the Oracle community.

A general strategy you need to consider: Look at the list of tables, and the list of join conditions between tables.

Think about which table would return the smallest amount of data if you used the available predicates on that table.

Decide which table should be the next table, based on how the data size would change if you joined to it, and the join mechanism you would probably want to use to get there. Repeat for each table in turn.

As you move from table to table, an important general strategy is to try to keep the volume of the result set as small a possible.

For a more detailed strategy, Dan Tow’s book on SQL Tuning is probably the best option.

Comment by Jonathan Lewis — September 6, 2008 @ 9:45 pm BST Sep 6,2008 | Reply
[…] https://jonathanlewis.wordpress.com/2006/12/17/buffer-sorts/ […]

Pingback by xpchild » Buffer sort 引发的血案 — June 22, 2011 @ 12:56 pm BST Jun 22,2011 | Reply
[…] Pay attention on the step 4 – BUFFER SORT. BUFFER SORT it is buffering technic using sort area size, without actual sorting. Note: additional details: Jonathan Lewis – buffer-sorts […]

Pingback by Change in parallel insert between 11.1 and 11.2 with serial data access « Alexander Anokhin — June 22, 2012 @ 9:35 am BST Jun 22,2012 | Reply

Dear mr. Lewis,

I did the following test on 11.2.0.1.0:

- create table e3 (id number(10), clu number(10), sca number(10), ls varchar2(20));
- insert into e3 select level ,level/100,mod(level,100),'c' from dual connect by level&lt;=10;
- collect statistics on the table

SQL&gt; select count(*) from e3,e3;


Execution Plan
----------------------------------------------------------
Plan hash value: 12326319

----------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Cost (%CPU)| Time     |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    18   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |      |     1 |            |          |
|   2 |   MERGE JOIN CARTESIAN|      |   100 |    18   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL  | E3   |    10 |     3   (0)| 00:00:01 |
|   4 |    BUFFER SORT        |      |    10 |    15   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL | E3   |    10 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         14  consistent gets
          0  physical reads
          0  redo size
        422  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL&gt;

So I think in oracle 11.2 a buffer sort is still multiplies by some factor.
Would you say the same, or am I mistaken?

Best Regards,
Martijn Bos

Comment by Martijn Bos — September 7, 2013 @ 11:03 am BST Sep 7,2013 | Reply

Martijn,

I think you’re right – and it’s fairly easy to demonstrate.
Unfortunately I don’t have any notes about what I did to make myself think that it wasn’t happening in 10g (especially odd, since the first comment points out that the pattern is similar in 10g).

There is a difference, given that the cost isn’t just “cardinality of first step * cost of second step”, so some extra sophistication has appeared (and perhaps that fooled me in 2008, and I didn’t pursue the details.)

The interesting thing about your example (and Alberto’s in comment 1) is that the cost of the tablescan the second time it appears is lower than the first time. A first thought is that the “_tablescan_cost_plus_1” parameter is not applied for costing in a Cartesian merge; alternatively Oracle may be introducing a factor to allow for “self-induced caching”.

Comment by Jonathan Lewis — September 7, 2013 @ 2:28 pm BST Sep 7,2013 | Reply

[…] If you are wondering what a buffer sort is check this out. […]

Pingback by 04 Explain Plan – Life Less Ordinary — October 12, 2016 @ 7:37 pm BST Oct 12,2016 | Reply
[…] difference is much less significant. The critical point, though, appears at operation 4 – the Buffer Sort. I set my sort_area_size to something that I know was smaller than the data set I needed to buffer […]

Pingback by Cartesian Join | Oracle Scratchpad — March 4, 2019 @ 1:38 pm GMT Mar 4,2019 | Reply
[…] sort operation before being crunched down to 2,613 rows. It’s probably the buffer sort (which is buffering but not sorting) that has mostly passed through a single server and spilled to disc in the 2019 report. We just […]

Pingback by Case Study | Oracle Scratchpad — June 17, 2022 @ 3:00 pm BST Jun 17,2022 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome. Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Oracle Scratchpad

December 17, 2006

Buffer Sorts

Addendum (Aug 2008)

28 Comments »

Comments and related questions are welcome. Cancel reply

Click on the motto below

Everything Changes Eventually

Search this blog

Categories

Special Links

Recent Posts

Recent Comments

Popular articles

Popular References

Archives

Blogroll

Posts by RSS

Oracle Scratchpad

December 17, 2006

Buffer Sorts

Addendum (Aug 2008)

Share this:

Related

28 Comments »

Comments and related questions are welcome. Cancel reply

Click on the motto below

Everything Changes Eventually

Search this blog

Categories

Special Links

Recent Posts

Recent Comments

Popular articles

Popular References

Archives

Blogroll

Posts by RSS