Oracle Scratchpad

December 17, 2006

Buffer Sorts

Filed under: CBO,Execution plans,Performance,sorting,trace files — Jonathan Lewis @ 9:48 pm GMT Dec 17,2006

In an earlier article I mentioned the buffer sort in a footnote; I thought I would expand a little more on what I think it does and why it appears as a buffer sort in an execution plan rather than the more traditional sort (join).

Consider the trivial script:

execute dbms_random.seed(0)          

create table t1
select	trunc(dbms_random.value(1,4))	n1
from	all_objects
where	rownum <= 10

--  gather table stats at this point         

alter session set events '10032 trace name context forever';         

	ta.n1, tb.n1
	t1	ta,
	t1	tb

alter session set events '10032 trace name context off'         

| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
|   0 | SELECT STATEMENT     |             |   100 |   600 |    22 |
|   1 |  MERGE JOIN CARTESIAN|             |   100 |   600 |    22 |
|   2 |   TABLE ACCESS FULL  | T1          |    10 |    30 |     2 |
|   3 |   BUFFER SORT        |             |    10 |    30 |    20 |
|   4 |    TABLE ACCESS FULL | T1          |    10 |    30 |     2 |

If you run the query, you will find that the generated data is a random collection of 1s, 2s, and 3d – and there is no obvious sorted order in the output. Despite an apparent “sort” in the execution plan, the 100 rows generated by the Cartesian (everything to everything) join have not been sorted in any way.

I believe the operation at line 3 is labelled a buffer sort to give you a clue that it’s using the buffering mechanism of a traditional sort, but not really doing a sort. And you don’t actually need a sort, you are simply buffering the data to avoid multiple tablescans against real data blocks.

You can confirm that there is no real sort by using the 10032 trace (one of the sort traces) as I have done. The output is shown below:

---- Sort Statistics ------------------------------
Input records                             10
Output records                            10
Total number of comparisons performed     9
  Comparisons performed by in-memory sort 9
Total amount of memory used               8192

The indications from the 10032 trace for this example are that Oracle is “comparing” rows as it reads them. But when you examine the output it isn’t doing anything with those comparison – the data doesn’t get ordered. Moreover, if you simply select the 10 rows from t1, you can see that it would take more than 9 comparisons to sort the rows into order.

So buffer sort is using the buffering mechanism of a sort, but not doing the sort itself.

Footnote: if you run this test on 8i, you would see a sort (join) in the execution plan – but the 10032 shows that even in 8i, the sort isn’t really sorting.

Addendum (Aug 2008): The execution plan show above is from 9i, and the cost of 20 in buffer sort line comes from multiplying the cost of doing the sort once by 10 – because of the cardinality of 10 reported for the first table. This is an arithmetic error that has been fixed in 10g – which means you are much more likely to see plans in 10g that make use of this buffer sort / merge strategy.

[More on Buffer Sorts]


  1. Run it in, same results.

    I’ve run also a variant: I’ve joined t1 to
    create table t2
    select rownum n1 from dual connect by level < 20000
    order by dbms_random.random;

    | Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)|
    |   0 | SELECT STATEMENT     |      |   180K|  1230K|    79   (4)|
    |   1 |  MERGE JOIN CARTESIAN|      |   180K|  1230K|    79   (4)|
    |   2 |   TABLE ACCESS FULL  | T1   |     9 |    27 |     3   (0)|
    |   3 |   BUFFER SORT        |      | 20000 | 80000 |    76   (4)|
    |   4 |    TABLE ACCESS FULL | T2   | 20000 | 80000 |     8   (0)|

    Data comes out unsorted as well – but what’s curious is that the CBO chooses to buffer the biggest table, I was expecting the opposite … especially since it starts to “sort” to disk (unnecessarily IMHO) if the sort workarea is too small (eg 60000 bytes). I wonder whether you know some reason for this behaviour I’m missing.

    Comment by Alberto Dell'Era — December 17, 2006 @ 11:26 pm GMT Dec 17,2006 | Reply

  2. Alberto, the only thought I have is that the CPU costing made Oracle decide that sweeping the 20,000 rows in the buffer 9 times was cheaper than sweeping a 9 row buffer 20,000 times. Note that there is no temp_space column reported, so the optimizer didn’t see the dump to disk comig.

    Comment by Jonathan Lewis — December 17, 2006 @ 11:35 pm GMT Dec 17,2006 | Reply

  3. “Footnote: if you run this test on 8i, ”

    …ahhh…I envy folks that still have 8i running…stability rules!

    Comment by kevinclosson — December 18, 2006 @ 12:39 am GMT Dec 18,2006 | Reply

  4. Alberto, I suddenly realised this morning that the reason Oracle ‘sorts’ the larger table is because it doesn’t cost for the merge join.

    If you check the 10053, it costs a nested loop, then operates a merge. So it doesn’t even check if the second table will “sort” in memory. The comparison is simply. In your example, the cost of 79 is: 3 + 76, which (allowing for rounding etc) is 3 + 9 * 8. The cost the other way around would roughly: 8 + 20,000 * 3.

    Comment by Jonathan Lewis — December 18, 2006 @ 9:24 pm GMT Dec 18,2006 | Reply

  5. The interesting part is, anyway, that the whole “buffer sort” mechanism is re-used, eg including the swap to disk when not enough sort workarea memory is available. So it really looks like, as you described, that the very same code is used – just running in “no sort mode”.

    Comment by Alberto Dell'Era — December 18, 2006 @ 9:36 pm GMT Dec 18,2006 | Reply

  6. I wonder if you can somehow request a specific part of a plan to be executed via buffer sort. That would be quite usefull in my opinion.

    It’s like a in memory on the fly temp table.

    Comment by Christo Kutrovsky — December 21, 2006 @ 6:06 pm GMT Dec 21,2006 | Reply

  7. Thinking a bit more about this.

    What’s the difference between a temporary table and a buffer sort ?

    temporary table – in SGA
    buffer sort – in PGA

    Both of these are “swapable” in the temporary files.

    Comment by Christo Kutrovsky — December 21, 2006 @ 6:39 pm GMT Dec 21,2006 | Reply

  8. Chris, I guess the biggest difference is that the temporary table has to go through the db cache, which introduces latching effects that don’t appear for buffer sorts. I can’t think of any hint (or parameter) which would induce buffer sorts – though there is a ‘cache_temp_table’ hint that I haven’t investigated yet.

    Comment by Jonathan Lewis — December 21, 2006 @ 8:22 pm GMT Dec 21,2006 | Reply

  9. Hi Lewis
    when you say
    “you are simply buffering the data to avoid multiple tablescans against real data blocks”
    You mean it is buffering in PGA.
    Thank you very much

    Comment by CT — January 10, 2007 @ 4:38 pm GMT Jan 10,2007 | Reply

  10. CT, that’s correct (although technically it might be the UGA – I haven’t tried to check the exact location).

    Comment by Jonathan Lewis — January 10, 2007 @ 7:34 pm GMT Jan 10,2007 | Reply

  11. […] — Jonathan Lewis @ 7:26 pm UTC Jan 12,2007 Just a little follow-up on my earlier note on buffer sorts. The following is an extract from a a tkprof output showing the rowsource operation for a query. […]

    Pingback by Buffer Sorts - 2 « Oracle Scratchpad — January 12, 2007 @ 7:26 pm GMT Jan 12,2007 | Reply

  12. As in the past it is a treat for me to read what you tell in your typical mathematician style. Thanks a lot..

    Comment by Vikas Atrey — April 2, 2007 @ 11:27 am GMT Apr 2,2007 | Reply

  13. Nice article!

    Comment by Madbo — July 28, 2007 @ 12:04 pm GMT Jul 28,2007 | Reply

  14. Thank you very much,clarification.

    Comment by puika — July 29, 2007 @ 4:38 pm GMT Jul 29,2007 | Reply

  15. Why is the buffer sort cost so high then? If it only buffers the data and no sorting is done?
    I have a situation where table access full costs around 10.000 and step after is buffer sort with Seems insane.

    Comment by HeSh — October 4, 2007 @ 10:39 am GMT Oct 4,2007 | Reply

  16. Hi,

    I also have a situation where the buffer is returning 284395648 rows , and in the TKPROF output it is the first step. Any idea how it can be avoided as it takes around 20 minutes for the query to execute.


    Comment by Himanshu — August 15, 2008 @ 8:42 pm GMT Aug 15,2008 | Reply

  17. HeSh,

    Sorry I missed your question back in 2007. I’ve just added a note to the end of the article which probably answers your question. In 9i there is a multiplication involved that should not be there.

    There isn’t enough information in your question for me to give a sensible answer. I don’t think you are doing a merge join cartesian, though, if your first step is a buffer sort.

    Comment by Jonathan Lewis — August 19, 2008 @ 6:44 pm GMT Aug 19,2008 | Reply

  18. Jonathan,

    thanks for the time. I am sorry , I should have send you the complete execution plan, the plan was showing a full table acess as the first step , then a buffer sort and then a Merge join cartesian.
    After trying different things it was corrected by creating a index so that the full table scan was avoided which removed the buffer sort and the also removed the Merge join cartesian.


    Comment by Himanshu — August 26, 2008 @ 12:14 pm GMT Aug 26,2008 | Reply

  19. I was able to improve the performance of the query when a index was created on one of the tables, but when the changes were moved to production the query was taking lot of time to execute and when the plan was checked it showed the buffer sort and then a Merge join cartesian.
    I tried to put a hint and force the query to use the index but that also did not help as it was still doing the Merge join. The QA and DEV enviroment are same. Any idea how to improve the perfermonce. Please let me know if you need more info.

    Comment by Himanshu — September 4, 2008 @ 7:18 pm GMT Sep 4,2008 | Reply

  20. Himanshu,

    I’m sorry, I don’t do query tuning on the blog. It’s not a very good use of the limited amount of time I have for the Oracle community.

    A general strategy you need to consider: Look at the list of tables, and the list of join conditions between tables.

    Think about which table would return the smallest amount of data if you used the available predicates on that table.

    Decide which table should be the next table, based on how the data size would change if you joined to it, and the join mechanism you would probably want to use to get there. Repeat for each table in turn.

    As you move from table to table, an important general strategy is to try to keep the volume of the result set as small a possible.

    For a more detailed strategy, Dan Tow’s book on SQL Tuning is probably the best option.

    Comment by Jonathan Lewis — September 6, 2008 @ 9:45 pm GMT Sep 6,2008 | Reply

  21. […] there is an appropriate index on the second table Oracle still reports a sort operation. Mind you, as I’ve pointed out in the past, sorts are necessarily what they seem to be. Leave a […]

    Pingback by Joins – MJ « Oracle Scratchpad — August 15, 2010 @ 5:52 pm GMT Aug 15,2010 | Reply

  22. […] Pay attention on the step 4 – BUFFER SORT. BUFFER SORT it is buffering technic using sort area size, without actual sorting. Note: additional details: Jonathan Lewis – buffer-sorts […]

    Pingback by Change in parallel insert between 11.1 and 11.2 with serial data access « Alexander Anokhin — June 22, 2012 @ 9:35 am GMT Jun 22,2012 | Reply

  23. Dear mr. Lewis,

    I did the following test on

    - create table e3 (id number(10), clu number(10), sca number(10), ls varchar2(20));
    - insert into e3 select level ,level/100,mod(level,100),'c' from dual connect by level&lt;=10;
    - collect statistics on the table
    SQL&gt; select count(*) from e3,e3;
    Execution Plan
    Plan hash value: 12326319
    | Id  | Operation             | Name | Rows  | Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT      |      |     1 |    18   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE       |      |     1 |            |          |
    |   2 |   MERGE JOIN CARTESIAN|      |   100 |    18   (0)| 00:00:01 |
    |   3 |    TABLE ACCESS FULL  | E3   |    10 |     3   (0)| 00:00:01 |
    |   4 |    BUFFER SORT        |      |    10 |    15   (0)| 00:00:01 |
    |   5 |     TABLE ACCESS FULL | E3   |    10 |     2   (0)| 00:00:01 |
              0  recursive calls
              0  db block gets
             14  consistent gets
              0  physical reads
              0  redo size
            422  bytes sent via SQL*Net to client
            419  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              1  sorts (memory)
              0  sorts (disk)
              1  rows processed

    So I think in oracle 11.2 a buffer sort is still multiplies by some factor.
    Would you say the same, or am I mistaken?

    Best Regards,
    Martijn Bos

    Comment by Martijn Bos — September 7, 2013 @ 11:03 am GMT Sep 7,2013 | Reply

    • Martijn,

      I think you’re right – and it’s fairly easy to demonstrate.
      Unfortunately I don’t have any notes about what I did to make myself think that it wasn’t happening in 10g (especially odd, since the first comment points out that the pattern is similar in 10g).

      There is a difference, given that the cost isn’t just “cardinality of first step * cost of second step”, so some extra sophistication has appeared (and perhaps that fooled me in 2008, and I didn’t pursue the details.)

      The interesting thing about your example (and Alberto’s in comment 1) is that the cost of the tablescan the second time it appears is lower than the first time. A first thought is that the “_tablescan_cost_plus_1” parameter is not applied for costing in a Cartesian merge; alternatively Oracle may be introducing a factor to allow for “self-induced caching”.

      Comment by Jonathan Lewis — September 7, 2013 @ 2:28 pm GMT Sep 7,2013 | Reply

  24. […] If you are wondering what a buffer sort is check this out. […]

    Pingback by 04 Explain Plan – Life Less Ordinary — October 12, 2016 @ 7:37 pm GMT Oct 12,2016 | Reply

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at