Oracle Scratchpad

December 28, 2009

Short Sorts

Filed under: Infrastructure,Oracle,Performance,sorting,trace files,Tuning — Jonathan Lewis @ 7:29 pm GMT Dec 28,2009

I posted a little holiday quiz – timed to appear just before midnight (GMT) on 24th December – that asked about the number of rows sorted and the memory used for queries like:

select sortcode
from
        (
        select sortcode
        from   t1
        order by
                sortcode
        )
where
        rownum <= 10
;
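
For anyone who didn’t try it, the shape of the execution plan is worth having in mind while reading the discussion. The following is an illustrative sketch (produced with explain plan and dbms_xplan.display – not output captured from the original test system):

explain plan for
select sortcode
from
        (
        select sortcode
        from   t1
        order by
                sortcode
        )
where
        rownum <= 10
;

select * from table(dbms_xplan.display);

--  Illustrative plan shape (cost and row details omitted):
--  ------------------------------------------
--  | Id  | Operation                | Name  |
--  ------------------------------------------
--  |   0 | SELECT STATEMENT         |       |
--  |*  1 |  COUNT STOPKEY           |       |
--  |   2 |   VIEW                   |       |
--  |*  3 |    SORT ORDER BY STOPKEY |       |
--  |   4 |     TABLE ACCESS FULL    | T1    |
--  ------------------------------------------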

The number and variety of the responses were gratifying. It’s always interesting to see how many important little details appear as people start to tackle even fairly straightforward questions like this.

The first question fired back at me was the obvious one – what was the type definition of sortcode? Without this critical piece of information it’s not possible to say much about the memory requirements. The answer was char(6). (I hadn’t omitted this deliberately; it just happens that in the UK banking system sort codes are six-character fixed-length codes, and that’s what the column was supposed to represent – not that anyone else would have known that, of course.)

Knowing that sortcode is char(6) isn’t enough – after all, character sets can vary from fixed single-byte at one extreme to variable length of up to four bytes per character at the other. On top of that, if you want to estimate the total memory used you also need to know whether you are talking about 32-bit Oracle or 64-bit Oracle, because the size of the pointers used in Oracle’s sort mechanism varies with the width of the CPU. Then there’s the problem of Oracle version – 10gR2 introduced a new sorting mechanism which (if it gets used) carries less overhead.
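
If you want to check the first two variables on your own system, a couple of trivial queries will do it – a minimal sketch (vsize() reports the stored size of a value in bytes):

-- how many bytes does one stored sortcode value occupy ?
select  vsize(sortcode) bytes_per_value
from    t1
where   rownum = 1
;

-- what is the database character set ?
select  value
from    nls_database_parameters
where   parameter = 'NLS_CHARACTERSET'
;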

One respondent even suggested that I might have set the table up as an index-organized table (IOT) or as a table in an index cluster – thus allowing myself an indexed access path without explicitly “creating an index on the table”. (That’s a reasonable suggestion in a real-life scenario – but I would (probably) have given you a little warning if I were trying to be tricky.) A suggestion that didn’t appear, but may have been relevant, was the possibility that the table was range partitioned on the column I was querying – it’s possible (but I haven’t checked) that even in the absence of indexes the optimizer might have done something clever with partition elimination in some versions of Oracle. [Update: I see Charles Hooper thought of, and tested, that idea while I was typing up this note.]

Various people then modelled the problem, using explain plan, event 10046 with tkprof, queries against v$sesstat and so on, to get an idea of what happens – and found some (apparently) contradictory results. In one case checking v$sesstat showed 1,000,000 rows sorted while a simultaneous check of the tkprof output reported the actual number of rows from the sort order by stopkey operation as 10. In another case the execution plan from explain plan for this query and for the simpler “select sortcode from t1 order by sortcode” both showed the same results for rows, bytes and temporary (sort) space.
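
For the record, the usual way to watch those session statistics is a query like the following, run before and after the test statement so you can take the difference – a sketch against v$mystat using the standard statistic names from v$statname:

select  sn.name, ms.value
from
        v$statname      sn,
        v$mystat        ms
where
        ms.statistic# = sn.statistic#
and     sn.name in (
                'sorts (memory)',
                'sorts (disk)',
                'sorts (rows)'
        )
;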

The one thing that didn’t get mentioned in any of the comments was event 10032 – the event to trace sort statistics.
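
If you want to reproduce the results, the event can be set at session level around the query – a minimal sketch using the standard event syntax (the statistics are written to the session’s trace file):

alter session set events '10032 trace name context forever, level 1';

select sortcode
from
        (
        select sortcode
        from   t1
        order by
                sortcode
        )
where
        rownum <= 10
;

alter session set events '10032 trace name context off';

Running 10.2.0.3, this is the content of the trace file I got for the query returning just 10 rows (the table held 1,048,576 rows rather than the 1,000,000 stated in the original problem):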


---- Sort Parameters ------------------------------
sort_area_size                    65536
sort_area_retained_size           65536
sort_multiblock_read_count        1
max intermediate merge width      3
---- Sort Statistics ------------------------------
Input records                             1048576
Output records                            114
Total number of comparisons performed     1048968
  Comparisons performed by in-memory sort 1048968
Total amount of memory used               8192
Uses version 1 sort
Does not use asynchronous IO
---- End of Sort Statistics -----------------------

You’ll notice that (a) there are no statistics about disk I/O – this sort completed in memory; (b) the memory allocated as a workarea by the end of the sort was 65,536 bytes (I was using the automatic workarea_size_policy, so the sort_area_size and sort_area_retained_size lines report the final memory allocation); and (c) the total memory used was only 8,192 bytes.

It is a little odd, you may think, that the number of output rows was reported as 114 rather than the 10 we asked for; it’s also fairly important that Oracle has reported a version 1 sort – the mechanism that was replaced in 10gR2 by a new sorting algorithm.

In comparison, look at the trace information for the query “select sortcode from t1 order by sortcode”:


---- Sort Parameters ------------------------------
sort_area_size                    19013632
sort_area_retained_size           19013632
sort_multiblock_read_count        1
max intermediate merge width      1159
*** 2009-12-23 14:41:30.062
---- Sort Statistics ------------------------------
Input records                             1048576
Output records                            1048576
Total number of comparisons performed     11324792
  Comparisons performed by in-memory sort 11324792
Total amount of memory used               19013632
Uses version 2 sort
Does not use asynchronous IO

Note particularly that the memory demand is much higher, the sort is a version 2 sort, and the number of comparisons performed is much higher. Oracle is doing something a little bit clever when it handles the “short” sort, but it can only do it by using the version 1 sort.

I pointed out in Cost Based Oracle – Fundamentals that Oracle does not “sort” data in the way that a person would sort a deck of cards (say); basic sorting algorithms for computers involve copying the data into memory and then manipulating a set of pointers to the data so that walking through the pointers in order causes you to jump around the copied data, reading it back in sorted order.

For a version 1 sort Oracle’s algorithm uses a “binary insertion tree”. This means it is basically building a balanced B-tree index on the data as the data is read into memory; moreover there are only two entries in each node of the tree (each node points to just two child nodes or, ultimately, just two items of data). This balanced binary tree algorithm is resource-intensive on memory and CPU – which is why Oracle finally introduced a better algorithm in 10gR2. (When I forced Oracle to use the version 1 sort on the full data set the memory required was 24MB with 20M comparisons, compared to the 19MB and 11M comparisons above.)
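
If you want to reproduce that comparison, the switch generally quoted for disabling the 10gR2 mechanism is the hidden parameter “_newsort_enabled”. It’s an undocumented parameter, so treat the following purely as a sketch for a private test system, not a recommendation:

alter session set "_newsort_enabled" = false;   -- fall back to the version 1 sort
alter session set "_newsort_enabled" = true;    -- allow the 10gR2 version 2 sort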

However, if you are building a binary tree as you read the data an interesting advantage appears when you only want to report a tiny subset of the full sorted data set. And at this point you uncover the ambiguity in the question: “How many rows does Oracle sort?”

Consider the sequence of steps, which must go something like this:

  • we wish to report the top 10 rows, and the optimizer is aware of this and has special code to handle the task.
  • we copy the first 10 rows from the table into memory and build the binary tree – keeping track of the largest value we have linked into the tree.
  • from this point onwards we read a value into memory and compare it with the known largest value in the tree.
  • if the value is too large we discard it and read the next value; otherwise we insert it into the binary tree and discard the previous largest value from the tree. (A toy model of this bookkeeping follows the list.)
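
To make that bookkeeping concrete, here’s a toy PL/SQL model of the process – emphatically not Oracle’s internal code (it scans a small array rather than maintaining a binary insertion tree), but the discard/replace logic is the one described above, and the insert count it reports is directly comparable to the “Output records” figure in the 10032 trace:

declare
        -- Toy model only: Oracle uses a binary insertion tree, we use a
        -- simple array scan, but the discard/replace logic is the same.
        type val_tab is table of varchar2(6) index by pls_integer;
        m_top           val_tab;        -- the 10 smallest values seen so far
        m_max_pos       pls_integer;    -- position of the largest of the 10
        m_inserts       pls_integer := 0;

        procedure find_max is
        begin
                m_max_pos := 1;
                for i in 2 .. m_top.count loop
                        if m_top(i) > m_top(m_max_pos) then
                                m_max_pos := i;
                        end if;
                end loop;
        end find_max;

begin
        for r in (select sortcode from t1) loop
                if m_top.count < 10 then
                        -- still building the initial set of 10
                        m_top(m_top.count + 1) := r.sortcode;
                        m_inserts := m_inserts + 1;
                        if m_top.count = 10 then
                                find_max;
                        end if;
                elsif r.sortcode < m_top(m_max_pos) then
                        -- new value displaces the current largest of the 10
                        m_top(m_max_pos) := r.sortcode;
                        m_inserts := m_inserts + 1;
                        find_max;
                end if;
                -- otherwise: the value is too large, discard it and read the next
        end loop;
        dbms_output.put_line('Values inserted into the top-10: ' || m_inserts);
end;
/

For randomly ordered input the number of inserts should come out at roughly 10 + 10 * ln(N/10) – the same region as the 114 (and, later, 135) reported as “Output records” in the traces.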

The total number of comparisons we perform is made up of two components – comparing every row with the current largest (so N-1 comparisons if we have N rows), and doing the comparisons (roughly log2(10), i.e. about 4, for each inserted row) needed to insert a few “replacement” values into a “small” binary tree.

So how many rows have we sorted? You could argue for three possibilities –

  • the 10 rows we finally get
  • the 1,000,000 rows we looked at
  • or the “X” rows that we inserted into the binary tree as we went along

Notice in our “short sort” trace above that the number of output rows reported was 114 – I assume that this is the number of rows that were inserted into the binary tree. Since we have a binary tree holding 10 items it has a height of 4, so we should expect the total number of comparisons to be roughly (1,048,576 – 1) + 4 * 114 = 1,049,031 … and the actual result (1,048,968) is pretty close.

As a final comment on how unfair my original question was, just think about the effect the order of the incoming data has on the efficiency of the algorithm. The number of inserts into the B-tree could vary dramatically with the pre-existing ordering of the data. Here’s an extreme data set to make the point.

rem
rem     Script:         short_sort.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2004
rem

execute dbms_random.seed(0)

create table t1
as
with generator as (
        select	--+ materialize
                rownum id
        from	all_objects
        where   rownum <= 5000
)
select
        rownum - 1 ordered,
        1000000 - rownum reversed,
        trunc(dbms_random.value(1,1000000)) randomised
from
        generator v1,
        generator v2
where
        rownum <= 1 * 1048576
;

select * from (
        select * from t1 order by ordered
        -- select * from t1 order by reversed
        -- select * from t1 order by randomised
)
where
        rownum <= 10
;

Depending on your choice of column in the order by clause, the output rows, comparisons, and memory used are as follows:

Ordered
-------
Output records                            10
Total number of comparisons performed     1048575
Total amount of memory used               8192

Reversed
--------
Output records                            1048576
Total number of comparisons performed     5242856
Total amount of memory used               1130496

Randomised
----------
Output records                            135
Total number of comparisons performed     1049032
Total amount of memory used               8192

The results fit the algorithm nicely: with pre-ordered data almost every incoming row fails a single comparison against the current largest of the top 10, so we see roughly one comparison per row, just 10 insertions, and minimal memory; with reversed data every incoming value is smaller than everything already seen, so every row has to go into the tree and the insert count, comparison count and memory all climb dramatically; and the randomised data stays close to the ordered case because replacements rapidly become rare as the “top 10” improves.

21 Comments

  1. Hmm…
    So the statement “The table has a column called sortcode which has no nulls, and has been generated in a highly random way so that no value appears more than four times” was just smoke?
    But thanks. I was not even aware of event 10032, and reading the other comments made me realise what other possibilities I hadn’t even thought of.

    Comment by Narendra — December 28, 2009 @ 10:51 pm GMT Dec 28,2009 | Reply

    • Narendra,

      No smoke, at least not intentionally. If you think about the two comments I made about the data, one of them tells you what it looks like, the other tells you what it is supposed to represent.

      Comment by Jonathan Lewis — December 30, 2009 @ 4:27 pm GMT Dec 30,2009 | Reply

  2. I ran it on 11.2.0.1.0

    ordered
    ---- Sort Statistics ------------------------------
    Input records                             1048576
    Output records                            10
    Total number of comparisons performed     1048575
      Comparisons performed by in-memory sort 1048575
    Total amount of memory used               2048
    Uses version 1 sort
    
    
    reversed
    ---- Sort Statistics ------------------------------
    Input records                             1048576
    Output records                            1048576
    Total number of comparisons performed     5242856
      Comparisons performed by in-memory sort 5242856
    Total amount of memory used               1337344
    Uses version 1 sort
    
    randomised
    ---- Sort Statistics ------------------------------
    Input records                             1048576
    Output records                            135
    Total number of comparisons performed     1049022
      Comparisons performed by in-memory sort 1049022
    Total amount of memory used               2048
    Uses version 1 sort
    

    All of them are using a version 1 sort. How can I force Oracle to use the version 2 sort?
    And how can I force it to use the version 1 sort?

    Comment by Winston — December 29, 2009 @ 12:02 am GMT Dec 29,2009 | Reply

  3. I don’t think your original question “How many rows does Oracle sort?” was unfair or ambiguous.

    But I think that here you are mixing up two different ideas.

    The first is the number of rows in the table. It’s 1,000,000, full stop. So Oracle sorts 1,000,000 rows, i.e. Oracle reads 1,000,000 rows and “does some magic” so that we get them ordered.

    The second idea is how many comparisons Oracle does to sort those rows. “Comparisons” are a completely different kind of object, and they depend on the algorithm used to sort (e.g. quicksort is better than bubble sort because it does fewer comparisons).

    Reading your post, I get the idea that you mean “Oracle has 1,000,000 rows but it sorts 2,000,000 rows” (as if it had multiplied them); well, this is wrong. At most we can say that “Oracle does 3,000,000 comparisons to sort 1,000,000 rows”, but the number of rows doesn’t change.

    Bye,
    Antonio

    Comment by lascoltodelvenerdi — December 30, 2009 @ 7:58 am GMT Dec 30,2009 | Reply

    • Antonio,

      I’ve re-read the posting and don’t see any confusion between sorting and comparing. What I see is that we can infer from the change in the number of comparisons that something different has happened during sorting.

      One of your comments, however, highlights the ambiguity that surrounds the word “sorting” in this context.

      When we ask for 10 rows the mechanism does not “sort” 1,000,000 rows because it does NOT get 1,000,000 rows into order; it gets 10 rows into order and 999,990 rows that are not well-ordered.

      Comment by Jonathan Lewis — December 30, 2009 @ 4:35 pm GMT Dec 30,2009 | Reply

  4. Shall we understand that, given the nature of the 10gR2 binary tree sorting, it is not possible to answer the quiz in advance unless an exact measure of the disorganisation of the table is also provided, since the sorting load is now related to the clustering factor?

    Comment by Bernard Polarski — December 30, 2009 @ 8:32 am GMT Dec 30,2009 | Reply

    • Bernard,

      I wouldn’t have used the expression “clustering factor” in this context (data can be well clustered without being well-ordered) but you are essentially correct.

      For this type of “short sort”, the work done and the memory used do depend on the arrival order of the data.

      (Footnote: My memory from the days when I did computer science is that every sort algorithm was subject to some boundary cases where the data order was “bad for” the algorithm – basically the order of input always affected the amount of work needed and the worst case was always very bad.)

      Comment by Jonathan Lewis — December 30, 2009 @ 4:47 pm GMT Dec 30,2009 | Reply

      • I’m sorry for commenting on an old topic – I got here reading some of the latest stuff :)

        I believe some algorithms are indeed very dependent on the initial ordering of the data, while others don’t care much. E.g. selection sort is always quadratic and merge sort is always NlogN, no matter what you throw at them, while insertion sort will run in linear time on a sorted array but in quadratic time on a randomly shuffled one.

        Something that caught my attention is that we still do fewer than NlogN comparisons for the reversed set. The knee-jerk assumption was that it’s another clever optimisation, but on second thought I figure it’s the nature of the algorithm: you start with an empty tree, and while it is growing the number of comparisons to add a new node increases from lg 1 to lg 1048576, which in this case presumably converges to 1/4*N*lgN. But that’s just thinking out loud rather than a mathematical proof :)

        Comment by Viacheslav Andzhich — April 6, 2020 @ 1:09 pm BST Apr 6,2020 | Reply

        • Viacheslav,

          Thanks for the comment.

          I like the idea – the fact that the data is in exact reverse order means a completely predictable (logarithmic) pattern to the depth of the tree.
          I’m now wondering, though, whether the 2009 results would be the same on 19c.

          Regards,
          Jonathan Lewis

          Comment by Jonathan Lewis — April 6, 2020 @ 4:01 pm BST Apr 6,2020

        • Hi Jonathan,

          I did a small test on my local 19.3.0.0 (or 19.0.0.0.0 – I’m already lost in their versioning :)) with the automatic workarea size policy. The results happened to be exactly the same as the ones you got in 2009.
          To get a reversed or ordered dataset, though, I had to create the table in an MSSM tablespace, since ASSM still shuffles things around quite a bit even when you insert the rows from an explicitly ordered cursor.
          Here is the output:
          -- Randomised

          ---- Sort Statistics ------------------------------
          Input records                             1048576
          Output records                            139
          Total number of comparisons performed     1049052
            Comparisons performed by in-memory sort 1049052
          Total amount of memory used (in KB)       2
          Uses version 1 sort
          ---- End of Sort Statistics -----------------------
          

          -- Reversed

          ---- Sort Statistics ------------------------------
          Input records                             1048576
          Output records                            1048576
          Total number of comparisons performed     5242856
            Comparisons performed by in-memory sort 5242856
          Total amount of memory used (in KB)       2
          Uses version 1 sort
          ---- End of Sort Statistics -----------------------
          

          -- Ordered

          ---- Sort Statistics ------------------------------
          Input records                             1048576
          Output records                            10
          Total number of comparisons performed     1048575
            Comparisons performed by in-memory sort 1048575
          Total amount of memory used (in KB)       2
          Uses version 1 sort
          ---- End of Sort Statistics -----------------------
          

          Table T1 was created with this script:

          drop table t1 purge;
          
          create table t1 
            (sortcode char(6),
            padding varchar2(500));
          
          insert into t1  
          select 
            dbms_random.string('U', 6),
            lpad('x', 500, '*')
          from dual
          connect by level <= 1048576;
          
          commit;
          

          Then I used ordered CTAS to create T2 in a MSSM tablespace.

          PS: Out of interest I also tried to deliberately reverse the data in ASSM, but we still get a far better picture than the worst case:
          -- Partially reversed (ASSM)

          ---- Sort Statistics ------------------------------
          Input records                             1048576
          Output records                            19170
          Total number of comparisons performed     1125232
            Comparisons performed by in-memory sort 1125232
          Total amount of memory used (in KB)       2
          Uses version 1 sort
          ---- End of Sort Statistics -----------------------
          

          Which leads me to the idea that ASSM is generally a very good thing :)

          Thank you,
          Viacheslav

          Comment by Viacheslav Andzhich — April 10, 2020 @ 12:46 pm BST Apr 10,2020

        • I’m not quite sure, though, why it still reports “Total amount of memory used (in KB) 2” for the reversed set. That doesn’t seem right for a tree with 2^20 elements.

          Comment by Viacheslav Andzhich — April 10, 2020 @ 12:54 pm BST Apr 10,2020

        • Viacheslav,

          Thanks for doing the tests.

          It’s possible that if you did a “create table as select” in ASSM you wouldn’t need to switch to MSSM to get a fully reversed set.

          I think the 2KB is probably a bug fix – the saved tree only has to hold 10 leaf entries, and 2KB should be enough for that. I’ve always thought that the original large memory for the reversed tree was a mistake similar to the effect that used to exist in 9i and allowed me to reach the limiting height of a B-tree index with just 24 rows in a table: https://web.archive.org/web/20200509033755/www.jlcomp.demon.co.uk/22_how_high.doc

          Regards
          Jonathan Lewis

          Comment by Jonathan Lewis — April 12, 2020 @ 11:23 am BST Apr 12,2020

        • Hi Jonathan,

          Thank you for the remarks. You’re absolutely right: CTAS with an order by gets the data sorted in ASSM. I guess I tried a regular insert in ASSM, which always shuffled the data for me, and then I switched both from insert to CTAS and from ASSM to MSSM, overlooking CTAS in ASSM. Another proof that one should change only one variable at a time :)
          Also, thank you for explaining the memory issue. I totally forgot that for stopkey queries we try to keep no more than the requested number of records in the tree, and took the 10g results for granted. I assume 2KB is indeed more than enough to hold 10 nodes.

          Thank you,
          Viacheslav

          Comment by Viacheslav Andzhich — April 13, 2020 @ 1:38 pm BST Apr 13,2020

  5. […] 5-How many rows are sorted and how much memory used by your query? Jonathan Lewis-Short Sorts […]

    Pingback by Blogroll Report 25/12/2009 – 01/01/2010 « Coskan’s Approach to Oracle — January 8, 2010 @ 6:50 pm GMT Jan 8,2010 | Reply

  6. […] for a sort aggregate operation can be very small compared to the volume of data processed. (See this note for an example of how Oracle can handle one type of special case involving […]

    Pingback by Count(*) « Oracle Scratchpad — March 6, 2012 @ 4:58 pm GMT Mar 6,2012 | Reply

  7. […] this operation (like SORT ORDER BY STOPKEY) is able to limit the data volume sorted (See https://jonathanlewis.wordpress.com/2009/12/28/short-sorts/ ), but does the older, less efficient, “Version 1″ sort to achieve this. If you […]

    Pingback by 12c First N | Oracle Scratchpad — October 28, 2013 @ 12:46 pm GMT Oct 28,2013 | Reply

  8. […] can see that the algorithm used for sorting is “version 1 sort”, as explained in this jonathan lewis blog this algorithm is a “binary insertion tree” so oracle builds a binary tree as […]

    Pingback by Inserting million of time series data (rows) per second inside oracle database part2 « Oracle and beyond... — July 9, 2015 @ 9:51 am BST Jul 9,2015 | Reply

  9. […] statistics, pick up the actual execution plan with rowsource execution statistics, and enable the 10032 and 10033 (sort) traces. So let’s fold the query into a longer script, something […]

    Pingback by Quiz Night | Oracle Scratchpad — July 16, 2018 @ 4:43 pm BST Jul 16,2018 | Reply

  10. […] but if I have it wasn’t on this blog.  I do have a note on how the mechanism to handle “rownum <= N” with a preceding “order by” minimises its use of memory, and that note should give you […]

    Pingback by dense_rank | Oracle Scratchpad — March 12, 2020 @ 6:42 pm GMT Mar 12,2020 | Reply

  11. […] and sorted them. Fortunately the optimizer knows that it needs only the top 40 rows so it has been discarding rows as it sorts, hence the appearance of the “pushed rank” in the “window sort” at […]

    Pingback by Pagination cost | Oracle Scratchpad — July 21, 2022 @ 3:59 pm BST Jul 21,2022 | Reply

