Oracle Scratchpad

October 1, 2009


Filed under: Infrastructure,Performance,sorting — Jonathan Lewis @ 6:01 pm BST Oct 1,2009

I mentioned the hidden parameter _smm_isort_cap recently in my solution to a problem with analytic functions applied to large volumes of data; but in that note I didn’t give you much detail about what it does.

The description given in x$ksppi is: “maximum work area for insertion sort(v1)” which refers to the mechanism used for all sort operations in earlier releases of Oracle. In 10gR2, however, Oracle introduced a new sorting mechanism (commonly called the V2 sort) which uses less memory, less CPU, and can reduce the chances of a sort operation spilling to disc. But the new sort mechanism doesn’t get used for all sort operations – so it’s worth checking in v$sql_workarea(_active) – and even the 10032 trace file – from time to time to see which operations have used the V1 sort and which have used the V2 sort. Here, for example is a simple query against a system just after startup:

	operation_type, count(*)
group by
order by

-------------------- ----------
BUFFER                       19
CONNECT-BY (SORT)             1
GROUP BY (SORT)              18
HASH-JOIN                    23
SORT (v1)                     4
SORT (v2)                    49

You’ll notice that we don’t have many V1 sorts reported – but you might wonder which version of sorting is used for that “connect by” and the “group by” operations.

The problem with the V1 sort is that the “sorting” mechanism works by building a balanced binary index as it reads the data. Although the implementation is made a little more complicated by the complexities of dynamic memory allocation the basic mechanism simply stacks your unsorted data at one end of the workarea memory while dynamically maintaining an index into that data at the other end of the workarea.

If the data set (plus the index) is too large to fit into memory then the code has to dump the data to disc in sorted order – which means walking the index, jumping around the data stack of data to get the items in the right order. That’s a lot of pointer following, and potentially a lot of CPU cycles.

There are two important details to consider:

    The index is a balanced binary tree, which means every leaf node is the same distance from the root note (and leaf node split propagate upwards) and every node points to at most two child nodes; so if you have N items of data then, starting from the root node, it takes log2(n) steps to get to the leaf level – which is the level that points into the data stack. To give a concrete example, if you have about 1,000,000 items of data to sort you have to follow 20 pointers to get from the root to a leaf. By the time the index is that deep, it’s a lot of work to add just one more item.

    A node consists of three pointers (one up, two down) which means 12 bytes to a node if you’re using 32 bit Oracle and 24 bytes if you’re using 64 bit Oracle. Think about what that means if (for example) you want to build an index on a date column. The data entry for the thing you sort will only be about 22 bytes in the layout used for sorting (7 bytes for the date, 6 for a rowid, and the rest in column and row overheads) so if you’re running 64-bit Oracle more than 50% of the memory you use will go into the binary tree. (See Footnote***).

So the insertion sort can be both memory intensive and CPU intensive, and the larger the number of items you have to sort the faster the rate of CPU usage goes up. If you’ve ever been told that the way to optimise your sorts is to get them to complete in memory, it’s not always true. If your bottleneck is CPU, rather than disc, it may be faster to allow a large sort (specifically a sort involving a large number of small items) to turn into a one-pass disc sort because the time you spend on the extra disc I/O may be less than the CPU time you spend trying to maintain a very tall binary tree.

If you’re running with automatic workarea_size_policy, you can have an enormous amount of memory made available for a sort or hash operation – but With an appreciation of how the V1 (insertion) sort works you can understand why Oracle might choose to put a fairly arbitrary limit on the amount of memory it’s allowed to use – hence the parameter _smm_isort_cap.

The strange thing that we saw in my analytic example, though, was that the sort was reported as a V2 sort – even in the 10032 trace file – but the _smm_isort_cap was still applied to the sort operation; and the only way I discovered this was using the trial and error (or empirical, if you want to be pretentious) approach of running the query several times adjusting the parameter and watching the view v$sql_workarea_active.

In other cases, though, the standard 10032 trace will tell you when a V1 sort is used, which gives you the option to dcide whether or not you can justify messing with hidden parameters (perhaps just at the session level). Here’s a query against a table of 1 million rows, the resulting execution plan in a 10g database, followed by the first few lines of the resulting 10032 trace:

alter session set events '10032 trace name context forever';

	n1, sum(n2)
group by
order by

Execution Plan
Plan hash value: 3946799371

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT   |      |   968K|    24M|  1923   (5)| 00:00:24 |
|   1 |  SORT GROUP BY     |      |   968K|    24M|  1923   (5)| 00:00:24 |
|   2 |   TABLE ACCESS FULL| T1   |   968K|    24M|  1851   (2)| 00:00:23 |

You’ll notice, of course, that the optimizer has chosen to use the “sort group by” operation instead of the new 10gR2 “hash group by” operation. This has allowed it to avoid the “sort order by” operation that would normally be required to handle our “order by” clause.

---- Sort Statistics ------------------------------
Input records                             1000000
Output records                            999
Total number of comparisons performed     9154007
  Comparisons performed by in-memory sort 9154007
Total amount of memory used               49152
Uses version 1 sort
Does not use asynchronous IO
---- End of Sort Statistics -----------------------

Remember, the expense of the V1 sort comes from the binary tree, and you pay for it in memory and CPU. If the items you want to sort are small (compared to the 24 bytes per node of the binary tree) then the disc I/O you save by incresing the memory may not be worth the extra CPU that you use. In my case with the analytic functions the items were large – so the cost of the binary tree was relatively insignificant – and we were avoiding a bug that increased the disk I/O dramatically if the sort spilled to disc.


One of the myths of performance tuning in the early days of Oracle was that you should set the temporary extent size to be twice the sort_area_size because then if you had to spill the sort to disc you didn’t have to allocate two extents at once. The strategy wasn’t necessarily a bad thing, of course, but the justification was bizarre since the volume of data that goes to disc on any one spill is always less than, possibly much less than, the sort_area_size. Fortunately examples of this type of “spurious justifcation” are dying out as increasing numbers of people are now testing ideas like this and publishing their observations.


  1. Most interesting.
    Could you elaborate a bit on the passage : “the optimizer has chosen to use the “sort group by….” followed next lined by “This has allowed it to avoid the “sort group by”, this is very confusing.

    Comment by B. Polarski — October 2, 2009 @ 6:19 am BST Oct 2,2009 | Reply

  2. […] Jonathan Lewis – _smm_isort_cap […]

    Pingback by Blogroll Report 25/09/2009-02/09/2009 « Coskan’s Approach to Oracle — October 6, 2009 @ 7:35 pm BST Oct 6,2009 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by