I mentioned the hidden parameter _smm_isort_cap recently in my solution to a problem with analytic functions applied to large volumes of data; but in that note I didn’t give you much detail about what it does.
The description given in x$ksppi is: “maximum work area for insertion sort(v1)”, which refers to the mechanism used for all sort operations in earlier releases of Oracle. In 10gR2, however, Oracle introduced a new sorting mechanism (commonly called the V2 sort) which uses less memory and less CPU, and can reduce the chances of a sort operation spilling to disc. But the new sort mechanism doesn’t get used for all sort operations – so it’s worth checking in v$sql_workarea(_active) – and even the 10032 trace file – from time to time to see which operations have used the V1 sort and which have used the V2 sort. Here, for example, is a simple query against a 10.2.0.3 system just after startup:
select
        operation_type, count(*)
from
        v$sql_workarea
group by
        operation_type
order by
        operation_type
;

OPERATION_TYPE         COUNT(*)
-------------------- ----------
BUFFER                       19
CONNECT-BY (SORT)             1
GROUP BY (SORT)              18
HASH-JOIN                    23
SORT (v1)                     4
SORT (v2)                    49
You’ll notice that we don’t have many V1 sorts reported – but you might wonder which version of sorting is used for the “connect by” and “group by” operations.
The problem with the V1 sort is that the “sorting” mechanism works by building a balanced binary index as it reads the data. Although the implementation is made a little more complicated by the complexities of dynamic memory allocation, the basic mechanism simply stacks your unsorted data at one end of the workarea memory while dynamically maintaining an index into that data at the other end of the workarea.
If the data set (plus the index) is too large to fit into memory then the code has to dump the data to disc in sorted order – which means walking the index, jumping around the stack of data to get the items in the right order. That’s a lot of pointer following, and potentially a lot of CPU cycles.
There are two important details to consider:
- The index is a balanced binary tree, which means every leaf node is the same distance from the root node (and leaf node splits propagate upwards) and every node points to at most two child nodes; so if you have N items of data then, starting from the root node, it takes log2(N) steps to get to the leaf level – which is the level that points into the data stack. To give a concrete example, if you have about 1,000,000 items of data to sort you have to follow 20 pointers to get from the root to a leaf. By the time the index is that deep, it’s a lot of work to add just one more item.
- A node consists of three pointers (one up, two down), which means 12 bytes per node if you’re using 32-bit Oracle and 24 bytes if you’re using 64-bit Oracle. Think about what that means if (for example) you want to build an index on a date column. The data entry for the thing you sort will only be about 22 bytes in the layout used for sorting (7 bytes for the date, 6 for a rowid, and the rest in column and row overheads), so if you’re running 64-bit Oracle more than 50% of the memory you use will go into the binary tree – the arithmetic is sketched just after this list. (See Footnote***).
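To put rough numbers on those two bullets for the 1,000,000-row, 64-bit, date-sort case described above, here’s a back-of-envelope calculation – just arithmetic through dual, using the 24 bytes per node and (approximately) 22 bytes per data entry from the text:

select
        ceil(log(2, 1000000))           tree_depth_steps,
        round(1000000 * 24 / 1048576)   node_memory_mb,
        round(1000000 * 22 / 1048576)   data_memory_mb,
        round(100 * 24 / (24 + 22))     node_memory_pct
from
        dual
;

This gives a tree 20 levels deep, and roughly 23MB of nodes against 21MB of data – i.e. about 52% of the memory goes into the binary tree.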
So the insertion sort can be both memory intensive and CPU intensive, and the larger the number of items you have to sort, the faster the CPU cost climbs. If you’ve ever been told that the way to optimise your sorts is to get them to complete in memory, it’s not always true. If your bottleneck is CPU, rather than disc, it may be faster to allow a large sort (specifically a sort involving a large number of small items) to turn into a one-pass disc sort because the time you spend on the extra disc I/O may be less than the CPU time you spend trying to maintain a very tall binary tree.
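If you want to check whether that trade-off is relevant on your own system, v$sql_workarea records how often each workarea operation completed in memory (optimal) or spilled to disc (one-pass or multi-pass). A simple query along these lines will show the breakdown:

select
        operation_type, policy,
        optimal_executions, onepass_executions,
        multipasses_executions, last_memory_used
from
        v$sql_workarea
where
        operation_type like 'SORT%'
order by
        operation_type
;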
If you’re running with an automatic workarea_size_policy, you can have an enormous amount of memory made available for a sort or hash operation – but with an appreciation of how the V1 (insertion) sort works you can understand why Oracle might choose to put a fairly arbitrary limit on the amount of memory it’s allowed to use – hence the parameter _smm_isort_cap.
The strange thing that we saw in my analytic example, though, was that the sort was reported as a V2 sort – even in the 10032 trace file – but _smm_isort_cap was still applied to the sort operation; and the only way I discovered this was by using the trial and error (or empirical, if you want to be pretentious) approach of running the query several times, adjusting the parameter and watching the view v$sql_workarea_active.
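If you want to repeat that kind of experiment, here’s a minimal sketch of the approach – the cap value is purely illustrative, and the usual caveats about playing with hidden parameters apply (note that hidden parameter names have to be double-quoted):

rem  Illustrative only: choose your own cap value and test query.

alter session set "_smm_isort_cap" = 104857600;

rem  Run the problem query from this session then, while it is
rem  still running, watch the workarea usage from a second session:

select
        sid, operation_type, policy,
        actual_mem_used, max_mem_used, tempseg_size
from
        v$sql_workarea_active
order by
        sid
;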
In other cases, though, the standard 10032 trace will tell you when a V1 sort is used, which gives you the option to decide whether or not you can justify messing with hidden parameters (perhaps just at the session level). Here’s a query against a table of 1 million rows, the resulting execution plan in a 10g database, followed by the first few lines of the 10032 trace:
alter session set events '10032 trace name context forever';

select
        n1, sum(n2)
from
        t1
group by
        n1
order by
        n1
;

Execution Plan
----------------------------------------------------------
Plan hash value: 3946799371

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   968K|    24M|  1923   (5)| 00:00:24 |
|   1 |  SORT GROUP BY     |      |   968K|    24M|  1923   (5)| 00:00:24 |
|   2 |   TABLE ACCESS FULL| T1   |   968K|    24M|  1851   (2)| 00:00:23 |
---------------------------------------------------------------------------
You’ll notice, of course, that the optimizer has chosen to use the “sort group by” operation instead of the new 10gR2 “hash group by” operation. This has allowed it to avoid the “sort order by” operation that would normally be required to handle our “order by” clause.
---- Sort Statistics ------------------------------
Input records                             1000000
Output records                            999
Total number of comparisons performed     9154007
  Comparisons performed by in-memory sort 9154007
Total amount of memory used               49152
Uses version 1 sort
Does not use asynchronous IO
---- End of Sort Statistics -----------------------
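Going back to the optimizer’s choice of “sort group by”: for comparison – and this is a hypothetical variant, not part of the original test – if the query had no “order by” clause you would normally expect the 10gR2 optimizer to pick the newer hash aggregation instead, with the plan showing “hash group by” in place of “sort group by”:

select
        n1, sum(n2)
from
        t1
group by
        n1
;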
Remember, the expense of the V1 sort comes from the binary tree, and you pay for it in memory and CPU. If the items you want to sort are small (compared to the 24 bytes per node of the binary tree) then the disc I/O you save by increasing the memory may not be worth the extra CPU that you use. In my case with the analytic functions the items were large – so the cost of the binary tree was relatively insignificant – and we were avoiding a bug that increased the disc I/O dramatically if the sort spilled to disc.
Footnote***
One of the myths of performance tuning in the early days of Oracle was that you should set the temporary extent size to be twice the sort_area_size because then if you had to spill the sort to disc you didn’t have to allocate two extents at once. The strategy wasn’t necessarily a bad thing, of course, but the justification was bizarre since the volume of data that goes to disc on any one spill is always less than, possibly much less than, the sort_area_size. Fortunately examples of this type of “spurious justification” are dying out as increasing numbers of people are now testing ideas like this and publishing their observations.
Most interesting.
Could you elaborate a bit on the passage: “the optimizer has chosen to use the “sort group by…”” followed on the next line by “This has allowed it to avoid the “sort group by””? This is very confusing.
Comment by B. Polarski — October 2, 2009 @ 6:19 am BST
My error – that should have read ‘… avoid the “sort order by”,…’
Thanks for highlighting it; now corrected.
Comment by Jonathan Lewis — October 2, 2009 @ 8:10 am BST
[…] Jonathan Lewis – _smm_isort_cap […]
Pingback by Blogroll Report 25/09/2009-02/09/2009 « Coskan’s Approach to Oracle — October 6, 2009 @ 7:35 pm BST