Oracle Scratchpad

July 14, 2018

Quiz Night

Filed under: Execution plans,Oracle,Performance,sorting — Jonathan Lewis @ 7:07 pm BST Jul 14,2018

Because it’s been a long time since the last quiz night.  Here’s a question prompted by a recent thread on the ODevCom database forum – how many rows will Oracle sorts (assuming you have enough rows to start with in all_objects) for the final query, and how many sort operations will that take ?

drop table t1 purge;

create table t1 nologging as select * from all_objects where rownum < 50000;

select owner, count(distinct object_type), count(distinct object_name) from t1 group by owner;

Try to resist the temptation of doing a cut-n-paste and running the code until after you’ve thought about the answer.

And the answer is:

It was nice to see a few ideas being volunteered in response to this question; I think that getting a diverse set of comments makes a nice point about how it’s always worth spending a little time to think along the lines of: “If I do X how might Oracle handle it”. Having the ideas before trying to check the effects can make it a lot easier to understand what’s happening, and sometimes how to take advantage of what Oracle does do to improve the way you design a query.

The first point to make, as Michael D O’Shea  pointed out in comment #2, is that computer systems don’t usually “sort data” – they tend to create pointers to data and shuffle the pointers in some way. In Oracle’s case “sorting” used to mean inserting pointers into a balanced binary tree, and aggregating used to be a case of accumulating values at the leaf nodes of the insertion tree. In 10g Oracle then introduced a new sorting algorithm that often works more efficiently than the binary insertion tree. I’m still going to refer to the binary tree method as “sorting”, though.

Looking at the query, we can see that there is no “order by” clause so it’s possible that Oracle will do whatever it does using hash aggregation throughout and no sorting, but that leaves open the question of how a hash table on owner can also record a distinct count of both object_type and object_name because every single owner hash bucket would have to link to its own hash tables for object_type and object_name and do a sort of “recursive hash aggregation” which starts to sound a little complicated. Maybe the alternative suggested by Kaley in comment #1 is closer to the truth – maybe Oracle just “buckets” all the data by owner and then sorts within each owner twice to do the count distincts, but then we’re still going to be hanging on to a lot of data, doing a two-level open-ended process.

Having waved hands for a little bit to try and head in the direction of possible solutions we need to look for clues that tell us whether we ought to eliminate or refine some of our guesses. There are several bits of information we could look at and running the query (although I asked you not to) is the next step we have to take. But when we run the query we want to see the session statistics, pick up the actual execution plan with rowsource execution statistics, and enable the 10032 and 10033 (sort) traces. So let’s fold the query into a longer script, something like:

set linesize 255
set trimspool on
set pagesize 60

set serveroutput off
alter session set statistics_level = all;

execute snap_my_stats.start_snap

alter session set events '10032 trace name context forever';
alter session set events '10033 trace name context forever';

select owner ... etc.

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set events '10033 trace name context off';
alter session set events '10032 trace name context off';

execute snap_my_stats.end_snap

Of course, to avoid blurring around the edges, we may have to isolate the three different tests – the query against dbms_xplan.display_cursor(), for example, is obviously going to have some impact on the session stats – and we may then want to run each test twice in succession so that any warm-up or parsing activities don’t confuse the issue. It would also be a good idea to run the tests after creating a new session in case there are some distracting side effects from creating the data set. But with these details addressed, here are a few results:

First the execution plan (I got these results from, all recent versions of Oracle behave similarly):

| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
|   0 | SELECT STATEMENT   |      |      1 |        |     16 |00:00:00.34 |    1018 |       |       |          |
|   1 |  SORT GROUP BY     |      |      1 |     16 |     16 |00:00:00.34 |    1018 |  5014K|  1445K| 4456K (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |  50000 |  50000 |00:00:00.07 |    1018 |       |       |          |

Oracle uses a SORT GROUP BY, not a HASH GROUP BY, and the indications are that we used about 4.4MB of memory to sort 50,000 rows. Then there’s the session statistics:

Name                                                                         Value

session uga memory max                                                   3,255,648
session pga memory max                                                   3,211,264

table scan rows gotten                                                      50,000

sorts (memory)                                                                   1
sorts (rows)                                                               150,000

This says we did only one sort operation, and sorted 150,000 rows using an excess over starting pga/uga of 3.2MB (rather than the 4.4MB suggested by the plan); possibly the variation in apparent memory usage could be explained by the way that Oracle allocates reasonably large chunks to grow the PGA when you grow a workarea, but since I’m taking deltas there’s some scope for being misled by the change in maximum pga/uga memory.

Finally the 10032 trace file shows the following (we didn’t spill to disc, so the 10033 trace wasn’t triggered):

---- Sort Parameters ------------------------------
sort_area_size                    4562944
sort_area_retained_size           4562944
sort_multiblock_read_count        7
max intermediate merge width      38

---- Sort Statistics ------------------------------
Input records                             150000
Output records                            49907
Total number of comparisons performed     1945863
  Comparisons performed by in-memory sort 1945863
Total amount of memory used               4562944
Uses version 1 sort
---- End of Sort Statistics -----------------------

These figures are ones we have to trust – it seems we really did d0 one sort operation of 150,000 rows, and we did allocate 4.5MB of memory. There’s an obvious guess for the 150,000 input rows – Oracle has turned every row into three rows, the original row, one to count the object_type, and one to count the object_name, and with that in mind maybe we will be able to make sense of getting an output of 49,907 rows using 4.5M of memory. Let’s create a query that could produce the right numbers, though with a differently arranged output:

select distinct owner, 0, null        from t1
union all
select distinct owner, 1, object_type from t1
union all
select distinct owner, 2, object_name from t1
order by 1, 2, 3

For my data set this query produced 49,907 rows of output (which is a number we wanted to see) and here are the first 9 rows of output – followed by the first row of output from the original query:

OWNER                    0 NULL
--------------- ---------- ---------------------
APPQOSSYS                0

APPQOSSYS                1 SYNONYM
APPQOSSYS                1 TABLE

APPQOSSYS                2 DBMS_WLM

--------------- -------------------------- --------------------------
APPQOSSYS                                2                          6

Spot the pattern ? All I have to do after this “union all with simple sort” is walk the result set in order accumulating distinct counts as I do so.

But what about the memory requirements ? Check back to the original 10032 trace file, it reported 4562944 bytes as the total memory needed, but it also reported using a “Version 1” sort – so before I ran my query I set “_newsort_enabled”=false, to get the following 10032 trace details:

---- Sort Statistics ------------------------------
Input records                             49907
Output records                            49907
Total number of comparisons performed     730667
  Comparisons performed by in-memory sort 730667
Total amount of memory used               4562944
Uses version 1 sort
---- End of Sort Statistics -----------------------

The memory used is exactly the number I wanted to see. (The new version 2 sort got exactly the same result using 3.5MB of memory, so I don’t know why it’s not used at this point – but maybe that’s because the implementation isn’t quite what I think.)

So, conclusion (to date, until a reader shows me that my answer is incomplete – or wrong): As the “SORT GROUP BY” operation accepts each row from the “TABLE ACCESS FULL” operation it converts each row into N rows (one base row and one for each column for which there is a count(distinct)). The base row holds just the set of “group by” columns and is tagged with a zero, each subsequent row holds the “group by” columns, an tag value to identify which column it carries, and one of the “count(distinct)” columns. Oracle then operates the normal “group by” aggregation mechanism but is actually aggregating on the “group by” columns plus the “tag” column. So each leaf node in the binary tree ends up holding {owner_value, tag_value, count}, and once all the data has been aggregated into the binary tree Oracle can walk the tree and perform a pivot to turn three rows for each owner value into a single row.

If you start thinking about nasty scenarios you will realise that the upshot of this implementation (if my hypothesis is correct) is that if you have a query with a long “group by” list, and several columns where there are lots of distinct values for each combination of the “group by” list then the volume of the B-tree could actually be much larger than the volume of the original table, and the amount of memory and CPU needed to build that tree (before collapsing it) could be huge.


There is one special case with this count(distinct …) query. If you have only ONE distinct operation in the query Oracle can use the “distinct aggregation” transformation with “view merging” to produce a completely different plan.


  1. I would think: firstly, Oracle would “bucket” the rows with a “group by” by the OWNER. I’d guess this would happen with a hash aggregate, rather than a sort aggregate. So my guess is the “group by” doesn’t use a sort.

    Next, I would guess Oracle would iterate over each of the groups (buckets) and sort them twice… First, sort them by OBJECT_TYPE and do a count distinct, next sort them by OBJECT_NAME and do a count distinct.

    Soooo I would guess the total number of rows sorted would be 50,000 x 2 = 100,000, and the total number of sort operations would be equal to the number of distinct owners x 2.

    But I’m probably waaaay off. ¯\_(ツ)_/¯

    Comment by Kaley — July 14, 2018 @ 7:42 pm BST Jul 14,2018 | Reply

  2. > Try to resist the temptation of doing a cut-n-paste and


    > how many rows will Oracle sort … for the final query …how many sort operations will that take ?

    In terms of an data structure in an undergraduate computer science course, one pass of the table upserting into an in-memory tree updating a leaf/node count++ for a owner||object_type or owner||object_name and the number of formal sort operations required would be … wait for it …. 0

    How Oracle actually does it (and different in different Oracle version nos too I suspect), I do not know. I am waiting to be educated here, and of course receiving my exam grade.


    Comment by /* Michael D O'Shea */ (@MichaelDOShea) — July 14, 2018 @ 8:24 pm BST Jul 14,2018 | Reply

  3. I remember you had posted similar question a while ago (I am resisting the temptation to search…) and the definitive answer then was to enable the trace that captures number of sorts.
    Without googling anything, I would say final query may not end up sorting anything as it can use the HASH GROUP BY mechanism (but I am not convinced that is correct…)

    Comment by Narendra — July 15, 2018 @ 9:50 am BST Jul 15,2018 | Reply

  4. I hope no sorts will happen as by default group by uses HASH GROUP BY process.

    Comment by Manju — July 15, 2018 @ 7:34 pm BST Jul 15,2018 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by