Hi Jonathan,

Thank you for the remarks. You’re absolutely right: CTAS with ORDER BY gets the data sorted in ASSM. I guess I tried a regular insert in ASSM, which always shuffled the data for me, and then switched both from insert to CTAS and from ASSM to MSSM at the same time, overlooking CTAS in ASSM. Another proof that one should change only one variable at a time :)

Also, thank you for explaining the memory issue. I totally forgot that for stopkey queries we try to keep no more than the requested number of records in the tree, and took the 10g results for granted. I assume 2KB is indeed more than enough to hold 10 nodes.
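Just to illustrate the bounded-memory idea to myself (a rough Python sketch, not what Oracle actually does internally — I’m using a heap rather than the real tree, and `top_n_smallest` is my own name):

```python
import heapq
import random

def top_n_smallest(rows, n=10):
    # Keep at most n candidates in memory while scanning the input.
    # A max-heap (simulated with negated keys) holds the current top n;
    # every other row needs only one comparison against the current
    # worst candidate before it is discarded.
    heap = []
    for row in rows:
        if len(heap) < n:
            heapq.heappush(heap, -row)
        elif row < -heap[0]:
            heapq.heapreplace(heap, -row)
    return sorted(-x for x in heap)

random.seed(42)
data = [random.random() for _ in range(100_000)]
print(top_n_smallest(data) == sorted(data)[:10])  # True
```

Memory stays proportional to n (10 entries here) no matter how many input rows go past, which is consistent with 2KB being plenty for 10 nodes.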

Thank you,

Viacheslav

Viacheslav,

Thanks for doing the tests.

It’s possible that if you did a “create table as select” in ASSM you wouldn’t need to switch to MSSM to get a fully reversed set.

I think the 2KB is probably a bug fix – the saved tree only has to hold 10 leaf entries, and 2KB should be enough for that. I’ve always thought that the original large memory for the reversed tree was a mistake similar to the effect that used to exist in 9i and allowed me to reach the limiting height of a B-tree index with just 24 rows in a table: https://web.archive.org/web/20200509033755/www.jlcomp.demon.co.uk/22_how_high.doc

Regards

Jonathan Lewis

I’m not quite sure, though, why it still reports “Total amount of memory used (in KB) 2” for the reversed set. That doesn’t seem right for a tree with 2^20 elements.

Hi Jonathan,

I did a small test on my local 19.3.0.0 (or 19.0.0.0.0, I’m already lost in their versioning :)) with the automatic workarea size policy. The results happened to be exactly the same as the ones you got in 2009.

To get a reversed or ordered dataset, though, I had to create the table in an MSSM tablespace, since ASSM still shuffles things around quite a bit even when the rows come from an explicitly ordered cursor.

Here is the output:

-- Randomised

---- Sort Statistics ------------------------------
Input records                               1048576
Output records                              139
Total number of comparisons performed       1049052
Comparisons performed by in-memory sort     1049052
Total amount of memory used (in KB)         2
Uses version 1 sort
---- End of Sort Statistics -----------------------

-- Reversed

---- Sort Statistics ------------------------------
Input records                               1048576
Output records                              1048576
Total number of comparisons performed       5242856
Comparisons performed by in-memory sort     5242856
Total amount of memory used (in KB)         2
Uses version 1 sort
---- End of Sort Statistics -----------------------

-- Ordered

---- Sort Statistics ------------------------------
Input records                               1048576
Output records                              10
Total number of comparisons performed       1048575
Comparisons performed by in-memory sort     1048575
Total amount of memory used (in KB)         2
Uses version 1 sort
---- End of Sort Statistics -----------------------

Table T1 was created with this script:

drop table t1 purge;

create table t1 (
        sortcode        char(6),
        padding         varchar2(500)
);

insert into t1
select
        dbms_random.string('U', 6),
        lpad('x', 500, '*')
from    dual
connect by level <= 1048576;

commit;

Then I used an ordered CTAS to create T2 in an MSSM tablespace.

PS: Out of interest I also tried to deliberately reverse the data in ASSM, but we still get a much better picture than the worst case:

-- Partially reversed (ASSM)

---- Sort Statistics ------------------------------
Input records                               1048576
Output records                              19170
Total number of comparisons performed       1125232
Comparisons performed by in-memory sort     1125232
Total amount of memory used (in KB)         2
Uses version 1 sort
---- End of Sort Statistics -----------------------

Which leads me to the idea that ASSM is generally a very good thing :)

Thank you,

Viacheslav

Viacheslav,

Thanks for the comment.

I like the idea – the fact that the data is in exact reverse order means a completely predictable (logarithmic) pattern to the depth of the tree.

I’m now wondering, though, whether the 2009 results would be the same on 19c.

Regards,

Jonathan Lewis

I’m sorry for commenting on an old topic; I got here while reading some of the latest stuff :)

I believe some algorithms are indeed very dependent on the initial ordering of the data, while others don’t care much. E.g. selection sort is always quadratic and merge sort is always NlogN, no matter what you throw at them, while insertion sort runs in linear time on a sorted array but quadratic time on a randomly shuffled one.
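For what it’s worth, the dependence is easy to see by counting comparisons directly (a quick Python sketch; `insertion_sort_comparisons` is just my own illustration):

```python
import random

def insertion_sort_comparisons(a):
    # Insertion sort that counts key comparisons: linear on already
    # sorted input, quadratic on reversed input.
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] > key:
                a[j + 1] = a[j]   # shift the larger element right
                j -= 1
            else:
                break
        a[j + 1] = key
    return comparisons

n = 1000
print(insertion_sort_comparisons(range(n)))         # 999  (~n, sorted)
print(insertion_sort_comparisons(range(n, 0, -1)))  # 499500 (~n^2/2, reversed)
random.seed(1)
print(insertion_sort_comparisons(random.sample(range(n), n)))  # roughly n^2/4
```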

Something that caught my attention is that we still do fewer than NlogN comparisons for the reversed set. My knee-jerk assumption was that it’s another clever optimisation, but on second thought I figure it’s the nature of the algorithm: you start with an empty tree, and while it is growing the number of comparisons needed to add a new node increases from lg 1 to lg 1048576, which in this case presumably converges to 1/4*NlgN. But that’s just thinking out loud, rather than a mathematical proof :)
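A quick back-of-envelope check of that intuition (my own arithmetic, not a claim about what Oracle really does): if the i-th insert into a balanced tree costs about lg i comparisons, the total is lg(N!), which Stirling puts at roughly NlgN − N/ln 2 — below the NlgN ceiling, but still well above the 5,242,856 observed, so whatever the real mechanism is, it must be cheaper still.

```python
import math

N = 1_048_576

# Total comparisons if the i-th insert into a balanced binary tree
# costs about lg i: sum of lg i for i = 2..N, i.e. lg(N!).
growing_tree = sum(math.log2(i) for i in range(2, N + 1))

print(round(growing_tree))      # ~19.46 million, below the NlgN ceiling
print(round(N * math.log2(N)))  # 20971520, the NlgN ceiling
print(5242856 / N)              # ~5.0: the observed count is close to 5N
```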
