Oracle Scratchpad

November 9, 2020

I wish

Filed under: CBO,Execution plans,Oracle,sorting,Wishlist — Jonathan Lewis @ 12:01 pm GMT Nov 9,2020

Here’s a lovely little mechanism new to Postgres 13 that can minimise sorting costs: the “incremental sort”. It would be nice to see it in Oracle as well as it could make an enormous difference to “fetch first N” queries.

The concept is simple – if a rowsource moving up a plan is known to be in “partially sorted” order when it reaches a sort operation the optimizer can choose whether or not to sort the entire rowsource in one go or to sort it in batches as it arrives.

For example if you have a query which has “order by cola, colb” as its final clause and the plan has used an index range scan on an index on just (cola) then the “sort order by” operation can accept the rows for the first value of cola, sort them by colb and pass them on, then accept the rows for the second value of cola, sort them by colb and pass them on, and so on.

Better still, if you have an “order by table1.colA, table2,colB” and the optimizer used an indexed access path on table1.colA and a nested loop into table2, then the optimizer will still recognize that the generated data is already partially sorted, and sort batches for table1.colA to order them by table2.colB. (Oracle has a mechanism for dealing with sorted hash clusters that is roughly similar.)

Obviously the benefit is that you avoid doing a very large sort that might spill to disc; slightly less obviously is that you might avoid sorting the whole data set if you have a “fetch first N” query. Here’s an Oracle model setting up a demonstration of the principle:

rem
rem     Script:         fetch_first_postgres_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem

create table t1 as
with generator(id) as (
        select 1 from dual
        union all
        select id + 1 from generator where id < 1e5
)
select
        ceil(id/10)             n1,
        mod(id,13)              n2,
        lpad(id,10)             small_vc,
        rpad('x',50,'x')        padding
from
        generator
/

alter table t1 modify n1 not null;

explain plan for
select  * 
from    t1
where   n1 between 25 and 30
order by
        n1, n2
;

select * from table(dbms_xplan.display);

create index t1_i1 on t1(n1);

explain plan for
select  * 
from    t1
where   n1 between 25 and 30
order by
        n1, n2
;

select * from table(dbms_xplan.display);

drop index t1_i1;
create index t1_i1 on t1(n1, n2);

explain plan for
select  * 
from    t1
where   n1 between 25 and 30
order by
        n1, n2
;

select * from table(dbms_xplan.display);

I’ve created a table with 100,000 rows where the value of n1 is repeated 10 times and well-clustered, while for each value of n1, the n2 column has 10 distinct values (not necessarily in order thanks to the mod(,13)). Using this data set I’ve executed the same query three times – selecting the rows for 6 consecutive values of n1, ordering by n1, n2.

The first test will have to do a tablescan, the second can use the index I’ve created on (n1) but will have to do a sort (after visiting the table), the third can walk the index I’ve created on (n1,n2) and complete without sorting.

Here, in order, are the three execution plans:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    70 |  4830 |   145   (9)| 00:00:01 |
|   1 |  SORT ORDER BY     |      |    70 |  4830 |   145   (9)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   |    70 |  4830 |   144   (8)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N1"<=30 AND "N1">=25)


----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    70 |  4830 |     4  (25)| 00:00:01 |
|   1 |  SORT ORDER BY                       |       |    70 |  4830 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    70 |  4830 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    70 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1">=25 AND "N1"<=30)


-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    70 |  4830 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    70 |  4830 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_I1 |    70 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1">=25 AND "N1"<=30)

Having created the Oracle model I asked Phil Florent (who had prompted this note by emailing me with a question about why Postgres 13 was executing a “Fetch First” so much faster than Oracle 19 (the basic answer is at this URL) if he would run it under Postgres and send me the execution plans.

The code required two changes – the first to handle the change in dialect, the second to supply a change in scale because my model produced such a small output that Postgres didn’t bother to use the new feature. Here’s the modified SQL to generate the original test data:

create table t1 as
with recursive generator(id) as (
        select 1
        union all
        select id + 1 from generator where id < 1e5
)
select
        ceil(id/10)             n1,
        mod(id,13)              n2,
        lpad(id::text,10)       small_vc,
        rpad('x',50,'x')        padding
from
        generator;

And the three plans (so that you can compare the content and style of output with Oracle plans) that went with this small data set – first the no-index plan, then the plan for the (n1) index, then the plan with (n1,n2) indexed:

                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
Sort  (cost=2835.85..2836.00 rows=62 width=74) (actual time=19.534..19.600 rows=60 loops=1)
   Sort Key: n1, n2
   Sort Method: quicksort  Memory: 33kB
   ->  Seq Scan on t1  (cost=0.00..2834.00 rows=62 width=74) (actual time=0.067..19.417 rows=60 loops=1)
         Filter: ((n1 >= '25'::double precision) AND (n1 <= '30'::double precision))
         Rows Removed by Filter: 99940

Planning Time: 0.351 ms
Execution Time: 19.703 ms


                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Sort  (cost=11.50..11.66 rows=62 width=74) (actual time=0.224..0.289 rows=60 loops=1)
   Sort Key: n1, n2
   Sort Method: quicksort  Memory: 33kB
   ->  Index Scan using t1_i1 on t1  (cost=0.42..9.66 rows=62 width=74) (actual time=0.024..0.113 rows=60 loops=1)
         Index Cond: ((n1 >= '25'::double precision) AND (n1 <= '30'::double precision))

Planning Time: 0.665 ms
Execution Time: 0.391 ms



                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Index Scan using t1_i1 on t1  (cost=0.42..114.66 rows=62 width=74) (actual time=0.022..0.155 rows=60 loops=1)
   Index Cond: ((n1 >= '25'::double precision) AND (n1 <= '30'::double precision))

Planning Time: 0.690 ms
Execution Time: 0.259 ms
 

As you can see, there’s very little difference between Oracle’s plans and Postgres’ plans in this example. (Though it’s rather nice to see what extra details appear in the Postgres plans, which were generated with the equivalent of Oracle’s “statistics_level = all” setting.

The middle plan shows us that Postgres didn’t use the “incremental sort” – but it’s useful to see it anyway so that we can compare the structure of the plan when we increase the volume of data – which the following script is for:

create table t2 as
with recursive generator(id) as (
        select 1
        union all
        select id + 1 from generator where id < 10000000
)
select
        ceil(id/10)             n1,
        mod(id,13)              n2,
        lpad(id::text,10)             small_vc,
        rpad('x',50,'x')        padding
from
        generator;
 
create index t2_i1 on t2(n1);

explain analyze
select  *
from    t2
where   n1 between 25000 and 30000
order by
        n1, n2
;

analyze t2;

Now we have 10M rows, still with 10 rows per value of n1, and our query requests 5,001 values of n1, so 50,010 rows in total. With that much data the optimizer decided to use the incremental sort rather than sorting the whole result set in one go:

                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Incremental Sort  (cost=0.49..4532.57 rows=52677 width=74) (actual time=0.160..164.125 rows=50010 loops=1)
   Sort Key: n1, n2
   Presorted Key: n1
   Full-sort Groups: 1251  Sort Method: quicksort  Average Memory: 30kB  Peak Memory: 30kB
   ->  Index Scan using t2_i1 on t2  (cost=0.43..2339.97 rows=52677 width=74) (actual time=0.055..61.663 rows=50010 loops=1)
         Index Cond: ((n1 >= '25000'::double precision) AND (n1 <= '30000'::double precision))
Planning Time: 0.197 ms
Execution Time: 204.490 ms

Note, particularly, the “Presorted Key” line leading to the “Sort Key” line. We can also check the plan without the incremental sort with a set command to disable the feature:

set enable_incremental_sort = off;

                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Sort  (cost=8812.64..8944.33 rows=52677 width=74) (actual time=181.487..227.735 rows=50010 loops=1)
   Sort Key: n1, n2
   Sort Method: external merge  Disk: 4128kB
   ->  Index Scan using t2_i1 on t2  (cost=0.43..2339.97 rows=52677 width=74) (actual time=0.070..74.975 rows=50010 loops=1)
         Index Cond: ((n1 >= '25000'::double precision) AND (n1 <= '30000'::double precision))
Planning Time: 0.239 ms
Execution Time: 266.325 ms

Comparing the times for these two queries there isn’t really much difference – but you will notice that the old approach has had to dump 4MB to disc while the incremental sorts gets the job done in 30Kb of memory, which may be an important difference in other circumstances.

More importantly than a tiny time difference in this trivial example is the fact that Phil Florent’s original question was:

“Why is this two-table join with ‘fetch first 5’ taking 28 seconds to complete in Oracle when it takes less than one millisecond to complete in Postgres?”

The answer was:

  • Oracle is doing a hash join that produces a 25M row result set, sorting it to do an analytic row_number() call to get the first 5 rows.
  • Postgres is doing a nested loop join, fetching the first row(s) from the first table in the right order by index range scan then fetching the matching rows from the second table – then doing an incremental sort on those rows; then fetching the next row(s) and repeating the process – stopping after it’s got through a total of 5 rows instead of generating and sorting 25M rows.

Here’s the plan (with obfuscated table names):

                                                    QUERY PLAN                                                    

-------------------------------------------------------------------------------------------------------------------

Limit  (cost=284.36..287.21 rows=5 width=10)
   ->  Incremental Sort  (cost=284.36..14261803.18 rows=25000000 width=10)
         Sort Key: table1.col1 DESC, table2.col2 DESC
         Presorted Key: table1.col1
         ->  Nested Loop  (cost=1.00..13294209.39 rows=25000000 width=10)
               ->  Index Scan using table1_col1_idx on table1  (cost=0.56..1300593.47 rows=25000000 width=8)
               ->  Index Scan using table2.pk_col on table2 (cost=0.43..0.48 rows=1 width=6)
                     Index Cond: (id = table1.id)

You’ll notice in this exanple that the incremental sort can take advantage of the optimizer’s knowledge of the index definitions whether the sort is ascending or descending.

You’ll also notice that Postgres has the same problem as Oracle when it comes to coping with Fetch First (or, in Oracle’s case, rownum <= N and optimizer_mode = first_rows_N). Even when it “knows” that a query is going to stop fetching data very early the plan still reports 25M rows as the expected volume of data.

Summary

Postgres 13 has a wonderful mechanism for optimising sorts that can make a huge difference to “first rows” queries and even basic “order by” clauses involving join query result sets.

Footnote

Browsing the Internet for documentation and comment on the incremental sort I found the following. Given my level of ignorance about Postgres I can’t comment on the completeness or correctness of the information, but it looked good to me, and I found its comments about this feature very informative.

4 Comments »

  1. Jonathan, thank you for this very detailed explanation. I need to do further tests to see how this feature interacts with recent evolutions of partitioning (I always like to mix 2 new features) but it is promising. Regard, Phil.

    Comment by Phil Florent — November 9, 2020 @ 7:02 pm GMT Nov 9,2020 | Reply

  2. While we’re wishing for things, one thing Oracle could do to speed up FIRST N ROWS after an ORDER BY is to switch to a heap sort instead of merge sort. A heap sort wouldn’t waste time totally sorting all the data–it would invest some time in partially sorting all of the data, but only that first number of rows would have time spent on them to complete the sort. Meanwhile the big-O is the same for heap and merge sorts, so you’re unlikely to be wasting CPU even for larger sorts–although I haven’t compared the constant factors between them recently.

    Comment by Jason Bucata — November 12, 2020 @ 3:51 am GMT Nov 12,2020 | Reply

    • Jason,

      Thanks for the comment – an interesting idea.

      The thing that crossed my mind on this is to wonder whether the heap sort could operate (safely) in a way to minimise memory use in the same way that the insertion sort does (as described in this note https://jonathanlewis.wordpress.com/2009/12/28/short-sorts/ for a “rownum < N" query. Notionally, of course, the optimizer path for "first N rows" ought to match the path for "rownum < N" – though the former also has to allow for the optonal "with duplicates" that turns N into N + x.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — November 16, 2020 @ 12:57 pm GMT Nov 16,2020 | Reply

  3. […] example of this while testing the code in the email I mentioned in last week’s posting on the “Incremental Sort” that Postgres 13 can do. Here’s a model to create some data and demonstrate the principle […]

    Pingback by 19c tweak | Oracle Scratchpad — December 23, 2020 @ 2:15 pm GMT Dec 23,2020 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.

%d bloggers like this: