Oracle Scratchpad

July 21, 2022

Pagination cost

Filed under: CBO,Execution plans,Oracle,Problem Solving — Jonathan Lewis @ 3:56 pm BST Jul 21,2022

There’s a thread on the MOSC database tuning forum (needs an account) at the moment asking why a “fetch first N” query to fetch next 41 rows with an offset of 8602 rows takes longer to run than the same query when the offset is zero rows. Here’s a possible answer with a little lesson in thinking about what’s going on.

Apart from gremlins in the system there are two possible reasons

  • nothing has changed, but it takes longer to fetch 8643 rows in order and discard 8602 of them than it takes to fetch 41 rows in order and discard none
  • the optimizer has worked out that if it has to fetch 8643 rows then it ought to use a different plan but (as often happens) it was a bad idea to change the plan.

Here’s a little script to build some demo data.

rem
rem     Script:         fetch_first_offset.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2022
rem
rem     Last tested 
rem             21.3.0.0
rem             19.11.0.0
rem

create table t1 
as
select 
        * 
from 
        all_objects
where   rownum <= 50000
order by 
        dbms_random.value
/

create index t1_i1 on t1(object_name);

alter session set statistics_level = all;
set serveroutput off
set feedback off

column owner format a20
column object_type format a12
column object_name format a32

All I’ve done is create a table of 50,000 rows with an order by clause that maximises the randomness of the data pattern so that the index on object_name will have a very high clustering_factor.

Here’s the first query I’m going to run, followed by the execution plan, pulled from memory with rowsource execution stats enabled. I’ve queried for the first 20 rows (offset 0 next 20) ordered by object_name:

select
        owner, object_type, object_name
from
        t1
order by
        object_name
offset 
        0 rows
fetch next 
        20 rows only
/

select * from table(dbms_xplan.display_cursor(format=>'+cost allstats last'));


SQL_ID  fmdb8vuxwkp99, child number 0
-------------------------------------
select  owner, object_type, object_name from  t1 order by  object_name
offset  0 rows fetch next  20 rows only

Plan hash value: 3254925009

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    23 (100)|     20 |00:00:00.01 |      25 |      2 |
|*  1 |  VIEW                         |       |      1 |     20 |    23   (0)|     20 |00:00:00.01 |      25 |      2 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |     20 |    23   (0)|     20 |00:00:00.01 |      25 |      2 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |      1 |  50000 |    23   (0)|     20 |00:00:00.01 |      25 |      2 |
|   4 |     INDEX FULL SCAN           | T1_I1 |      1 |     20 |     3   (0)|     20 |00:00:00.01 |       5 |      2 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("from$_subquery$_002"."rowlimit_$$_rownumber"<=20 AND
              "from$_subquery$_002"."rowlimit_$$_rownumber">0))
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "OBJECT_NAME")<=20)

As you can see from the execution plan Oracle has used an index full scan (because that will access the data in exactly the right order and allows the “nosort stopkey” on the “window (no)sort” operation). It has fetched (A-Rows) 20 rows and reported a cost of 23 – which basically corresponds to 3 block visits for the index and one block visit for each row from the table. In passing you’ll notice from the Predicate Information at operation 2 that Oracle has transformed our “fetch first” into an analytic query using row_number() over(). The phrase “syntactic sugar” seems appropriate.

How do things change if we ask for the 2nd 20 rows – (offset 20, next 20). I’ll show only the output from dbms_xplan, including its slightly mangled SQL statement but dropping the Predicate Information:

select  owner, object_type, object_name from  t1 order by  object_name
offset  20 rows fetch next  20 rows only

Plan hash value: 3254925009

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    43 (100)|     20 |00:00:00.01 |      45 |
|*  1 |  VIEW                         |       |      1 |     40 |    43   (0)|     20 |00:00:00.01 |      45 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |     40 |    43   (0)|     40 |00:00:00.01 |      45 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |      1 |  49999 |    43   (0)|     40 |00:00:00.01 |      45 |
|   4 |     INDEX FULL SCAN           | T1_I1 |      1 |     40 |     3   (0)|     40 |00:00:00.01 |       5 |
--------------------------------------------------------------------------------------------------------------

As you can see, the optimizer has still decided to use the index full scan, and this time has fetched 40 rows and passed them up the plan until at operation 1 it discards the first 20 rows. The cost (43) is again related to 3 blocks for the index, 40 blocks for 40 rows from the (randomly distributed) table.

What would we see if we added a /*+ full(t1) */ hint to the query to force a tablescan to get the 2nd 20 rows?

select /*+ full(t1) */  owner, object_type, object_name from  t1 order
by  object_name offset  20 rows fetch next  20 rows only

Plan hash value: 2433988517

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |      1 |        |   947 (100)|     20 |00:00:00.01 |     996 |       |       |          |
|*  1 |  VIEW                    |      |      1 |     40 |   947   (1)|     20 |00:00:00.01 |     996 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|      |      1 |  50000 |   947   (1)|     40 |00:00:00.01 |     996 | 15360 | 15360 |14336  (0)|
|   3 |    TABLE ACCESS FULL     | T1   |      1 |  50000 |   278   (1)|  50000 |00:00:00.01 |     996 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Oracle has obeyed the hint, and the tablescan has fetched all 50,000 rows from the table and sorted them. Fortunately the optimizer knows that it needs only the top 40 rows so it has been discarding rows as it sorts, hence the appearance of the “pushed rank” in the “window sort” at operation 2; we haven’t had to create a sorted list of all 50,000 rows before picking the top 40. Again, once we’ve got the top 40 we discard the top 20 to allow for the offset.

We note that the cost of the tablescan was 278 but the cost of the sort was really rather large, taking the total cost of this path to 947. So here’s a thought experiment – what’s likely to happen if we ask for an offset of 940 and next 20? Given the costs we’ve seen for the indexed access path the optimizer will calculate a cost of 3 (plus a bit, maybe) for the index and a cost of 960 for visiting the table giving a total cost of about 963 – which should make the tablescan strategy the lower cost.

select  owner, object_type, object_name from  t1 order by  object_name
offset  940 rows fetch next  20 rows only

Plan hash value: 2433988517

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |      1 |        |   947 (100)|     20 |00:00:00.02 |     996 |       |       |          |
|*  1 |  VIEW                    |      |      1 |    960 |   947   (1)|     20 |00:00:00.02 |     996 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|      |      1 |  50000 |   947   (1)|    960 |00:00:00.02 |     996 |   267K|   267K|  237K (0)|
|   3 |    TABLE ACCESS FULL     | T1   |      1 |  50000 |   278   (1)|  50000 |00:00:00.01 |     996 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

My guesswork about the cost seems to have been nearly right. Unhinted, with an offset of 940 (which you can see as the 960 rows fetched) the optimizer has decided that the tablescan path has a lower cost than the indexed access.

Of course we ought to check this by hinting the indexed access path and seeing what its cost is:

select  /*+ index(t1) */  owner, object_type, object_name from  t1
order by  object_name offset  940 rows fetch next  20 rows only

Plan hash value: 3254925009

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |   969 (100)|     20 |00:00:00.01 |     968 |      4 |
|*  1 |  VIEW                         |       |      1 |    960 |   969   (1)|     20 |00:00:00.01 |     968 |      4 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |    960 |   969   (1)|    960 |00:00:00.01 |     968 |      4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |      1 |    960 |   969   (1)|    960 |00:00:00.01 |     968 |      4 |
|   4 |     INDEX FULL SCAN           | T1_I1 |      1 |  50000 |     9   (0)|    960 |00:00:00.01 |       9 |      4 |
-----------------------------------------------------------------------------------------------------------------------

The cost of the indexed access path is 969 – that’s 960 for the randomly scattered table rows we need plus 9 for the index scan (because at 960 index entries we’re going to visit a few more index leaf blocks than the original 3).

Summing Up

I’ve demonstrated with a simple query using “offset N rows fetch M rows only” that the optimizer will calculate the cost of fetching “N + M” rows using whatever paths are available, then pick the lowest cost path.

As you might expect, the presence of a suitable index will encourage the optimizer to walk the index in order jumping randomly around the table to avoid the cost of acquiring all the relevant data and sorting it. So for “small” values of “offset + next” Oracle might choose an indexed access path with “window nosort stopkey”, but for “large” values of “offset + next” it might choose to do a full tablescan with “window sort pushed rank”.

The consequence of this is that – in the same way we see the optimizer switching between nested loops and hash joins at the wrong moment – we may see the optimizer switch from an indexed access path to a full tablescan either too soon, or too late.

Answering the question

Why did the query with an offset of 8602 take so much longer than the query with an offset of zero when the next was only 41 rows?

It may be that the optimizer stuck with an indexed access path and had to do physical reads of 8,643 blocks when it should have switched to a tablescan.

It may be that the optimizer switched to a tablescan and sort when it should have stuck with using an index on well-clustered, well-cached, data.

As so often happens, the first step to answering an SQL performance question is to look at the actual execution plans.

6 Comments »

  1. […] Pagination Costs (July 2022): select with offset N next M can change plans as N and/or M vary. […]

    Pingback by Execution Plans Catalogue | Oracle Scratchpad — July 21, 2022 @ 4:10 pm BST Jul 21,2022 | Reply

  2. […] Pagination Costs (July 2022): select with offset N next M can change plans as N and/or M vary. […]

    Pingback by Troubleshooting catalogue | Oracle Scratchpad — July 21, 2022 @ 4:11 pm BST Jul 21,2022 | Reply

  3. Hi Jonathan
    Thanks for the interesting demonstration. I have two questions:
    – Why is STARTS for ’TABLE ACCESS BY INDEX ROWID’ 1? Shouldn’t it be 20 as we access 20 rows?
    – Why are the costs for ’offset 940’ 969. I would have expected 29 as we don’t have to access the first 940 rows.
    Best regards, Michael

    Comment by Michael von Rosenberg — August 4, 2022 @ 1:25 pm BST Aug 4,2022 | Reply

    • Michael,

      Thanks for the comment.

      Starts = 1: think of it as “how many times does a program call a subroutine. The Window Nosort calls a subroutine just once to ask it to create a list of rows, the Table Access calls a subroutine just once to create a list of rowids.

      Costs: I think this is a case where Oracle developers have a generic calculation path (possibly with some variations and special cases) but don’t (yet) cover every possible exception that can appear for every different plan for the same query. So they have a calculation for “fetch 960, discard 940” but don’t allow for “counting through the first 940 can be very cheap because we can stay in the index and don’t need to visit the table, only the last 20 will be expensive”. It’s the sort of thing where one day an upgrade will change an execution path because the code has been changed to do a better calculation (and maybe there will be a fix_control to disable the change)

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — August 4, 2022 @ 3:50 pm BST Aug 4,2022 | Reply

      • Thanks Jonathan. And also a summaric Thank you for all your articles that I have read in the recent years.

        Comment by Michael von Rosenberg — August 8, 2022 @ 8:39 am BST Aug 8,2022 | Reply

        • Michael,

          Thanks for the comment.
          It’s nice to know that you’ve found it worth reading them.

          Regards
          Jonathan Lewis

          Comment by Jonathan Lewis — August 9, 2022 @ 2:46 pm BST Aug 9,2022


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.

%d bloggers like this: