There’s a thread on the MOSC database tuning forum (needs an account) at the moment asking why a “fetch first N” query fetching the next 41 rows with an offset of 8,602 rows takes longer to run than the same query when the offset is zero rows. Here’s a possible answer, with a little lesson in thinking about what’s going on.
Apart from gremlins in the system there are two possible reasons:
- nothing has changed, but it takes longer to fetch 8,643 rows in order and discard 8,602 of them than it takes to fetch 41 rows in order and discard none
- the optimizer has worked out that if it has to fetch 8643 rows then it ought to use a different plan but (as often happens) it was a bad idea to change the plan.
Here’s a little script to build some demo data.
rem
rem Script: fetch_first_offset.sql
rem Author: Jonathan Lewis
rem Dated: July 2022
rem
rem Last tested
rem 21.3.0.0
rem 19.11.0.0
rem
create table t1
as
select
*
from
all_objects
where rownum <= 50000
order by
dbms_random.value
/
create index t1_i1 on t1(object_name);
alter session set statistics_level = all;
set serveroutput off
set feedback off
column owner format a20
column object_type format a12
column object_name format a32
All I’ve done is create a table of 50,000 rows with an order by clause that maximises the randomness of the data pattern so that the index on object_name will have a very high clustering_factor.
Here’s the first query I’m going to run, followed by the execution plan, pulled from memory with rowsource execution stats enabled. I’ve queried for the first 20 rows (offset 0 next 20) ordered by object_name:
select
owner, object_type, object_name
from
t1
order by
object_name
offset
0 rows
fetch next
20 rows only
/
select * from table(dbms_xplan.display_cursor(format=>'+cost allstats last'));
SQL_ID fmdb8vuxwkp99, child number 0
-------------------------------------
select owner, object_type, object_name from t1 order by object_name
offset 0 rows fetch next 20 rows only
Plan hash value: 3254925009
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 23 (100)| 20 |00:00:00.01 | 25 | 2 |
|* 1 | VIEW | | 1 | 20 | 23 (0)| 20 |00:00:00.01 | 25 | 2 |
|* 2 | WINDOW NOSORT STOPKEY | | 1 | 20 | 23 (0)| 20 |00:00:00.01 | 25 | 2 |
| 3 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 50000 | 23 (0)| 20 |00:00:00.01 | 25 | 2 |
| 4 | INDEX FULL SCAN | T1_I1 | 1 | 20 | 3 (0)| 20 |00:00:00.01 | 5 | 2 |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("from$_subquery$_002"."rowlimit_$$_rownumber"<=20 AND
"from$_subquery$_002"."rowlimit_$$_rownumber">0))
2 - filter(ROW_NUMBER() OVER ( ORDER BY "OBJECT_NAME")<=20)
As you can see from the execution plan Oracle has used an index full scan (because that will access the data in exactly the right order and allows the “nosort stopkey” on the “window (no)sort” operation). It has fetched (A-Rows) 20 rows and reported a cost of 23 – which basically corresponds to 3 block visits for the index and one block visit for each row from the table. In passing you’ll notice from the Predicate Information at operation 2 that Oracle has transformed our “fetch first” into an analytic query using row_number() over(). The phrase “syntactic sugar” seems appropriate.
How do things change if we ask for the second 20 rows (offset 20, next 20)? I’ll show only the output from dbms_xplan, including its slightly mangled SQL statement but dropping the Predicate Information:
select owner, object_type, object_name from t1 order by object_name
offset 20 rows fetch next 20 rows only
Plan hash value: 3254925009
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 43 (100)| 20 |00:00:00.01 | 45 |
|* 1 | VIEW | | 1 | 40 | 43 (0)| 20 |00:00:00.01 | 45 |
|* 2 | WINDOW NOSORT STOPKEY | | 1 | 40 | 43 (0)| 40 |00:00:00.01 | 45 |
| 3 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 49999 | 43 (0)| 40 |00:00:00.01 | 45 |
| 4 | INDEX FULL SCAN | T1_I1 | 1 | 40 | 3 (0)| 40 |00:00:00.01 | 5 |
--------------------------------------------------------------------------------------------------------------
As you can see, the optimizer has still decided to use the index full scan, and this time has fetched 40 rows, passing them up the plan until at operation 1 it discards the first 20. The cost (43) is again 3 blocks for the index plus 40 blocks for the 40 rows from the (randomly distributed) table.
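The arithmetic behind these two costs is simple enough to check: with this badly clustered index, the cost of the indexed path is roughly a handful of index blocks plus one table block visit per row fetched. Here’s a quick sketch of that model (in Python rather than SQL; the 3-block index figure is taken from the plans above):

```python
# Rough cost model for the indexed access path on this data set: a few
# index blocks plus one table block visit per row fetched, because the
# index has a clustering_factor close to the number of rows in the table.
INDEX_BLOCKS = 3   # index full scan cost reported in the plans above

def index_path_cost(offset, fetch_next):
    # Oracle has to fetch (offset + next) rows before it can discard the offset
    return INDEX_BLOCKS + offset + fetch_next

print(index_path_cost(0, 20))    # offset 0,  next 20 -> 23, matching the first plan
print(index_path_cost(20, 20))   # offset 20, next 20 -> 43, matching the second plan
```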
What would we see if we added a /*+ full(t1) */ hint to the query to force a tablescan to get the 2nd 20 rows?
select /*+ full(t1) */ owner, object_type, object_name from t1 order
by object_name offset 20 rows fetch next 20 rows only
Plan hash value: 2433988517
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 947 (100)| 20 |00:00:00.01 | 996 | | | |
|* 1 | VIEW | | 1 | 40 | 947 (1)| 20 |00:00:00.01 | 996 | | | |
|* 2 | WINDOW SORT PUSHED RANK| | 1 | 50000 | 947 (1)| 40 |00:00:00.01 | 996 | 15360 | 15360 |14336 (0)|
| 3 | TABLE ACCESS FULL | T1 | 1 | 50000 | 278 (1)| 50000 |00:00:00.01 | 996 | | | |
-----------------------------------------------------------------------------------------------------------------------------------
Oracle has obeyed the hint, and the tablescan has fetched all 50,000 rows from the table and sorted them. Fortunately the optimizer knows that it needs only the top 40 rows so it has been discarding rows as it sorts, hence the appearance of the “pushed rank” in the “window sort” at operation 2; we haven’t had to create a sorted list of all 50,000 rows before picking the top 40. Again, once we’ve got the top 40 we discard the top 20 to allow for the offset.
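In effect the “window sort pushed rank” is a bounded top-N sort: keep at most “offset + next” candidate rows at any moment and discard everything that can’t make the cut as the scan proceeds. Here’s a sketch of the idea in Python (an illustration of the technique, not a description of Oracle’s internal code):

```python
import heapq

def top_n_rows(rows, offset, fetch_next, key):
    # Bounded top-N ("pushed rank"): keep only the best offset+next rows
    # seen so far instead of building a full sorted copy of the input.
    keep = offset + fetch_next
    best = heapq.nsmallest(keep, rows, key=key)   # holds at most 'keep' rows internally
    return best[offset:]                          # discard the offset, return the page

# 50,000 rows arriving in reverse order, as an unsorted tablescan might deliver them
rows = [("obj%05d" % i,) for i in range(50000, 0, -1)]
page = top_n_rows(rows, 20, 20, key=lambda r: r[0])
print(page[0], page[-1])   # the 21st and 40th rows in sorted order
```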
We note that the cost of the tablescan was 278 but the cost of the sort was really rather large, taking the total cost of this path to 947. So here’s a thought experiment – what’s likely to happen if we ask for an offset of 940 and next 20? Given the costs we’ve seen for the indexed access path the optimizer will calculate a cost of 3 (plus a bit, maybe) for the index and a cost of 960 for visiting the table, giving a total cost of about 963 – which should make the tablescan strategy the lower-cost option.
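Putting numbers to that thought experiment (a sketch using the figures already reported in the plans above): if the indexed path costs roughly 3 plus one block per row while the tablescan-and-sort path sits at a flat 947, the break-even point is easy to find:

```python
INDEX_BLOCKS = 3       # index full scan cost from the earlier plans
TABLESCAN_COST = 947   # full tablescan + window sort pushed rank, from the hinted plan

def index_path_cost(rows_fetched):
    return INDEX_BLOCKS + rows_fetched

# offset 940, next 20 means fetching 960 rows through the index
print(index_path_cost(960))   # 963 - just above the tablescan cost

# smallest number of rows fetched at which the tablescan path looks cheaper
crossover = next(n for n in range(1, 2000) if index_path_cost(n) > TABLESCAN_COST)
print(crossover)
```

(In practice the index component of the cost itself grows slowly with the number of rows fetched, so the real break-even sits slightly lower than this simple model suggests.)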
select owner, object_type, object_name from t1 order by object_name
offset 940 rows fetch next 20 rows only
Plan hash value: 2433988517
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 947 (100)| 20 |00:00:00.02 | 996 | | | |
|* 1 | VIEW | | 1 | 960 | 947 (1)| 20 |00:00:00.02 | 996 | | | |
|* 2 | WINDOW SORT PUSHED RANK| | 1 | 50000 | 947 (1)| 960 |00:00:00.02 | 996 | 267K| 267K| 237K (0)|
| 3 | TABLE ACCESS FULL | T1 | 1 | 50000 | 278 (1)| 50000 |00:00:00.01 | 996 | | | |
-----------------------------------------------------------------------------------------------------------------------------------
My guesswork about the cost seems to have been nearly right. Unhinted, with an offset of 940 (which you can see as the 960 rows fetched) the optimizer has decided that the tablescan path has a lower cost than the indexed access.
Of course we ought to check this by hinting the indexed access path and seeing what its cost is:
select /*+ index(t1) */ owner, object_type, object_name from t1
order by object_name offset 940 rows fetch next 20 rows only
Plan hash value: 3254925009
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 969 (100)| 20 |00:00:00.01 | 968 | 4 |
|* 1 | VIEW | | 1 | 960 | 969 (1)| 20 |00:00:00.01 | 968 | 4 |
|* 2 | WINDOW NOSORT STOPKEY | | 1 | 960 | 969 (1)| 960 |00:00:00.01 | 968 | 4 |
| 3 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 960 | 969 (1)| 960 |00:00:00.01 | 968 | 4 |
| 4 | INDEX FULL SCAN | T1_I1 | 1 | 50000 | 9 (0)| 960 |00:00:00.01 | 9 | 4 |
-----------------------------------------------------------------------------------------------------------------------
The cost of the indexed access path is 969 – that’s 960 for the randomly scattered table rows we need plus 9 for the index scan (because at 960 index entries we’re going to visit a few more index leaf blocks than the original 3).
Summing Up
I’ve demonstrated with a simple query using “offset N rows fetch M rows only” that the optimizer will calculate the cost of fetching “N + M” rows using whatever paths are available, then pick the lowest cost path.
As you might expect, the presence of a suitable index will encourage the optimizer to walk the index in order jumping randomly around the table to avoid the cost of acquiring all the relevant data and sorting it. So for “small” values of “offset + next” Oracle might choose an indexed access path with “window nosort stopkey”, but for “large” values of “offset + next” it might choose to do a full tablescan with “window sort pushed rank”.
The consequence of this is that – in the same way we see the optimizer switching between nested loops and hash joins at the wrong moment – we may see the optimizer switch from an indexed access path to a full tablescan either too soon, or too late.
Answering the question
Why did the query with an offset of 8602 take so much longer than the query with an offset of zero when the next was only 41 rows?
It may be that the optimizer stuck with an indexed access path and had to do physical reads of 8,643 blocks when it should have switched to a tablescan.
It may be that the optimizer switched to a tablescan and sort when it should have stuck with using an index on well-clustered, well-cached, data.
As so often happens, the first step to answering an SQL performance question is to look at the actual execution plans.
Hi Jonathan
Thanks for the interesting demonstration. I have two questions:
– Why is STARTS for ‘TABLE ACCESS BY INDEX ROWID’ 1? Shouldn’t it be 20 as we access 20 rows?
– Why is the cost for ‘offset 940’ 969? I would have expected 29 as we don’t have to access the first 940 rows.
Best regards, Michael
Comment by Michael von Rosenberg — August 4, 2022 @ 1:25 pm BST Aug 4,2022 |
Michael,
Thanks for the comment.
Starts = 1: think of it as “how many times does a program call a subroutine?”. The Window Nosort calls a subroutine just once to ask it to create a list of rows, and the Table Access calls a subroutine just once to create a list of rowids.
Costs: I think this is a case where the Oracle developers have a generic calculation path (possibly with some variations and special cases) but don’t (yet) cover every possible exception that can appear for every different plan for the same query. So they have a calculation for “fetch 960, discard 940” but don’t allow for “counting through the first 940 can be very cheap because we can stay in the index and don’t need to visit the table; only the last 20 will be expensive”. It’s the sort of thing where one day an upgrade will change an execution path because the code has been changed to do a better calculation (and maybe there will be a fix_control to disable the change).
Regards
Jonathan Lewis
Comment by Jonathan Lewis — August 4, 2022 @ 3:50 pm BST Aug 4,2022 |
Thanks Jonathan. And also a general thank you for all your articles that I have read in recent years.
Comment by Michael von Rosenberg — August 8, 2022 @ 8:39 am BST Aug 8,2022 |
Michael,
Thanks for the comment.
It’s nice to know that you’ve found them worth reading.
Regards
Jonathan Lewis
Comment by Jonathan Lewis — August 9, 2022 @ 2:46 pm BST Aug 9,2022