A couple of weeks ago I published a note about an execution plan which showed the details of a scalar subquery in the wrong place (as far as the typical strategies for interpreting execution plans are concerned). In a footnote to the article I commented that Andy Sayer had produced a simple reproducible example of the anomaly based around the key features of the query supplied in the original posting and had emailed it to me. With his permission (and with some minor modifications) I’ve reproduced it below:
rem rem Script: misplaced_subq_plan.sql rem Author: Andrew Sayer rem Dated: May 2019 rem drop table recursive_table; drop table lookup_t; drop table join_t; @@setup set linesize 180 set pagesize 60 create table recursive_table ( my_id number constraint rt_pk primary key, parent_id number, fk_col number ); insert into recursive_table select rownum, nullif(rownum-1,0) parent_id, mod(rownum,10) from dual connect by rownum <=100 ; prompt ================================================== prompt Note that fk_col will be zero for 1/10 of the rows prompt ================================================== create table lookup_t( pk_col number constraint lt_pk primary key, value varchar2(30 char) ) ; insert into lookup_t select rownum, rpad('x',30,'x') from dual connect by rownum <=100 ; create table join_t( pk_col number primary key, col_10 number, value varchar2(100 char) ); insert into join_t select rownum, mod(rownum,10), rpad('x',30,'x') from dual connect by rownum <=1000 --> comment to avoid WordPress format problem. ; execute dbms_stats.gather_table_stats(null,'recursive_table') execute dbms_stats.gather_table_stats(null,'lookup_t') execute dbms_stats.gather_table_stats(null,'join_t') prompt ================================ prompt note that pk_col will never be 0 prompt ================================ set serverout off alter session set statistics_level=all; var x number exec :x := 1 spool misplaced_subq_plan select /* distinct */ r.my_id, j.value, r.ssq_value from ( select my_id, parent_id, fk_col, (select l.value from lookup_t l where l.pk_col = r.parent_id) ssq_value from recursive_table r connect by prior my_id = parent_id start with my_id = :x ) r join join_t j on r.fk_col = j.pk_col / select * from table(dbms_xplan.display_cursor(format=>'allstats projection')); set serveroutput on spool off
The code generates, populates, and queries three tables:
- recursive_table is used in a “connect by” query to generate some data.
- lookup_t is used in a scalar subquery in the select list of the “connect by” query.
- join_t is then joined to the result of the “connect by” query to eliminate some rows.
The construction allows us to see a difference between the number of rows returned and the number of times the scalar subquery is executed, and makes it easy to detect an anomaly in the presentation of the execution plan. And here is the execution plan from an 18.3 instance:
-------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | O/1/M | -------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 90 |00:00:00.01 | 170 | | | | | 1 | TABLE ACCESS BY INDEX ROWID | LOOKUP_T | 100 | 1 | 99 |00:00:00.01 | 102 | | | | |* 2 | INDEX UNIQUE SCAN | LT_PK | 100 | 1 | 99 |00:00:00.01 | 3 | | | | |* 3 | HASH JOIN | | 1 | 2 | 90 |00:00:00.01 | 170 | 1123K| 1123K| 1/0/0| | 4 | VIEW | | 1 | 2 | 100 |00:00:00.01 | 125 | | | | |* 5 | CONNECT BY NO FILTERING WITH START-WITH| | 1 | | 100 |00:00:00.01 | 23 | 9216 | 9216 | 2/0/0| | 6 | TABLE ACCESS FULL | RECURSIVE_TABLE | 1 | 197 | 100 |00:00:00.01 | 23 | | | | | 7 | TABLE ACCESS FULL | JOIN_T | 1 | 1 | 1000 |00:00:00.01 | 45 | | | | -------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("L"."PK_COL"=:B1) 3 - access("R"."FK_COL"="J"."PK_COL") 5 - access("PARENT_ID"=PRIOR NULL) filter("MY_ID"=:X) Column Projection Information (identified by operation id): ----------------------------------------------------------- 1 - "L"."VALUE"[VARCHAR2,120] 2 - "L".ROWID[ROWID,10] 3 - (#keys=1) "R"."MY_ID"[NUMBER,22], "J"."VALUE"[VARCHAR2,400], "R"."SSQ_VALUE"[VARCHAR2,120], "J"."VALUE"[VARCHAR2,400] 4 - "R"."MY_ID"[NUMBER,22], "R"."FK_COL"[NUMBER,22], "R"."SSQ_VALUE"[VARCHAR2,120] 5 - "PARENT_ID"[NUMBER,22], "MY_ID"[NUMBER,22], "FK_COL"[NUMBER,22], "R"."PARENT_ID"[NUMBER,22], PRIOR NULL[22], LEVEL[4] 6 - "MY_ID"[NUMBER,22], "PARENT_ID"[NUMBER,22], "FK_COL"[NUMBER,22] 7 - "J"."PK_COL"[NUMBER,22], "J"."VALUE"[VARCHAR2,400] Note ----- - dynamic statistics used: dynamic sampling (level=2) - this is an adaptive plan
In a typical execution plan with scalar subqueries in the select list, the sub-plans for the scalar subqueries appear in the plan before the main query – and in this plan you can see the scalar subquery here at operations 1 and 2.
But the scalar subquery is in the select list of a non-mergeable view (operations 4, 5, 6). We can see that this view generates 100 rows (A-rows of operation 4) and the scalar subquery starts 100 times (Starts of operation 1) – so we can infer that the subquery ran for each row generated by the view.
The problem, though, is that the result set from the view is joined to another table, eliminating some rows and reducing the size of the result set; so if we don’t look carefully at all the details of the plan we appear to have a driving query that produces a result set of 90 rows (at operation 3), but manages to execute the scalar subquery just above it in the plan more times than there are rows in the result set.
It’s easy to unpick what’s really happening in this very simple query with a very short plan – but much harder to do so in the original case where the scalar subquery appeared “outside” the hash join when it actually executed inside a complex subplan that generated the second input (proble table) for the hash join.
As a further little note – if you look at the Column Projection Information you’ll see that operation 4 is where Oracle first projects ‘r.ssq_value[varchar2,120]’ which is the column created by the execution of the sub-plan at operation 1.
Arguably the execution plan should have look more like:
Plan hash value: 2557600799 -------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | O/1/M | -------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 90 |00:00:00.01 | 170 | | | | |* 1 | HASH JOIN | | 1 | 2 | 90 |00:00:00.01 | 170 | 1123K| 1123K| 1/0/0| | 2 | VIEW | | 1 | 2 | 100 |00:00:00.01 | 125 | | | | | 3 | TABLE ACCESS BY INDEX ROWID | LOOKUP_T | 100 | 1 | 99 |00:00:00.01 | 102 | | | | |* 4 | INDEX UNIQUE SCAN | LT_PK | 100 | 1 | 99 |00:00:00.01 | 3 | | | | |* 5 | CONNECT BY NO FILTERING WITH START-WITH| | 1 | | 100 |00:00:00.01 | 23 | 9216 | 9216 | 2/0/0| | 6 | TABLE ACCESS FULL | RECURSIVE_TABLE | 1 | 100 | 100 |00:00:00.01 | 23 | | | | | 7 | TABLE ACCESS FULL | JOIN_T | 1 | 1 | 1000 |00:00:00.01 | 45 | | | | --------------------------------------------------------------------------------------------------------------------------------------------------
Inevitably, there are cases where the sub-plan for a scalar subquery appears much closer to its point of operation rather than being moved to the top of the execution plan. So any time you have scalar subqueries in select lists inside in-line views keep a careful lookout for where they appear and how many times they run in the execution plan. And don’t forget that giving every query block a name will help you track down your migrating subqueries.
Footnote
If you’re wondering why the Column Projection Information reports s.ssq_value as varchar2(120) when I’ve declared the column as varchar2(30), my declaration is 30 CHAR, and the database (by default) is running with a multi-byte character set that allows a maximum of 4 bytes per character.
Update (22nd May 2019)
Following the comment from Iudith Mentzel below about clever optimisations, primary keys, and related inferences I thought it worth pointing out that it is possible to modify the demonstration query to get the same plan (shape) with different Start counts. We note that instead of putting the scalar subquery inside the inline view we would get the same result if we passed the parent_id to the outer query block and ran the scalar subquery there:
select /* distinct */ r.my_id, j.value, (select l.value from lookup_t l where l.pk_col = r.parent_id) ssq_value from ( select my_id, parent_id, fk_col from recursive_table r connect by prior my_id = parent_id start with my_id = :x ) r join join_t j on r.fk_col = j.pk_col /
This gives us the following execution plan (with rowsource execution statistics):
Plan hash value: 2557600799 -------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | O/1/M | -------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 90 |00:00:00.03 | 29 | | | | | 1 | TABLE ACCESS BY INDEX ROWID | LOOKUP_T | 90 | 1 | 89 |00:00:00.01 | 97 | | | | |* 2 | INDEX UNIQUE SCAN | LT_PK | 90 | 1 | 89 |00:00:00.01 | 8 | | | | |* 3 | HASH JOIN | | 1 | 2 | 90 |00:00:00.03 | 29 | 1695K| 1695K| 1/0/0| | 4 | VIEW | | 1 | 2 | 100 |00:00:00.01 | 7 | | | | |* 5 | CONNECT BY NO FILTERING WITH START-WITH| | 1 | | 100 |00:00:00.01 | 7 | 6144 | 6144 | 2/0/0| | 6 | TABLE ACCESS FULL | RECURSIVE_TABLE | 1 | 100 | 100 |00:00:00.01 | 7 | | | | | 7 | TABLE ACCESS FULL | JOIN_T | 1 | 1 | 1000 |00:00:00.01 | 22 | | | | --------------------------------------------------------------------------------------------------------------------------------------------------
Note that the plan hash values are the same even though (mechanically) the real order of activity is dramatically different. But now we can see that the scalar subquery (operations 1 and 2) starts 90 times – once for each row returned by the hash join at operation 3, and we have done slightly fewer buffer visits (97 compared to 102) for that part of the plan.
It’s a pity, though, that when you start poking at a plan and looking too closely there are always new buggy bits to see. With the scalar subquery now at its optimal position (and maybe it will eventually get there without a manual rewrite) the arithmetic of “summing up the plan” has gone wrong for (at least) the Buffers column. In the new plan the 97 buffer visits attributed to operation 1 (and its descendents) should have been added to the 29 buffer visits attributed to the hash join (and its descendents) at operation 3 to get a total of 126; instead the 97 have just disappeared from the query total.
By comparison, and reading the operations in the original plan a suitable order, we see the view at operation 4 reporting 109 buffers which comes from 7 for its “obvious” descendents plus the 102 from operation 1 that actually happen inside the view. Then the hash join at operation 3 reports 131 buffers which is the 109 from the view plus the 22 from the tablescan at operation 7, and that 131 buffers is the final figure for the query.
So, for this particular example, it doesn’t matter what you do, the execution plan and its stats try to confuse you.
Update (September 2020)
I’ve just surprised myself by discovering an article I wrote in 2012 in answer to a question about an “odd” execution plan that I’ve known (and forgotten) about this slipping scalar subquery plan for many years.
Update (July 2021)
I didn’t make a note of which version of Oracle I tested this on originally, but I’ve just re-run the test on 19.11.0.0 and nothing has changed. The sub-plan for the subquery is still in “the wrong place”.
Hello Jonathan,
In this case, since the hash join is on the PK of table JOIN_T, Oracle can infer that the row set resulting from the hash join cannot be larger
than the row set from the “R” inline view, so it could have been a “clever” optimization from Oracle’s side to “move” the execution of the scalar query
outwards, for possibly executing it a lower number of times (here 90 instead of 100), and, of course, to show this accordingly in the execution plan.
But, probably the optimizer is not (yet) “trained” to move scalar queries around as part of the cost based query transformations,
based on cardinality estimations.
I remember from real life having to use a NO_MERGE hint to achieve exactly the opposite, that is, to prevent merging an inline view with a scalar query, which was further joined to another row source that was going to increase the number of rows in the result,
so that to ensure that the number of executions of the scalar query stays as low as possible.
In the case of this example, and maybe also in the original case, it is probably still the best to consider manually placing the scalar query
in such a place as to minimize the number of its executions.
I also remember that in older versions of Oracle ( maybe even up to 8i ) the scalar queries did not appear at all in the execution plans.
So, anyway, there is some progress in having them show up in the plan.
But, in spite of all, there is still place for using the “good developer intuition” for producing most efficient code in each real-life case.
Thanks a lot & Best Regards,
Iudith Mentzel
Haifa, Israel
Comment by Iudith Mentzel — May 21, 2019 @ 6:43 pm BST May 21,2019 |
[…] notes last night, and he created, and emailed to me, a very simple example (which I’ve now published at this URL) to reproduce this behaviour which I’ve also tested on […]
Pingback by Execution Plan Puzzle | Oracle Scratchpad — March 11, 2020 @ 8:34 am GMT Mar 11,2020 |