A question arrived in my email a few days ago with the following observations on a statement that was supposed to query the data dictionary for some information about a specified composite partitioned table. The query was wrapped in a little PL/SQL, similar to the following:
declare v_src_part varchar2(30) := null; v_tab varchar2(30) := 'PT_COMPOSITE_1'; begin select /*+ qb_name(main) */ uts1.subpartition_name into v_src_part from user_tab_subpartitions uts1 where uts1.table_name = v_tab and uts1.last_analyzed is not null and uts1.num_rows = ( select /*+ qb_name(max_subq) */ max (uts2.num_rows) from user_tab_subpartitions uts2 where uts2.table_name = /* v_tab */ uts1.table_name ) and rownum = 1 ;
The requirement is simple: identify the subpartitions of a specific table that have the largest number of rows of any subpartition of the table – but report only the first match.
You’ll notice that the where clause of the subquery has a commented “v_tab” in it. This is the PL/SQL variable used in the outer query block to identify the target table, and it shouldn’t really make any difference if I use the PL/SQL variable in the subquery rather than using a correlating column. However, the question that came with this block of code was as follows:
All the partitions and subpartitions had their stats when running the test. On a first run using the correlated subquery the block reported oracle error ORA-01403: no data found. Changing the code to use the PL/SQL variable the block reported a specific subpartition as expected. A few hours later (after changing the code back to use the correlated subquery) the block reported the same subpartition. Have you ever seen anything like this? The Oracle version is 12.1.0.2.
Rule 1, of course, is to be a little sceptical when someone says “Honest, Guv, the stats are all okay”. But I’m going to assume that the statistcs on this table really were complete and that there was no “data-related” reason for this query to behave in such a surprising way.
The email is an invitation to consider two points.
- This looks like a bug: the two versions of the query are logically equivalent, they should return the same results if the underlying data had not changed. (In fact, I think the only “legal” way that the query could return ORA-01403 is if there were no stats on any subpartitions of the table in question – any ordinary usage of the dbms_stats package other than delete_table_stats() would have ensured that the query had to find something.) So, the first run of the correlated subquery produced no data while the modified query did get a result. That suggests a problem with some transformation in the 12.1.0.2 code to handle correlated aggregate subqueries.
- How could the second execution of the version with the correlated subquery produce a result a few hours later. Here are a couple of possibilities:
- Someone had gathered dictionary stats (i.e. on the tables used by the query, not on the subpartitioned table) in the “few hours” gap so the optimizer picked a different execution plan which bypassed the bug.
- (minor variation on previous) Someone had gather dictionary stats when the first execution plan was already in memory but the “auto_invalidate” option for cursor invalidation meant that the query didn’t get re-optimised for a few hours.
- Nothing changed, but the query had been flushed from the library cache and did need re-optimisation a few hours later. Since the version is 12.1.0.2 this means statistics feedback or automatic SQL directives could have had an impact – which means there may be dynamic sampling during optimisation – and a different set of random samples could have resulted in a different execution plan.
- Other …
The interesting bit
There is a generic feature about this question that is more interesting than the “what went wrong, how could I get different results”, and it’s in the choice you can make between using a correlation column and repeating a pl/sql variable (or literal value ).
The switch to using a pl/sql variable turns the subquery into a single-row, “standalone”, subquery – one that could be run without any reference to the outer query – and this imposes a dramatic change on what the optimizer can do. Sometimes that change will make a huge difference to the optimisation time and the run time.
As a correlated subquery the notional “first strategy” for the optimizer is:
“for each row in the outer query execute the inner query as a filter subquery passing in the correlation value“
If you take the “standalone” approach the optimizer will be looking for a plan that says (in effect):
“run the subquery once to generate a constant that you will need to execute the rest of the query”
Running the subquery once rather than once per row is likely to be a good idea – on the other hand Oracle can do “scalar subquery caching” so if the value of the correlation column is always the same the correlated subquery will actually run only once anyway.
More importantly, when the optimizer sees a correlated subquery it will consider unnesting it and then transforming it in various other ways; and it might take the optimizer a long time to work out what it can and can’t do, and the plan it finally does produce may be much slower than what it could have done if it had not unnested the subquery.
Some test results
So I ran 3 variations of the PL/SQL block on Oracle 19.3.0.0 with the CBO trace (10053) enabled and picked out a few highlights. The three tests in order were:
- Use the pl/sql variable so the subquery could run as a standalone query
- Use the correlating column to make the subquery a correlated subquery
- Use the correlating column, but add the hint /*+ no_unnest */ to the subquery.
The results were as follows – first the timing, then a critical measure that explains the timing:
- Case 1 – standalone subquery – total time 0.82 seconds
- Case 2 – correlated subquery – total time 5.76 seconds
- Case 3 – correlated subquery with no_unnest hint – total time 0.84 seconds
Where did all that extra time go – a lot of it went in optimisation. How many “Join Orders” were examined for each query
- Case 1 – standalone subquery – 90 join orders
- Case 2 – correlated subquery – 863 join orders
- Case 3 – correlated subquery with no_unnest hint – 90 join orders
If you’re wondering what the 773 extra join orders were about here’s a clue. I extracted all the lines from the case 2 trace file that started with “SU:” – those are the lines tagged for “Subquery Unnest” – using a call to grep -n “^SU:” {tracefile name} and this is the result:
2945:SU: Unnesting query blocks in query block SEL$071BB01A (#1) that are valid to unnest. 2947:SU: Considering subquery unnest on query block SEL$071BB01A (#1). 2948:SU: Checking validity of unnesting subquery SEL$4F5F2F29 (#2) 2949:SU: Passed validity checks, but requires costing. 2950:SU: Using search type: exhaustive 2951:SU: Starting iteration 1, state space = (2) : (1) 2952:SU: Unnesting subquery query block SEL$4F5F2F29 (#2)Subquery removal for query block SEL$4F5F2F29 (#2) 3089:SU: Costing transformed query. 66112:SU: Considering interleaved complex view merging 66113:SU: Unnesting subquery query block SEL$4F5F2F29 (#2)Subquery removal for query block SEL$4F5F2F29 (#2) 66366:SU: Costing transformed query. 129372:SU: Finished interleaved complex view merging 129373:SU: Considering interleaved distinct placement 129374:SU: Finished interleaved distinct placement 129375:SU: Considering interleaved join pred push down 129376:SU: Unnesting subquery query block SEL$4F5F2F29 (#2)Subquery removal for query block SEL$4F5F2F29 (#2) 251638:SU: Rejected interleaved query. 251640:SU: Finished interleaved join pred push down 251641:SU: Considering interleaved OR Expansion 251642:SU: Unnesting subquery query block SEL$4F5F2F29 (#2)Subquery removal for query block SEL$4F5F2F29 (#2) 251651:SU: Finished interleaved OR Expansion 251653:SU: Updated best state, Cost = 19.085153 251654:SU: Starting iteration 2, state space = (2) : (0) 251665:SU: Costing transformed query. 310395:SU: Not update best state, Cost = 20.083998 310396:SU: Will unnest subquery SEL$4F5F2F29 (#2)
The optimizer checks the validity of unnesting (generated) query block SEL$4F5F2F29 at line 2948 of the trace and decides, 308,000 lines later after an exhaustive examination of the possibilities, that it will unnest the subquery. Since this is a recent version of Oracle we take one simple extra step by checking for “TIMER” information, again using a “grep -n” call –
251639:TIMER: SU: Interleaved JPPD SEL$B73B51DC cpu: 1.263 sec elapsed: 1.263 sec 251652:TIMER: SU: iteration (#1) SEL$B73B51DC cpu: 2.607 sec elapsed: 2.607 sec 310577:TIMER: CBQT SU and CVM SEL$071BB01A cpu: 3.323 sec elapsed: 3.323 sec 433371:TIMER: Cost-Based Join Predicate Push-Down SEL$12B6FE6C cpu: 1.307 sec elapsed: 1.306 sec 433477:TIMER: Cost-Based Transformations (Overall) SEL$12B6FE6C cpu: 4.731 sec elapsed: 4.731 sec 496189:TIMER: SQL Optimization (Overall) SEL$12B6FE6C cpu: 5.306 sec elapsed: 5.306 sec
Of course most of the time spent in this particular example was a result of optimising (and writing the optimizer trace), but for my tiny example (table definition below) the final figures I’ll show are the buffer gets and CPU time reported by a basic 10046 trace file after optimisation with all the relevant data was cached:
- Case 1 – standalone subquery – 89 buffer gets / 0.00 seconds
- Case 2 – correlated subquery – 130 buffer gets / 0.53 seconds
- Case 3 – correlated subquery with no_unnest hint – 121 buffer gets / 0.08 CPU seconds
The sub-centisecond time is a little suspect, of course, but the others seem fairly trustworthy.
Conclusion
The title of this piece is “Most Recent” because the commonest requirement for a query of this shape is “find the most recent row matching the following predicates”, even though in this case the interpretation is “find me the row matching the largest value”.
The “standard” pattern for writing a “most recent” query is to use a correlated subquery – but it’s worth remembering that you may reduce optimisation time and run time by “copying down the constant” rather than using the correlation mechanism.
(There are alternative strategies to the subquery approach, of course, and the analytic max() – introduced in Oracle 8i – is gaining traction as one of the popular alternatives; in fact there are cases where the optimizer will “remove aggregate subquery” by using an analytic to eliminate an aggregate subquery query.)
Footnote 1
If you want to re-run my test on different platforms and versions of Oracle, here’s the code to generate the table. (Don’t be surprised if you don’t get completely consistent results – much of the optimization will depend on the size of all the relevant tables (tab$, tabcompart$, etc.) in the data dictionary, rather than on the actual definition of this partitioned table.
em rem Script: most_recent_3.sql rem Author: Jonathan Lewis rem Dated: Jun 2020 rem rem Last tested rem 19.3.0.0 rem create table pt_composite_1 ( id, grp, small_vc, padding ) nologging partition by range(id) subpartition by hash (grp) subpartitions 4 ( partition p2 values less than (400), partition p3 values less than (800), partition p4 values less than (1600), partition p5 values less than (3200) ) as select rownum id, trunc(rownum/50) grp, to_char(trunc(rownum/20)) small_vc, rpad('x',100) padding from all_objects where rownum <= 3000 -- > comment to avoid wordpress format issue ; execute dbms_stats.gather_table_stats(user,'pt_composite_1',granularity=>'ALL')
Footnote 2
For reference, here are the outputs I got from executing egrep -n -e”^SU:” -e”TIMER” against the other two CBO trace files.
First for the “standalone” form – note how line 3130 tells us that “there is no correlation”.
806:SU: Considering subquery unnesting in query block MISC$1 (#0) 2947:SU: Unnesting query blocks in query block SEL$071BB01A (#1) that are valid to unnest. 2952:SU: Considering subquery unnest on query block SEL$071BB01A (#1). 2953:SU: Checking validity of unnesting subquery SEL$4F5F2F29 (#2) 2954:SU: SU bypassed: No correlation to immediate outer subquery. 2955:SU: SU bypassed: Failed basic validity checks. 2956:SU: Validity checks failed. 3130:SU: SU bypassed: No correlation to immediate outer subquery.
Then for the correlated subquery with /*+ no_unnest */ hint; and line 3122 tells us that SU was bypassed because of a hint/parameter:
809:SU: Considering subquery unnesting in query block MISC$1 (#0) 2945:SU: Unnesting query blocks in query block SEL$071BB01A (#1) that are valid to unnest. 2947:SU: Considering subquery unnest on query block SEL$071BB01A (#1). 2948:SU: Checking validity of unnesting subquery SEL$4F5F2F29 (#2) 2949:SU: SU bypassed: Not enabled by hint/parameter. 2950:SU: SU bypassed: Failed basic validity checks. 2951:SU: Validity checks failed. 3122:SU: SU bypassed: Not enabled by hint/parameter.
Neither file showed any “TIMER” information since that appears, by default, only for steps that take longer than one second. (If you want to adjust the granularity, see Franck Pachot’s note on parse time that describes bug/fix_control 16923858.
Comments and related questions are welcome.