An important target of trouble-shooting, particularly when addressing performance problems, is to minimise the time and effort you have to spend to get a “good enough” result. A recent question on the OTN database forum struck me as a good demonstration of this strategy; the problem featured a correlated update that had to access a view 84 times to update a small table, but the view was complex (and apparently non-mergeable), so the update took several hours to complete even though the view, when instantiated, held only 63 rows.
The OP told us that the query “select * from view” took seven minutes to return those 63 rows, and wanted to know if we could find a nice way to perform the update in (approximately) that seven minutes, rather than using the correlated update approach that seemed to be taking something in the ballpark of seven minutes per row updated.
Of course the OP could have given us all the details of the view definition, all the table and index definitions, with stats etc. and asked us if we could make the update run faster – but that could lead to a long and frustrating period of experimentation and testing, and a solution that might increase the general maintenance costs of the system (because a subsequent modification to the view might then have to be echoed into the code that did the update). Setting a strictly limited target that clearly ought to be achievable is (if nothing else) a very good starting point for improving the current situation.
I don’t know (as at the time of writing) if the OP implemented the strategy I suggested, but from his description it looked as if it should have been simple to use subquery factoring with materialization to achieve the required result in the most elegant way possible (meaning, in this case, simple SQL and no change to any surrounding code).
The OP has responded to my suggestion with a comment that “it didn’t work”, but it appeared to me that they were looking at and mis-interpreting the output from a call to “Explain Plan” rather than testing the query and pulling the plan from memory – so I thought I’d build a simple model to demonstrate the principle and show you how you could confirm (beyond just checking the clock) that the strategy had worked.
We start with a table to update, a non-mergeable view, and two tables to make up the non-mergeable view:
rem
rem     Script:         cte_update.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2016
rem

create table t1
as
select
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from
        dual
connect by
        level <= 3000
-- > comment to avoid wordpress format issue
;

create table t2
as
select
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from
        dual
connect by
        level <= 3000
-- > comment to avoid wordpress format issue
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

begin
        dbms_stats.gather_table_stats(
                user, 't1', method_opt => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                user, 't2', method_opt => 'for all columns size 1'
        );
end;
/

create or replace view v1
as
select  distinct
        t1.n1 t1n1,
        t1.n2 t1n2,
        t2.n2 t2n2
from
        t1, t2
where
        t1.n1 = t2.n1
;

create table t3
as
select * from v1
;

begin
        dbms_stats.gather_table_stats(
                user, 't3', method_opt => 'for all columns size 1'
        );
end;
/
I’ve created the table t3 by copying the content of the view v1 and I’m going to update every row in t3 from v1; I gathered stats on t1 and t2 before creating the view and table simply to avoid the need for Oracle to do dynamic sampling as it created t3. Depending on your version of Oracle, of course, the stats collections might be redundant.
Having set the scene with the data, here’s the “original” code for doing the required update, followed by its execution plan (pulled from the memory of a 12.1.0.2 instance):
set serveroutput off
set linesize 180
set trimspool on

alter session set statistics_level = all;

spool cte_update

update t3
set     t2n2 = (
                select  v1.t2n2
                from    v1
                where   v1.t1n1 = t3.t1n1
                and     v1.t1n2 = t3.t1n2
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                         |       |      1 |        |      0 |00:00:01.22 |   46745 |       |       |          |
|   1 |  UPDATE                                  | T3    |      1 |        |      0 |00:00:01.22 |   46745 |       |       |          |
|   2 |   TABLE ACCESS FULL                      | T3    |      1 |    200 |    200 |00:00:00.01 |       3 |       |       |          |
|   3 |   VIEW                                   | V1    |    200 |      1 |    200 |00:00:01.22 |   46332 |       |       |          |
|   4 |    SORT UNIQUE                           |       |    200 |      1 |    200 |00:00:01.21 |   46332 |  2048 |  2048 | 2048  (0)|
|   5 |     NESTED LOOPS                         |       |    200 |      1 |  45000 |00:00:01.11 |   46332 |       |       |          |
|   6 |      NESTED LOOPS                        |       |    200 |      1 |  45000 |00:00:00.34 |    1332 |       |       |          |
|*  7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    200 |      1 |   3000 |00:00:00.02 |     684 |       |       |          |
|*  8 |        INDEX RANGE SCAN                  | T1_I1 |    200 |     15 |   3000 |00:00:00.01 |     408 |       |       |          |
|*  9 |       INDEX RANGE SCAN                   | T2_I1 |   3000 |      1 |  45000 |00:00:00.11 |     648 |       |       |          |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T2    |  45000 |      1 |  45000 |00:00:00.31 |   45000 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - filter("T1"."N2"=:B1)
   8 - access("T1"."N1"=:B1)
   9 - access("T2"."N1"=:B1)
       filter("T1"."N1"="T2"."N1")
Points to note from this execution plan: the VIEW operation at line 3 has started 200 times (there are 200 rows in table t3 and the subquery runs once per row); and a simple measure of work done is the 46,745 buffer visits (of which, I can tell you, roughly 400 are current block gets) reported under Buffers in the top line of the plan.
It’s an interesting detail that although Oracle has pushed the correlation predicates inside the view (as shown by the predicate section for operations 7, 8 and 9) it doesn’t report the operation at line 3 as “VIEW PUSHED PREDICATE”. It would be nice to see an explicit announcement of predicate pushing here, but that label seems to be reserved for pushing join predicates into views – fortunately we always check the predicate section, don’t we!
Now let’s see what the SQL and plan look like if we want Oracle to create the entire v1 result set and use that to update the t3 table.
update t3
set     t2n2 = (
                with v0 as (
                        select
                                /*+ materialize */
                                t1n1, t1n2, t2n2
                        from    v1
                )
                select  t2n2
                from    v0
                where   v0.t1n1 = t3.t1n1
                and     v0.t1n2 = t3.t1n2
        )
;

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads | Writes |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT               |                            |      1 |        |      0 |00:00:00.19 |    1185 |     1 |      1 |       |       |          |
|   1 |  UPDATE                        | T3                         |      1 |        |      0 |00:00:00.19 |    1185 |     1 |      1 |       |       |          |
|   2 |   TABLE ACCESS FULL            | T3                         |      1 |    200 |    200 |00:00:00.01 |       3 |     0 |      0 |       |       |          |
|   3 |   TEMP TABLE TRANSFORMATION    |                            |    200 |        |    200 |00:00:00.18 |     778 |     1 |      1 |       |       |          |
|   4 |    LOAD AS SELECT              |                            |      1 |        |      0 |00:00:00.01 |     171 |     0 |      1 |  1040K|  1040K|          |
|   5 |     VIEW                       | V1                         |      1 |  45000 |    200 |00:00:00.01 |     168 |     0 |      0 |       |       |          |
|   6 |      HASH UNIQUE               |                            |      1 |  45000 |    200 |00:00:00.01 |     168 |     0 |      0 |  1558K|  1558K| 3034K (0)|
|*  7 |       HASH JOIN                |                            |      1 |  45000 |  45000 |00:00:00.01 |     168 |     0 |      0 |  1969K|  1969K| 1642K (0)|
|   8 |        TABLE ACCESS FULL       | T1                         |      1 |   3000 |   3000 |00:00:00.01 |      84 |     0 |      0 |       |       |          |
|   9 |        TABLE ACCESS FULL       | T2                         |      1 |   3000 |   3000 |00:00:00.01 |      84 |     0 |      0 |       |       |          |
|* 10 |    VIEW                        |                            |    200 |  45000 |    200 |00:00:00.17 |     603 |     1 |      0 |       |       |          |
|  11 |     TABLE ACCESS FULL          | SYS_TEMP_0FD9D6618_911FB4C |    200 |  45000 |  40000 |00:00:00.08 |     603 |     1 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("T1"."N1"="T2"."N1")
  10 - filter(("V0"."T1N1"=:B1 AND "V0"."T1N2"=:B2))
The headline figure to note is the 1,185 buffer visits – clearly we’ve done something very different (and possibly cheaper and faster, even in this tiny demonstration). Looking at operation 3 we see the “TEMP TABLE TRANSFORMATION”, which tells us that we’ve materialized our factored subquery. There is scope, though, for a little ambiguity and uncertainty – the Starts column for this operation says we started it 200 times, once for each row in t3, so we might worry that we’ve actually recreated the result set and written it to disc 200 times, even though lines 4 – 9 tell us that we loaded the temporary table just once (Starts = 1).
You could take my word for it that we didn’t “do” the temp table transformation 200 times, we merely used the result of the temp table transformation 200 times; but I wasn’t prepared to make that assumption until I had done a little more checking, so there’s no reason why you shouldn’t still be a little suspicious. Lines 4 – 9 do seem to tell us (consistently) that we only load the data once, but there have been occasional bugs where counters were reset to zero when they shouldn’t have been, so the fact that we see (for example, at operation 8) “1 full tablescan of t1 returning 3,000 rows after visiting 84 buffers” may mean that Oracle counted the work once and “forgot” to count it the other 199 times.
It’s easy enough to do a quick cross-check. Take a snapshot of v$mystat joined to v$statname before and after running the query, and check the difference in buffer visits, tablescans, and tablescan rows gotten – if those figures are broadly consistent with the figures in the execution plan I think we can be reasonably confident that the plan is telling us the truth.
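A minimal version of such a check might be no more than the following query (this is just a sketch, not the exact snapshot code I used), run once before and once after the update so that the two sets of values can be compared by hand:

select
        sn.name, ms.value
from
        v$statname      sn,
        v$mystat        ms
where
        ms.statistic# = sn.statistic#
and     sn.name in (
                -- the handful of statistics we care about for this test
                'session logical reads',
                'db block gets',
                'consistent gets',
                'db block changes',
                'table scans (short tables)',
                'table scan rows gotten',
                'table scan blocks gotten'
        )
order by
        sn.statistic#
;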
Here’s what we get for a few key figures:
Name                                          Value
----                                          -----
session logical reads                         1,472
db block gets                                   412
consistent gets                               1,060
consistent gets from cache                    1,060
db block changes                                410
table scans (short tables)                      205
table scan rows gotten                       46,213
table scan blocks gotten                        366
There are a number of oddities – not to mention version and feature dependent variations – in the numbers and a couple of discrepancies introduced by the code I was using to take the snapshot, but the “table scan rows gotten” figure is particularly easy to see in the execution plan:
46,213 ≈ 3,000 (t1) + 3,000 (t2) + 200 (t3) + 200 * 200 (temp table scanned 200 times) = 46,200 (the remaining handful of rows being part of the discrepancy introduced by the snapshot code).
Allowing for a small error, the number of “table scans (short tables)” is also consistent with the plan’s Starts figures – and that’s perhaps the most important indicator: we scan t1 and t2 just once each, and the temp table result 200 times. If we were creating the temp table 200 times we would have had to do over 400 tablescans (200 each of t1 and t2).
I won’t go into the details of how to compare the session logical I/O to the total Buffers figure for the plan, but the numbers are in the right ballpark as far as matching is concerned; if the plan were deceiving us about the number of times the temporary table was created (rather than used) the session stats would have to report something more like 33,600 consistent gets (200 * (84 + 84)).
Conclusion
We have managed to reduce the workload from “one view instantiation per row” to “one view instantiation” with a very small change to the SQL. In the case of the OP this should result in a small, easily comprehensible, change in the SQL statement leading to a drop in run-time from several hours to seven minutes – and maybe that’s good enough for the present.
This seems to work even better with the documented /*+ RESULT_CACHE */ hint …
Less buffers and less time
Comment by Geert — January 8, 2016 @ 1:37 pm GMT
Geert,
Thanks for the suggestion. Using the result cache to get the view content created just once, instead of once per row, is a nice variation on the theme.
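Presumably that means taking my rewritten update and swapping the materialize hint for result_cache, along these lines (just a sketch of the idea):

update t3
set     t2n2 = (
                with v0 as (
                        select
                                /*+ result_cache */    -- cache the view result rather than writing it to a temp table
                                t1n1, t1n2, t2n2
                        from    v1
                )
                select  t2n2
                from    v0
                where   v0.t1n1 = t3.t1n1
                and     v0.t1n2 = t3.t1n2
        )
;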
It raised a couple of questions in my mind about the requirement the OP had, particularly relating to the fact that it took seven minutes to generate the view result set. Don’t take that as a criticism, by the way; it’s just that good ideas always bring out the devil’s advocate (or maybe the inquisition) in me. Here’s what I’d want to check (and maybe it’s in the documentation, or has previously been written about by others).
Result caches become invalid if the underlying data changes: what happens if someone else updates the data in the seven minutes that the result cache query is running to populate the result cache? Will the result cache immediately be invalidated before OUR query can use it? There might be a race condition for someone somewhere if we tried this.
Comment by Jonathan Lewis — January 9, 2016 @ 12:05 pm GMT
I wonder if a hash join could be used somehow – something that results in hashing the view’s content in memory once and reusing it.
Comment by Christo Kutrovsky — January 8, 2016 @ 4:23 pm GMT
Christo,
I’d say your thoughts are moving towards the updateable join view / merge with update-only strategy that András Gábor demonstrates in comment #3.
Comment by Jonathan Lewis — January 9, 2016 @ 12:32 pm GMT
When I first saw the UPDATE .. FROM .. JOIN style in MS SQL Server I got pretty envious. The syntax allows using joins instead of per-row subqueries, thus allowing hashing, efficient aggregation, etc.
I wanted to do the same in Oracle and found two approaches that get something very close.
The basic idea is to calculate the new value(s) as extra columns in a query, alongside the target column and the keys of the rows; the query can take advantage of joins and so on. Then either update that query (as an inline view) or use it as the source for a MERGE.
Excuse any typos, I do not have access to a DB right now.
#1 Calculate the values as part of an inline view and update the view:
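With the t3 and v1 from this post it would look something like this (untested, and the column aliases are my own):

update  (
        select
                t3.t2n2         old_t2n2,       -- column we want to change
                v1.t2n2         new_t2n2        -- value calculated by the join
        from
                t3,
                v1
        where
                v1.t1n1 = t3.t1n1
        and     v1.t1n2 = t3.t1n2
        )
set     old_t2n2 = new_t2n2
;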
We may select other t3 columns, like PK or ROWID of t3.
Running the query part only before the update allows us to verify what modifications we are about to perform.
However, this fails more often than not, as the optimizer needs quite a lot of metadata to recognize that the view is updateable.
#2 We can work around that by using a MERGE statement with UPDATE only
We need to join the rows back here, either using key column(s) of the target table or a ROWID (provided the update cannot change any ROWIDs, i.e. partition keys are not updated, etc.)
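A sketch of the ROWID variant against the tables from this post (untested):

merge into t3
using   (
        select
                t3.rowid        t3_rowid,       -- join back by rowid; key columns of t3 would work the same way
                v1.t2n2         new_t2n2
        from
                t3,
                v1
        where
                v1.t1n1 = t3.t1n1
        and     v1.t1n2 = t3.t1n2
        )       s
on      (t3.rowid = s.t3_rowid)
when matched then
        update set t3.t2n2 = s.new_t2n2
;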
#3
For Jonathan’s simplified example an even simpler MERGE would do the trick, but that would not illustrate the more robust pattern.
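Something like this (untested):

merge into t3
using   v1
on      (
                v1.t1n1 = t3.t1n1
        and     v1.t1n2 = t3.t1n2
        )
when matched then
        update set t3.t2n2 = v1.t2n2
;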
Sorry again for typos here, feel free to fix the examples…
Comment by András Gábor (Kojak) — January 8, 2016 @ 6:45 pm GMT
András,
Thanks for the suggestion – again a nice way to restrict the generation of the seven-minute result set to just once and, I think, a neater and tidier solution than mine. Thanks to a side effect of “update by join” rather than “update by subquery”, it was also slightly more efficient than Geert’s result-cache strategy.
The updateable join view will fail for the OP’s (and my) example because of the need for a uniqueness constraint on the view (the “key preservation” error, though I can’t remember the number off-hand).
Technically the merge command could fail as well (with the run-time equivalent of the key-preservation error – again I don’t recall the exact text, but it’s something about an unstable set of rows), but for the OP it would fail only if the original update with subquery would have failed anyway with “single row subquery returns more than one row”.
I got a merge by default, but hinted it into the following hash join (for the simple, rather than rowid, approach):
Comment by Jonathan Lewis — January 9, 2016 @ 12:47 pm GMT
Good point on the “single row subquery returns more than one row” possibility. The other possible “error” is when the scalar subquery would not return a row at all. I think this does not error out, but the subquery evaluates to NULL and the column is just updated to NULL.
Using outer join is one way to address that case, still allowing hashing.
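For example (again only a sketch), driving the MERGE source through an outer join so that t3 rows with no match in v1 still get t2n2 set to NULL:

merge into t3
using   (
        select
                t3.rowid        t3_rowid,
                v1.t2n2         new_t2n2        -- NULL for t3 rows with no match in v1
        from
                t3
        left outer join
                v1
        on      v1.t1n1 = t3.t1n1
        and     v1.t1n2 = t3.t1n2
        )       s
on      (t3.rowid = s.t3_rowid)
when matched then
        update set t3.t2n2 = s.new_t2n2         -- unmatched rows are set to NULL, matching the scalar subquery behaviour
;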
As in many tuning exercises, it is best if one understands the idea behind the transformation rather than applying it blindly as a recipe.
We are more familiar with this kind of transformation when we turn a scalar subquery in the select list into a join for performance reasons. The same principles apply in both cases, and the same typical errors occur when it is not done right: row multiplication, row elimination, etc.
Maybe this strategy could be automated by the optimizer in the future, preventing such human mistakes.
Actually it already happens as you described in your earlier post: https://jonathanlewis.wordpress.com/2015/12/09/12-scalar-subquery/.
Maybe it will be done for updates as well at some point.
Comment by András Gábor (Kojak) — January 10, 2016 @ 11:25 am GMT
[…] Case Study: using query factoring to reduce the workload for a complex correlated update […]
Pingback by CTE Catalogue | Oracle Scratchpad — June 10, 2020 @ 6:46 pm BST
Hi Jonathan,
As mentioned by András, a non-standard but intuitive syntax can be used in SQL Server. It can also be used with Postgres, and it’s efficient, but it’s good practice to use standard SQL when it is available.
The 1st query is simple and standard; it would be good to make it work efficiently.
Indirectly related, but I am puzzled by these lines of the 2nd plan:
(E-Rows x Starts) gives the estimated number of rows, if I understand https://jonathanlewis.wordpress.com/2016/05/05/e-rows-a-rows/ correctly? Am I missing something, or does Oracle “think” it will obtain 45,000 rows in v1? If so, something seems fundamentally wrong here, and I wonder what chance the CBO has of choosing the right plan after that.
Here is the result of an “explain analyze” on pg 13:
A corresponding line seems to be:
I added statistics on t1(n1, n2):
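(i.e. an extended-statistics object; the exact statement and the statistics kinds below are only an illustration)

-- hypothetical example of the kind of statement meant
create statistics t1_n1_n2 (ndistinct, dependencies) on n1, n2 from t1;
analyze t1;   -- populate the new statistics object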
(it is obvious extended statistics on t2 are useless in this case if I consider v1)
If you create extended statistics on t1(n1, n2), or a hypothetical index on those columns, do you obtain a better estimate and perhaps a better plan “out of the box” in the case of the OP?
Best regards,
Phil
Comment by Phil Florent — December 24, 2020 @ 6:31 pm GMT
Forget that. I hadn’t read my own plan carefully: it’s just a slightly different version of the 1st plan.
Comment by Phil Florent — December 25, 2020 @ 3:32 am GMT
Followup: it was in fact obvious that I should have created statistics on t2(n1, n2) … but they are useless since extended statistics don’t apply across 2 tables. The estimate on v1 is just slightly better thanks to the extended stats on t1.
The consequences of this error can be significant. With such a relatively big estimate for “v1” the planner decides that it could use JIT (just in time) compilation for this query, but it’s not worth the price.
The 2nd plan is executed more slowly than the first one if JIT is enabled:
By default JIT was disabled in pg 11. I also disable it by default with pg 13, but I still think it would be good to have a better estimate of the number of rows in v1. JIT can be very useful in some cases, and it’s not good to deactivate a feature completely at cluster (= CDB) level.
Comment by Phil Florent — December 25, 2020 @ 4:30 am GMT
It seems possible to transform v1 (cf. https://pgphil.ovh/group_by_join_11_01.php) to obtain an exact estimate and better execution times, but it’s not intuitive. Automation is still under development: https://commitfest.postgresql.org/31/1247/
Comment by Phil Florent — December 25, 2020 @ 2:33 pm GMT