A recent client was seeing a very large redo penalty from refreshing materialized views. Unfortunately the MVs had to be refreshed very frequently, and were being handled with a complete refresh in atomic mode – which means deleting every row from every MV then re-inserting every row. The total redo was running at about 5GB per hour, which wasn’t a problem for throughput, but the space needed to handle backup and recovery was getting a bit extreme.
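(Footnote: when a brief window of an empty MV is acceptable, the usual way to dodge most of that redo is a non-atomic complete refresh, which truncates the MV and reloads it with direct-path inserts. A sketch – the MV name here is just a placeholder:

begin
        dbms_mview.refresh(
                list           => 'my_mv',      -- placeholder MV name
                method         => 'C',          -- complete refresh
                atomic_refresh => false         -- truncate + direct-path insert: far less redo and undo
        );
end;
/

The trade-off is that other sessions see an empty or partially loaded MV while the refresh is running, which is why atomic mode exists at all.)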
The requirement consisted of two MVs which extracted and aggregated row and column subsets in two different ways from a single table; then two MVs that aggregated one of the first MVs in two different ways; then two MVs which each joined one of the first-level MVs to one of the second-level MVs.
No problem – join MVs are legal, aggregate MVs are legal, “nested” MVs are legal: all you have to do is create the right MV logs and pick the right refresh command. Since the client was also running Standard Edition (SE2) there was no need to worry about how to ensure that query rewrite would work (the feature is not implemented in SE).
So here, simplified and camouflaged, is a minimum subset of just the first few stages of the construction: a base table with MV log, one first-level aggregate MV with its own MV log, and two aggregate MVs based on the first MV.
rem
rem     Script:         mv_nested_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem

drop materialized view log on req_line;
drop materialized view log on jpl_req_group_numlines;

drop materialized view jpl_req_group_numlines;
drop materialized view jpl_req_numsel;
drop materialized view jpl_req_basis;

drop table req_line;

-- ----------
-- Base Table
-- ----------

create table req_line(
        eventid         number(10,0),
        selected        number(10,0),
        req             number(10,0),
        basis           number(10,0),
        lnid            number(10,0),
        area            varchar2(10),
        excess          number(10,0),
        available       number(10,0),
        kk_id           number(10,0),
        eventdate       number(10,0),
        rs_id           number(10,0)
)
;

-- --------------------
-- MV log on base table
-- --------------------

create materialized view log on req_line
with rowid(
        req, basis, lnid, eventid, selected, area,
        excess, available, kk_id, eventdate, rs_id
)
including new values
;

-- --------------------
-- Level 1 aggregate MV
-- --------------------

create materialized view jpl_req_group_numlines(
        eventid, selected,
        row_ct, req_ct, basis_ct, req, basis,
        maxlnid, excess, numsel, area, available, kk_id,
        rs_id, eventdate
)
segment creation immediate
build immediate
refresh fast on demand
as
select
        eventid,
        selected,
        count(*)        row_ct,
        count(req)      req_ct,
        count(basis)    basis_ct,
        sum(req)        req,
        sum(basis)      basis,
        max(lnid)       maxlnid,
        excess,
        count(selected) numsel,
        area,
        available,
        kk_id,
        rs_id,
        eventdate
from
        req_line
group by
        eventid, selected, area, excess,
        available, kk_id, eventdate, rs_id
;

-- ------------------------
-- MV log on first level MV
-- ------------------------

create materialized view log on jpl_req_group_numlines
with rowid (
        eventid, area, selected, available,
        basis, req, maxlnid, numsel
)
including new values
;

-- ----------------------------
-- First "level 2" aggregate MV
-- ----------------------------

create materialized view jpl_req_numsel(
        eventid, selected,
        row_ct, totalreq_ct, totalbasis_ct, totalreq, totalbasis,
        maxlnid, numsel_ct, numsel, area
)
segment creation immediate
build immediate
refresh fast on demand
as
select
        eventid,
        selected,
        count(*)        row_ct,
        count(req)      req_ct,
        count(basis)    basis_ct,
        sum(req)        req,
        sum(basis)      basis,
        max(maxlnid)    maxlnid,
        count(numsel)   numsel_ct,
        sum(numsel)     numsel,
        area
from
        jpl_req_group_numlines
group by
        eventid, selected, area
;

-- -----------------------------
-- Second "level 2" aggregate MV
-- -----------------------------

create materialized view jpl_req_basis(
        eventid,
        row_ct, totalbasis_ct, totalreq_ct, totalbasis, totalreq,
        area, selected, available, maxlnid,
        numsel_ct, numsel
)
segment creation immediate
build immediate
refresh fast on demand
as
select
        eventid,
        count(*)        row_ct,
        count(basis)    totalbasis_ct,
        count(req)      totalreq_ct,
        sum(basis)      totalbasis,
        sum(req)        totalreq,
        area,
        selected,
        available,
        max(maxlnid)    maxlnid,
        count(numsel)   numsel_ct,
        sum(numsel)     numsel
from
        jpl_req_group_numlines
group by
        eventid, area, available, selected
;
Once the table, MV logs and MVs exist we can insert some data into the base table, then try refreshing the views. I have tried three different calls to the dbms_mview package – refresh_all_mviews(), refresh_dependent(), and refresh() – specifying the ‘F’ (fast) refresh method with atomic refresh and nested refresh enabled. All three fail in the same way on 12.2.0.1. The code below shows only the refresh_dependent() call.
I’ve included a query to report the current state of the materialized views before and after the calls, and added a two-second sleep before the refresh so that changes in the “last refresh” time will show up. The final queries are just to check that the expected volume of data has been transferred to the materialized views.
-- ------------------------------------
-- Insert some data into the base table
-- ------------------------------------

begin
        for i in 1..100 loop
                execute immediate '
                        insert into req_line values(
                                :x, :x, :x, :x, :x, :x,
                                :x, :x, :x, :x, :x
                        )'
                using i,i,i,i,i,i,i,i,i,i,i
                ;
                commit;
        end loop;
end;
/

set linesize 144
column mview_name format a40

select
        mview_name, staleness, compile_state, last_refresh_type,
        to_char(last_refresh_date,'dd-mon hh24:mi:ss')          ref_time
from
        user_mviews
order by
        last_refresh_date, mview_name
;

prompt  Waiting for 2 seconds to allow refresh time to change

execute dbms_lock.sleep(2)

declare
        m_fail_ct       number(6,0);
begin
        dbms_mview.refresh_dependent(
                number_of_failures      => m_fail_ct,
                list                    => 'req_line',
                method                  => 'F',
                nested                  => true,
                atomic_refresh          => true
        );

        dbms_output.put_line('Failures: ' || m_fail_ct);
end;
/

select
        mview_name, staleness, compile_state, last_refresh_type,
        to_char(last_refresh_date,'dd-mon hh24:mi:ss')          ref_time
from
        user_mviews
order by
        last_refresh_date, mview_name
;

-- --------------------------------
-- Should be 100 rows in each table
-- --------------------------------

select count(*) from jpl_req_basis;
select count(*) from jpl_req_group_numlines;
select count(*) from jpl_req_numsel;
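For completeness, a sketch of the equivalent refresh() call, naming the three MVs explicitly:

begin
        dbms_mview.refresh(
                list           => 'jpl_req_group_numlines,jpl_req_numsel,jpl_req_basis',
                method         => 'FFF',        -- one refresh-method code per MV in the list
                nested         => true,
                atomic_refresh => true
        );
end;
/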
Both of the earlier versions of Oracle that I tested (11gR2 and 12cR1) are happy with this code and refresh all three materialized views without any problem. Oracle 12.2.0.1 crashes the procedure call with a deadlock error which, when traced, shows itself to be a self-deadlock while attempting to select a data dictionary row for update. Here’s the output produced from a test of the code on 12.2.0.1:
MVIEW_NAME                               STALENESS           COMPILE_STATE       LAST_REF REF_TIME
---------------------------------------- ------------------- ------------------- -------- ------------------------
JPL_REQ_BASIS                            FRESH               VALID               COMPLETE 19-jan 14:03:01
JPL_REQ_GROUP_NUMLINES                   NEEDS_COMPILE       NEEDS_COMPILE       COMPLETE 19-jan 14:03:01
JPL_REQ_NUMSEL                           FRESH               VALID               COMPLETE 19-jan 14:03:01

3 rows selected.

Waiting for 2 seconds to allow refresh time to change

PL/SQL procedure successfully completed.

declare
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "SYS.select log, oldest, oldest_pk, oldest_oid, oldest_new, youngest+1/86400, flag, yscn, oldest_seq, oscn, oscn_pk, oscn_oid, oscn_new, oscn_seq from sys.mlog$ where mowner = :1 and master = :2 for update", line 2952
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 85
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 245
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 1243
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2414
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 2908
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3699
ORA-06512: at "SYS.DBMS_SNAPSHOT_KKXRCA", line 3723
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 75
ORA-06512: at line 4

MVIEW_NAME                               STALENESS           COMPILE_STATE       LAST_REF REF_TIME
---------------------------------------- ------------------- ------------------- -------- ------------------------
JPL_REQ_NUMSEL                           NEEDS_COMPILE       NEEDS_COMPILE       COMPLETE 19-jan 14:03:01
JPL_REQ_BASIS                            FRESH               VALID               FAST     19-jan 14:03:04
JPL_REQ_GROUP_NUMLINES                   FRESH               VALID               FAST     19-jan 14:03:04
The deadlock graph from the trace file, with a little extra surrounding information, looks like this:
Deadlock graph:
                                          ------------Blocker(s)-----------  ------------Waiter(s)------------
Resource Name                             process session holds waits serial  process session holds waits serial
TX-00020009-00000C78-A9B090F8-00000000         26      14     X       40306        26      14           X   40306

*** 2018-01-19T14:18:03.925859+00:00 (ORCL(3))
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=1, mask=0x0)
----- Error Stack Dump -----
----- Current SQL Statement for this session (sql_id=2vnzfjzg6px33) -----
select log, oldest, oldest_pk, oldest_oid, oldest_new, youngest+1/86400, flag, yscn, oldest_seq, oscn, oscn_pk, oscn_oid, oscn_new, oscn_seq from sys.mlog$ where mowner = :1 and master = :2 for update
----- PL/SQL Stack -----
So far I haven’t been able to spot whether I’m doing something wrong or something prohibited, and I haven’t been able to find a matching problem on MoS. Since the code works on 11gR2 and 12cR1 I’m inclined to believe it’s a bug introduced in the 12cR2 timeline – which is a nuisance for my client, but if it is a bug then perhaps a fix will appear fairly promptly.
Update (Feb 2022)
You’ll see in the comments that someone ran this test with no problem appearing on 12.2.0.1, and I’ve just re-run the original script with no problems. The trouble is, it’s not the same instance, and I don’t know what might be different between the test I did four years ago and the test I did today (although today’s test was on a PDB and the previous test might not have been on a container database). There’s a tiny clue in the trace files from 19c and 12c that a small change in the code may be defending against the deadlock, with a slight indication that the deadlock may depend on a quirk of the optimizer’s choice of execution path. But I’d be interested to hear if anyone can reproduce on 12.2.0.1 the deadlock that I was seeing on two different systems.
So this ORA-00060 is particular to the test you are doing, and not seen at the customer site?
Can you see if select * from sys.mlog$; shows more than one entry for a given mview log, i.e. for a (master, owner) pair? That seems the most likely cause to me, maybe… but it may be a bug as well; I can’t spot any logged bug that matches this at all, though that’s not to say this isn’t a bug, obviously. Another thought is that the mview names are not unique, say, so that the master then gets confused – but I’d expect to see a very different failure in that case (or at least I saw a very different failure in 11gR2).
Comment by Enda — January 19, 2018 @ 3:08 pm GMT
Enda,
Thanks for the thought – I’ve just checked mlog$ and there are no duplicate entries:
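The check was along these lines (a sketch – the output from the original test wasn’t preserved):

select mowner, master, count(*)
from   sys.mlog$
group by
       mowner, master
having count(*) > 1
;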
Checking dba_mviews (and cdb_mviews), the three mviews I’ve created are the only ones on the database.
This is a model that I created on a sandbox – the client isn’t using fast refresh at the moment; they’re using different calls with complete atomic refreshes. If I change the method from ‘F’ to ‘C’ to mimic what they’re doing, the example above works properly on 12.2.0.1.
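For reference, the complete-refresh variant is the same call with just the method changed:

declare
        m_fail_ct       number(6,0);
begin
        dbms_mview.refresh_dependent(
                number_of_failures      => m_fail_ct,
                list                    => 'req_line',
                method                  => 'C',         -- complete instead of fast
                nested                  => true,
                atomic_refresh          => true
        );
end;
/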
Comment by Jonathan Lewis — January 19, 2018 @ 3:36 pm GMT
Does MVIEW_SITE from DBA_REGISTERED_MVIEWS show anything interesting, i.e. duplicate site names or the like? The sites might not be unique global names, maybe?
And check that for all the masters involved here.
After that, not a clue to be honest.
Comment by Enda — January 19, 2018 @ 3:52 pm GMT
Enda,
Nothing stands out there, either:
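The query (a sketch – again, the output wasn’t preserved):

select owner, name, mview_site
from   dba_registered_mviews
order by
       owner, name
;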
Comment by Jonathan Lewis — January 19, 2018 @ 4:16 pm GMT
Hi Jonathan.
I ran the test on a patched 12.2 without any errors.
Part of the OPatch output:
ARU platform id: 226
ARU platform description:: Linux x86-64
Installed Top-level Products (1):
Oracle Database 12c 12.2.0.1.0
There are 1 products installed in this Oracle Home.
Interim patches (2) :
Patch 26925644 : applied on Tue Dec 19 11:11:28 CET 2017
Unique Patch ID: 21615634
Patch description: “OCW RELEASE UPDATE 12.2.0.1.0(ID:171003) (26925644)”
Patch 26925392 : applied on Tue Dec 19 11:07:08 CET 2017
Unique Patch ID: 21726225
Patch description: “Database Release Update : 12.2.0.1.171121 (26925392)”
Script output:
Comment by Paweł — January 20, 2018 @ 9:21 am GMT
Pawel,
That’s great news, thanks for doing the test and letting us know.
Interestingly, when I had a look at the “bugs fixed” document for those two patches there was nothing that looked like a good match for the problem (though there were 4 or 5 bugs relating to MVs, one of which might have been in the same area of code).
Could I send you the whole multi-level model to see if it works on your patched system?
Regards
Jonathan Lewis
Comment by Jonathan Lewis — January 20, 2018 @ 10:25 am GMT
Not a problem at all :)
The dev env is idle at the weekend anyway so it can be used for some tests ;)
Comment by Paweł — January 20, 2018 @ 11:15 am GMT
5GB/hour of redo? How I wish my DW db was like that!
At peak periods it generates 1GB of redo every 27 secs – 300GB/day.
Of course: DWs are read only and they never do lots of updates.
And their large amounts of data are generated in the cloud!
;)
(just kidding!
Very interesting problem with the MVs for 12.2.0.1. About to start upgrading my dbs to that one.
Gonna bookmark this one as I’m sure some of mine will eventually run into this sort of trouble…
Thanks heaps for posting this. VERY useful.)
Comment by Noons — January 21, 2018 @ 10:32 pm GMT
Noons,
Happy to be of service.
1GB every 27 seconds is pretty heavy – but if all you’ve got is 300GB per day on your D/W that’s only 2.5 times as much as I have on this OLTP (!) system. ;)
Comment by Jonathan Lewis — January 26, 2018 @ 9:07 am GMT
There are some features which you’d think surely loads of people must be using, but when you take a closer look they’re such a mess that the logical conclusion is that they can’t be… I came to this conclusion with fast refresh join-only MVs here: https://orastory.wordpress.com/2014/11/27/the-mess-that-is-fast-refresh-join-only-materialized-views/
Comment by Dom Brooks — January 25, 2018 @ 9:44 pm GMT
Dom,
Thanks for the link – nice article, and a good starting point for anyone who’s planning to build some materialized views for their system.
I may have to steal your models and do some further testing on the latest releases – but my previous experience of “refresh on commit”, with just a single table and a simple “subset” view, left me hurrying away when I saw the number of extra statements Oracle introduced after a single row change and commit. (Top of my head, I think the code path executed 40 SQL statements on commit.)
I wonder if 12.2 does better – or whether it’s taken the view that “real-time evaluation” is the way to go, avoiding the overheads and avoiding the problem of the change in performance between querying before and after committing your own changes. Plenty of scope for a lost weekend investigating that.
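For anyone tempted by that lost weekend: the 12.2 “real-time evaluation” I’m alluding to is the on query computation clause – a sketch against the req_line table from the example above (the MV name is made up for illustration):

create materialized view jpl_req_rt
        refresh fast on demand
        enable on query computation
as
select
        eventid,
        count(*)        row_ct,
        count(req)      req_ct,
        sum(req)        req
from
        req_line
group by
        eventid
;

The idea is that Oracle applies the outstanding MV log entries to the stored result at query time, so you get current data without paying the on-commit refresh cost.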
Comment by Jonathan Lewis — January 26, 2018 @ 9:33 am GMT
[…] Nested MVs (Jan 2018) – when written this demonstrated a strange self-deadlock error with nested MVs in 12.2. The deadlock no longer appears on my 12.2 system, but the note is a nice story about how to set up nested MVs anyway. […]
Pingback by Materialized Views catalogue | Oracle Scratchpad — February 16, 2022 @ 6:36 pm GMT
[…] Materialized view oddity (Jan 2018): a strange (possibly transient) deadlock when refreshing nested materialized views. […]
Pingback by Lock Catalogue | Oracle Scratchpad — February 19, 2022 @ 10:34 am GMT