Here’s a problem with ASSM that used to exist in older versions of Oracle, had disappeared by 11.2.0.4, and then re-appeared in 12.1.0.2 – disappearing again by 12.2.0.1. It showed up on MoS a few days ago under the heading: “Insert is running long with more waits on db file sequential read”.
The obvious response to this heading is to question the number of indexes on the table – because big tables with lots of indexes tend to give you lots of random I/O as Oracle maintains the indexes – but this table had no indexes. The owner of the problem supplied several bits of information in the initial post, with further material in response to follow-up questions, including the tkprof summary of the 10046/level 12 trace of the insert and two extracts from the trace file showing some of the “db file sequential read” waits – the first extract made me wonder if there might be some issue involving 16KB blocks, but the second one dispelled that illusion.
There are several buggy things that can appear with ASSM and large-scale DML operations, and sometimes the problems can appear long after the original operation did the dirty deed, so I thought I’d create a simple model based on the information supplied to date – and discovered what the problem (probably) was. Here’s how it starts – I’ve created a tablespace using ASSM, and in this tablespace I’ve created a table which has 48 columns with a row length of 290 bytes (roughly matching the OP’s table), and I’ve hacked out a simple PL/SQL block that loops around inserting arrays of 100 rows at a time into the table for a total of 1M rows before committing.
rem
rem     Script:         assm_cleanout.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2        Lots of blocks left "not full"
rem             11.2.0.4
rem
rem
rem     using OMF, so no file-name needed
rem     Ran this bit as SYS, and altered test user to have unlimited quota
rem

/*
create tablespace test_8k_assm
        datafile size 1G
        extent management local
        autoallocate
        segment space management auto
;
*/

rem
rem     Create the table, pre-allocate some space.
rem     This means we should get consistent 8M extents and not initial little ones
rem

create table t1 (
        v001 varchar2(5), v002 varchar2(5), v003 varchar2(5), v004 varchar2(5),
        v005 varchar2(5), v006 varchar2(5), v007 varchar2(5), v008 varchar2(5),
        v009 varchar2(5), v010 varchar2(5), v011 varchar2(5), v012 varchar2(5),
        v013 varchar2(5), v014 varchar2(5), v015 varchar2(5), v016 varchar2(5),
        v017 varchar2(5), v018 varchar2(5), v019 varchar2(5), v020 varchar2(5),
        v021 varchar2(5), v022 varchar2(5), v023 varchar2(5), v024 varchar2(5),
        v025 varchar2(5), v026 varchar2(5), v027 varchar2(5), v028 varchar2(5),
        v029 varchar2(5), v030 varchar2(5), v031 varchar2(5), v032 varchar2(5),
        v033 varchar2(5), v034 varchar2(5), v035 varchar2(5), v036 varchar2(5),
        v037 varchar2(5), v038 varchar2(5), v039 varchar2(5), v040 varchar2(5),
        v041 varchar2(5), v042 varchar2(5), v043 varchar2(5), v044 varchar2(5),
        v045 varchar2(5), v046 varchar2(5), v047 varchar2(5), v048 varchar2(5)
)
segment creation immediate
tablespace test_8k_assm
storage(initial 8M)
;

alter table t1 allocate extent (size 8M);
alter table t1 allocate extent (size 8M);

rem
rem     Simple anonymous pl/sql block
rem     Large insert, handled with array inserts
rem     Can modify loop count and array size very easily
rem

declare
        type tab_array is table of t1%rowtype;
        junk_array tab_array;
begin

        select
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx'
        bulk collect into
                junk_array
        from
                all_objects
        where
                rownum <= 100 -- > comment to avoid WordPress format issue
        ;

        for i in 1..10000 loop
                forall j in 1..junk_array.count
                        insert into t1 values junk_array(j);
        end loop;

end;
/

commit;
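Before calling dbms_space it’s worth confirming how tightly the rows have been packed. The following query wasn’t part of the original script – it’s just a convenient cross-check, aggregating rowids by block with the dbms_rowid package:

select
        rows_per_block, count(*) blocks
from    (
        select
                dbms_rowid.rowid_relative_fno(rowid)    file_no,
                dbms_rowid.rowid_block_number(rowid)    block_no,
                count(*)                                rows_per_block
        from    t1
        group by
                dbms_rowid.rowid_relative_fno(rowid),
                dbms_rowid.rowid_block_number(rowid)
        )
group by
        rows_per_block
order by
        rows_per_block
;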
The number of rows per block after this insert is 24, with 1,038 bytes of free space left (808 bytes reserved by the default pctfree = 10, plus the left-over space that was too small to take a 25th row without breaching the pctfree limit). This means we should report 1M/24 = 41,666 full blocks and one block with some free space. So we query the table using the dbms_space package:
declare
        m_unformatted_blocks    number;
        m_unformatted_bytes     number;
        m_fs1_blocks            number;
        m_fs1_bytes             number;
        m_fs2_blocks            number;
        m_fs2_bytes             number;
        m_fs3_blocks            number;
        m_fs3_bytes             number;
        m_fs4_blocks            number;
        m_fs4_bytes             number;
        m_full_blocks           number;
        m_full_bytes            number;
begin
        dbms_space.SPACE_USAGE(
                segment_owner           => 'TEST_USER',
                segment_name            => 'T1',
                segment_type            => 'TABLE',
                unformatted_blocks      => m_unformatted_blocks,
                unformatted_bytes       => m_unformatted_bytes,
                fs1_blocks              => m_fs1_blocks,
                fs1_bytes               => m_fs1_bytes,
                fs2_blocks              => m_fs2_blocks,
                fs2_bytes               => m_fs2_bytes,
                fs3_blocks              => m_fs3_blocks,
                fs3_bytes               => m_fs3_bytes,
                fs4_blocks              => m_fs4_blocks,
                fs4_bytes               => m_fs4_bytes,
                full_blocks             => m_full_blocks,
                full_bytes              => m_full_bytes
        );

        dbms_output.new_line;
        dbms_output.put_line('Unformatted                   : ' || to_char(m_unformatted_blocks,'999,999,990') || ' / ' || to_char(m_unformatted_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 1 (  0 -  25% free) : ' || to_char(m_fs1_blocks,'999,999,990')         || ' / ' || to_char(m_fs1_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 2 ( 25 -  50% free) : ' || to_char(m_fs2_blocks,'999,999,990')         || ' / ' || to_char(m_fs2_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 3 ( 50 -  75% free) : ' || to_char(m_fs3_blocks,'999,999,990')         || ' / ' || to_char(m_fs3_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 4 ( 75 - 100% free) : ' || to_char(m_fs4_blocks,'999,999,990')         || ' / ' || to_char(m_fs4_bytes,'999,999,999,990'));
        dbms_output.put_line('Full                          : ' || to_char(m_full_blocks,'999,999,990')        || ' / ' || to_char(m_full_bytes,'999,999,999,990'));
end;
/
The results aren’t what we expect:
Unformatted                   :        0 /              0
Freespace 1 (  0 -  25% free) :   35,001 /    286,728,192
Freespace 2 ( 25 -  50% free) :        1 /          8,192
Freespace 3 ( 50 -  75% free) :        0 /              0
Freespace 4 ( 75 - 100% free) :       67 /        548,864
Full                          :    6,665 /     54,599,680
We have one block marked as 25 – 50% free (that’s the one block with 16 rows in it, which means about 40% space currently free) but our 41,666 full blocks are actually reported as 6,665 full blocks and 35,001 blocks with some space available. That’s going to hurt eventually if some process wants to insert more rows and finds that it has to fail its way through 35,001 blocks before finding a block which has enough free space.
So what happens when I repeat the PL/SQL block (and commit)? Here are the results from calls to dbms_space after the next two cycles:
Unformatted                   :        0 /              0
Freespace 1 (  0 -  25% free) :   70,002 /    573,456,384
Freespace 2 ( 25 -  50% free) :        2 /         16,384
Freespace 3 ( 50 -  75% free) :        0 /              0
Freespace 4 ( 75 - 100% free) :      220 /      1,802,240
Full                          :   13,330 /    109,199,360

Unformatted                   :      256 /      2,097,152
Freespace 1 (  0 -  25% free) :  105,003 /    860,184,576
Freespace 2 ( 25 -  50% free) :        3 /         24,576
Freespace 3 ( 50 -  75% free) :        0 /              0
Freespace 4 ( 75 - 100% free) :      117 /        958,464
Full                          :   19,995 /    163,799,040
Every time we execute the PL/SQL block we leave a trail of 35,001 more blocks which are flagged as “not quite full”.
Looking at the session stats while running the insert loop I can tell that Oracle isn’t checking to see whether or not it should be using those blocks. (A quick way of proving this is to flush the buffer cache before each execution of the PL/SQL block and note that Oracle doesn’t read back the 105,000 blocks before inserting any data – a sketch of the check follows.)
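A minimal sketch of that check – the flush needs a suitably privileged account, and the only statistic examined here is “physical reads”:

alter system flush buffer_cache;        -- run as SYS / a DBA account

-- re-run the insert block, then see how much the session actually read
select  ms.value
from    v$mystat        ms,
        v$statname      sn
where   sn.statistic# = ms.statistic#
and     sn.name       = 'physical reads'
;

So somehow, sometime, someone might get a nasty surprise – and here’s one way that it might happen: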
Since I know my data fits 24 rows per block I’m going to modify my PL/SQL block to select one row into the array and then loop round the insert 25 times – so I know I’m inserting a little bit more than one block’s worth of data.
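In outline the modified block looks like this – a sketch only, using a single t1%rowtype record rather than the array to keep it short:

declare
        junk_row t1%rowtype;
begin
        select
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx'
        into    junk_row
        from    dual
        ;

        for i in 1..25 loop             -- 25 single-row inserts: just over one block's worth
                insert into t1 values junk_row;
        end loop;
end;
/

commit;

Starting from the state with 105,003 blocks marked as “Freespace 1” this is what I saw – first, the free space report after inserting the 25 rows: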
Unformatted                   :      240 /      1,966,080
Freespace 1 (  0 -  25% free) :    1,074 /      8,798,208
Freespace 2 ( 25 -  50% free) :        0 /              0
Freespace 3 ( 50 -  75% free) :        0 /              0
Freespace 4 ( 75 - 100% free) :      133 /      1,089,536
Full                          :  123,927 /  1,015,209,984
Then a few wait events and session statistics for the insert:
---------------------------------------------------------
SID:    39:TEST_USER - jonathan
Session Events - 09-Jan 16:57:18
Interval:-  6 seconds
---------------------------------------------------------
Event                               Waits   Time_outs       Csec    Avg Csec    Max Csec
-----                               -----   ---------       ----    --------    --------
db file sequential read            15,308           0        128        .008           3
db file scattered read             20,086           0        271        .014           4

---------------------------------
Session stats - 09-Jan 16:57:18
Interval:-  6 seconds
---------------------------------
Name                                                        Value
----                                                        -----
session logical reads                                     269,537
physical read total IO requests                            35,401
db block gets                                             229,522
consistent gets                                            40,015
physical reads                                            124,687
physical reads cache                                      124,687
db block changes                                          208,489
physical reads cache prefetch                              89,293
redo entries                                              207,892
redo size                                              16,262,724
undo change vector size                                     1,720
deferred (CURRENT) block cleanout applications            103,932
table scan blocks gotten                                   20,797
HSC Heap Segment Block Changes                                 25
The session has read and updated almost all of the level 1 bitmap blocks. I don’t know exactly what triggered this re-read, but it seems to be related to the number of rows inserted (or, perhaps, to the amount of space used rather than the row count) as an insert crosses the pctfree boundary and fails over to the next block. I’ve only done a couple of little tests to try to get a better idea of why an insert sometimes sweeps through the bitmap blocks – I do know that inserting 2 or 3 rows at a time will also trigger the cleanout – but there are probably several little details involved that still need to be identified.
You might note a couple of details in the stats:
- Because I had flushed the buffer cache before the insert Oracle did its “cache warmup” tablescanning trick – if this had not happened I would probably have done a single block read for every single bitmap block I touched.
- There are 103,932 block cleanout applications – but 208,000 db block changes and redo entries. Roughly half the latter are for data block cleanouts (OP code 4.1) and half are the state changes on the level 1 bitmap blocks (OP code 13.22). You’ll notice that neither change produces any undo.
- I’ve also included the HSC Heap Segment Block Changes statistic to show you that not all changes to Heap Segment Blocks show up where you might expect them. (A query to pull these figures for yourself is sketched below.)
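If you want to pull those last few figures for yourself, a query along these lines (run from the session doing the insert) reports the relevant session statistics:

select
        sn.name, ms.value
from
        v$statname      sn,
        v$mystat        ms
where
        ms.statistic# = sn.statistic#
and     sn.name in (
                'db block changes',
                'redo entries',
                'deferred (CURRENT) block cleanout applications',
                'HSC Heap Segment Block Changes'
        )
order by
        sn.name
;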
And finally:
If you re-run the tests on 11.2.0.4 and 12.2.0.1 you get the following results after the initial script run – the problem doesn’t exist:
11.2.0.4
========
Unformatted                   :        0 /              0
Freespace 1 (  0 -  25% free) :        0 /              0
Freespace 2 ( 25 -  50% free) :        1 /          8,192
Freespace 3 ( 50 -  75% free) :        0 /              0
Freespace 4 ( 75 - 100% free) :       67 /        548,864
Full                          :   41,666 /    341,327,872

12.2.0.1
========
Unformatted                   :        0 /              0
Freespace 1 (  0 -  25% free) :        0 /              0
Freespace 2 ( 25 -  50% free) :        1 /          8,192
Freespace 3 ( 50 -  75% free) :        0 /              0
Freespace 4 ( 75 - 100% free) :       67 /        548,864
Full                          :   41,666 /    341,327,872
So how does this help the OP?
- First, there may be a huge mess still waiting to be cleaned up in the table – though at 34M blocks I’m not too keen to suggest running the dbms_space routine to find out what it looks like; maybe that’s necessary anyway.
- Secondly – an upgrade to 12.2 will probably avoid the problem in future.
- Thirdly – if the number of rows per block is very close to uniform, write a little code to loop round inserting (say) 2 * the expected number of rows per block as single-row inserts and then rolling back; the inserts will probably trigger a massive, though perhaps not complete, cleanout, so rinse and repeat until the cleanout is complete (see the sketch after this list). Try to find a time when you don’t mind the extra load to get this job done.
- Finally – on the big job that does the bulk insert – repeat the dummy insert/rollback at the end of the job to clean up the mess made by the job.
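To turn the third point into something concrete, here’s a minimal sketch of the insert/rollback loop. It assumes the 24 rows per block of my test case (hence the 48) and that the table already holds at least one row to copy – adjust the loop count to 2 * your own rows-per-block figure:

begin
        for i in 1..48 loop             -- 2 x the expected 24 rows per block
                insert into t1 select * from t1 where rownum = 1;       -- one row at a time
        end loop;
        rollback;       -- the rows disappear, but the bitmap cleanout (probably) stays
end;
/

Re-run the dbms_space report between passes and stop when the “Freespace 1” figure stops dropping.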
Addenda
Prompted by comment #2 below, I should add that if the problem has been fixed in 12.2 then possibly there’s already a bug report and patch for it. If there isn’t, the OP could raise an SR (referencing this blog note) and request a bug fix or back-port from 12.2.
And within 24 hours of publication, comment #4 (from Yury Pudovchenko) tells us that the bug is fixed by the Oct 2017 Bundle Patch.
Would the table being compressed with Advanced Compression make this problem better or worse?
Comment by Bob Bryla — January 9, 2018 @ 8:08 pm GMT
Bob,
I don’t think you’d be using this method for loading data if the table were defined with compression; I think you’d find a way of doing an “insert as select”, probably with an /*+ append */ hint.
I haven’t spent any time testing anything other than what you see above.
Comment by Jonathan Lewis — January 10, 2018 @ 9:19 am GMT
But more importantly, is this a bug that Oracle should fix (other than upgrading to 12.2.0.1)? Cleaning up the mess with those suggestions is fine and dandy but every day? Other ways to avoid this scenario?
Comment by rjbdba — January 10, 2018 @ 3:18 am GMT
rjbdba,
As I said, I haven’t done anything to work out any of the details of what makes Oracle set the bits suitably, so there may be a better strategy. In fact one quick test I did do was to drop the array size to exactly 24 because that’s exactly the number of rows I ALWAYS get per block. At that array size every block was marked as full, making me think there’s something special happening when the first row of the insert (array) breaks the pctfree barrier – so another workaround is to pick the right arraysize IF you can guarantee that every block will always get exactly the same number of rows.
I would say this probably is a bug that Oracle should fix, and that my suggested hack is not one where Oracle Support could reasonably say you had a viable workaround that made the problem less urgent. Having said that – since the problem seems to be fixed in 12.2, perhaps there already is a bug fix that could be back-ported to 12.1. I did try to find a bug report about this but wasn’t successful – though that may have been my failure to pick the right search terms.
Comment by Jonathan Lewis — January 10, 2018 @ 9:24 am GMT
Thanks. I keep ending up back at this Oracle Support note as well —
Oracle Support Document 1263496.1 (INSERT slow on ASSM) can be found at: https://support.oracle.com/epmos/faces/DocumentDisplay?id=1263496.1
“not a bug”, but there is a “repair” option — that doesn’t seem like a sustainable solution, and setting PCTFREE much higher seems like a poor solution. In our situation we have Advanced Compression in the mix, and even with PCTFREE=40 or more we still get chained (migrated) rows, which, because they’re migrated, don’t cause a huge issue with Exadata smart scans, but do throw a monkey wrench into the ETL batch when a few jobs take 6 hours instead of 6 minutes and have to be killed and restarted.
A good reason to accelerate the upgrade to 12.2 though.
Comment by rjbdba — January 10, 2018 @ 11:24 am GMT
rjbdba,
I had a little rant about that note on OTN, which you’ve probably read, of course.
It’s an example of a very annoying symptom of MoS notes – it’s been updated to say it’s “relevant” to newer versions without proper testing; it hasn’t been edited to reflect any changes; the critical text is ambiguous; it doesn’t contain a demonstration to allow you to test the versions that haven’t been tested; and – in a most important aspect – it’s wrong because the problem doesn’t (seem to) exist in 11.2.0.4.
Comment by Jonathan Lewis — January 10, 2018 @ 12:14 pm GMT
Hello, Jonathan,
I ran this test on 12.1.0.2.171017 (the 2017 October quarterly patch), and it appears that this bug has already been fixed.
Patch Id: 26717470
Patch Description: Database Bundle Patch : 12.1.0.2.171017 (26717470)
Comment by Yury Pudovchenko — January 10, 2018 @ 11:25 am GMT
Interesting, my customer is not on that patch set yet, but my database is, which would explain why I can’t reproduce in house. I know what I’ll be doing this morning…
Comment by rjbdba — January 10, 2018 @ 11:38 am GMT
Yury,
Excellent.
Thanks for doing the test and passing on the news.
Checking the patch release note, the specific bug fix looks like this one, published in the Jan 2016 release:
Bug 20466322 – ASSM Free block space not reused when expected (Doc ID 20466322.8)
Joel Garry suggested that bug on the OTN thread; the details don’t really match this case, but the fix has probably covered several variations on the theme.
Note: When I searched the patch database for this patch the search redirected me to an updated bundle:
Patch 26635880: DATABASE PROACTIVE BUNDLE PATCH 12.1.0.2.171017
Comment by Jonathan Lewis — January 10, 2018 @ 12:10 pm GMT
[…] yesterday’s post one of the obvious follow-up questions was whether the problem I demonstrated was a side effect of […]
Pingback by ASSM Argh 2 | Oracle Scratchpad — January 10, 2018 @ 1:24 pm GMT
[…] a follow-on from Tuesday’s (serious) note about a bug in 12.1.0.2 that introduces random slowdown on large-scale inserts. This thread in this […]
Pingback by ASSM tangle | Oracle Scratchpad — January 11, 2018 @ 5:35 pm GMT