Oracle Scratchpad

January 17, 2020

Group by Elimination

Filed under: 12c,18c,Bugs,Oracle — Jonathan Lewis @ 12:57 pm GMT Jan 17,2020

Here’s a bug that was highlighted a couple of days ago on the Oracle Developer Community forum; it may be particularly worth thinking about if if you haven’t yet got up to Oracle 12c as it appeared in an optimizer feature that appeared in 12.2 (and hasn’t been completely fixed) even in the latest release of 19c (currently 19.6).

Oracle introduce “aggregate group by elimination” in 12.2, protected by the hidden parameter “_optimizer_aggr_groupby_elim”. The notes on MOS about the feature tell us that Oracle can eliminate a group by operation from a query block if a unique key from every table in the query block appears in the group by clause. Unfortunately there were a couple of gaps in the implementation in 12.2 that can produce wrong results. Here’s some code to model the problem.

rem
rem     Script:         group_by_elim_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem

create table ref_clearing_calendar(
        calendar_name   char(17),
        business_date   date,
        update_ts       timestamp (6) default systimestamp,
        constraint pk_ref_clearing_calendar 
                        primary key (business_date)
)
/

insert into ref_clearing_calendar (business_date)
select
        sysdate + 10 * rownum
from 
        all_objects 
where 
        rownum <= 40 -- > comment to avoid wordpress format issue
/

commit;

execute dbms_stats.gather_table_stats(null,'ref_clearing_calendar',cascade=>true)

set autotrace on explain

select
        to_char(business_date,'YYYY') , count(*)
from
        ref_clearing_calendar
group by 
        to_char(business_date,'YYYY')
order by 
        to_char(business_date,'YYYY')
/

set autotrace off

I’ve created a table with a primary key on a date column, and then inserted 40 rows which are spaced every ten days from the current date; this ensures that I will have a few dates in each of two consecutive years (future proofing the example!). Then I’ve aggregated to count the rows per year using the to_char({date column},’YYYY’) conversion option to extract the year from the date. (Side note: the table definition doesn’t follow my normal pattern as the example started life in the ODC thread.)

If you run this query on Oracle 12.2 you will find that it returns 40 (non-unique) rows and displays the following execution plan:


---------------------------------------------------------------------------------------------
| Id  | Operation        | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                          |    40 |   320 |     2  (50)| 00:00:01 |
|   1 |  SORT ORDER BY   |                          |    40 |   320 |     2  (50)| 00:00:01 |
|   2 |   INDEX FULL SCAN| PK_REF_CLEARING_CALENDAR |    40 |   320 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

The optimizer has applied “aggregate group by elimination” because it hasn’t detected that the primary key column that appears in the group by clause has been massaged in a way that means the resulting value is no longer unique.

Fortunately this problem with to_char() is fixed in Oracle 18.1 where the query returns two rows using the following execution plan (which I’ve reported from an instance of 19.5):

---------------------------------------------------------------------------------------------
| Id  | Operation        | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                          |    40 |   320 |     2  (50)| 00:00:01 |
|   1 |  SORT GROUP BY   |                          |    40 |   320 |     2  (50)| 00:00:01 |
|   2 |   INDEX FULL SCAN| PK_REF_CLEARING_CALENDAR |    40 |   320 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Unfortunately there is still at least one gap in the implementation. Change the to_char(business_date) to extract(year from business_date) at all three points in the query, and even in 19.6 you’re back to the wrong results – inappropriate aggregate group by elimination and 40 rows returned.

There are a couple of workarounds, one is the hidden parameter _optimizer_aggr_groupby_elim to false at the system or session level, or through an opt_param() hint at the statement level (possibly injected through an SQL_Patch. The other option is to set a fix_control, again at the system, session, or statement level – but there’s seems to be little point in using the fix_control approach (which might be a little obscure for the next developer to see the code) when it seems to do the same as the explicitly named hidden parameter.

select
        /*+ opt_param('_optimizer_aggr_groupby_elim','false') */
        extract(year from business_date) , count(*)
from ,,,

select
        /*+ opt_param('_fix_control','23210039:0') */
        extract(year from business_date) , count(*)
from ...

One final thought about this “not quite fixed” bug. It’s the type of “oversight” error that gives you the feeling that there may be other special cases that might have been overlooked. The key question would be: are there any other functions (and not necessarily datetime functions) that might be applied (perhaps implicitly) to a primary or unique key that would produce duplicate results from distinct inputs – if so has the code that checks the validity of eliminating the aggregate operation been written to notice the threat.

Footnote

The problem with extract() has been raised as a bug on MOS, but it was not public at the time of writing this note.

Update (about 60 seconds after publication)

Re-reading my comment about “other functions” it occurred to me that to_nchar() might, or might not, behave the same way as to_char() in 19c – so I tested it … and got the wrong results in 19c.

 

 

 

January 14, 2020

Drop Column bug

Filed under: 18c,Bugs,Oracle,Upgrades — Jonathan Lewis @ 1:22 pm GMT Jan 14,2020

When I was a child I could get lost for hours in an encyclopedia because I’d be looking for one topic, and something in it would make me want to read another, and another, and …

The same thing happens with MOS (My  Oracle Support) – I search for something and the search result throws up a completely irrelvant item that looks much more interesting so I follow a hyperlink, which mentions a couple of other notes, and a couple of hours later I can’t remember what I had started looking for.

Today’s note is a side effect of that process. A comment made yesterday about count(*)/count(1) referenced Oracle bug “19450314: UNNECESSARY INVALIDATIONS IN 12C”, and when I searched MOS for more information on this bug I discovered bug 30404639 : TRIGGER DOES NOT WORK CORRECTLY AFTER ALTER TABLE DROP UNUSED COLUMN. The impact of this bug is easy to demonstrate, and the ramifications are as follows:

Exercise extreme care with the “alter table drop column” command in 18c and above.

The problem is easy to work around, but the impact of not knowing about it could be catastrophic if your pre-production testing wasn’t quite good enough. Here’s a little demonstration script – the bug note says the problem appeared in 18.3 but I ran this test against 19.3. The script is a modified version of the SQL in the bug note:


create table t1 (c0 varchar2(30), c1 varchar2(30), c2 varchar2(30), c3 varchar2(30), c4 varchar2(30));
create table t2 (c_log varchar2(30));

create or replace trigger t1_ariu
after insert or update on t1
for each row
begin
        IF :new.c3 is not null then
                insert into t2 values (:new.c3);
        end if;
end;
/

spool drop_col_bug_18c.lst

insert into t1(c3) values ('Inserting c3 - should log'); 
select * from t2;
 
insert into t1(c4) values ('Inserting c4 - should not log'); 
select * from t2;

prompt  ===================================
prompt  Drop some columns in two steps then
prompt  truncate t2 and repeat the test
prompt  ===================================
 
alter table t1 set unused (c1, c2);
alter table t1 drop unused columns;

truncate table t2;

insert into t1(c3) values ('Inserting c3 - should log'); 
select * from t2;
 
insert into t1(c4) values ('Inserting c4 - should not log'); 
select * from t2;
 

The code is very simple. It creates a couple of tables an “after row” trigger on one of them to copy one column value across to the other table on an insert or update provided the new column value is not null.

To check that the trigger is (at least in part) behaving the code does two inserts – one which should copy a value and one which should not – and we see that the copy takes place as expected.

Now comes the critical part. We mark two of the columns in the table as unused, then drop all unused columns, truncate the second table and repeat the inserts.

If you run the test on 12.2.0.1 then you should find that the second run behaves just like the first run. If you’re running 18c or 19c be prepared for the following:


insert into t1(c3) values ('Inserting c3 - should log')
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [insChkBuffering_1], [4], [4], [], [], [], [], [], [], [], [], []

no rows selected

insert into t1(c4) values ('Inserting c4 - should not log')
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [insChkBuffering_1], [4], [4], [], [], [], [], [], [], [], [], []

This is not good – but it gets worse. If your application starts consistently breaking with an ORA-00600 error that’s going to annoy a lot of users for (we hope) a brief interval, but if your application keeps running and corrupting your data that’s a much bigger problem. Re-run the whole script (dropping the two tables first) but change it to mark just one of the two columns as unused, and you’ll get results for the second pass that look like this:


Table truncated.


1 row created.


no rows selected


1 row created.


C_LOG
------------------------------
Inserting c4 - should not log

1 row selected.

The trigger seems to “lose count” of the columns in the table (presumably it’s compiled to refer to something like “column_position = 3” and doesn’t adjust on the “drop column” – the linked bug notes on MOS refer to the problem being associated with the project to increase fine-grained dependencies) so it manages to survive with one column dropped because there’s still a “column 3” which happens now to be the column that used to be “column 4”.

Workaround

There is a simple workaround if you run into this problem after modifying your production system (and before you’ve corrupted a huge amount of data – recompile the trigger manually immediately after the drop completes: “alter trigger t1_ariu compile”.

Refinement

The problem seems to appear only if the following two conditions are true:

  • you use a two-step approach to dropping a column, viz: set unused then drop. If you simply issue “alter table t1 drop column c1” (with or without a “checkpoint NNN”) then the problem does not appear. It’s a great shame that in the past I’ve given advice that setting columns unused and dropping them later is a better option than doing an immediate drop.
  • you drop columns that appear earlier in the table than the highest position column mentioned in the trigger. But this isn’t something you should gamble on, particularly since the workaround is so easy to implement, because the order the columns appear in the table declaration isn’t necessarily the internal column ordering so you might get it wrong (not that I’ve tried to test for that threat) – and what if there are some selective materialized view logs where you don’t explicitly create triggers and forget to cater for.

I don’t expect anyone to be dropping columns in production systems with any great frequency, and you would expect such a significant operation to be tested quite carefully, but it’s easy to envisage a scenario where the testing might be split into two pieces viz:

  1. test the application on a pre-prod version of the database where a table has been created as a subset of the production data without the column that’s due to be dropped
  2. test how long it takes to execute the actual drop on a (minimal) recovered backup of production, but don’t test the new production code on the resulting table.

Sometimes it’s easy to overlook things that “obviously” won’t need testing, especially when it’s something that has always worked in the past with no special treatment required.

<h3>Footnote</h3>

If you try running this model on LiveSQL you’ll find that the code stops and the web page reports “Error: Internal Server Error” so you can’t tell that the problem is exactly the same there – but it seems quite likely that it is.

Given how easy it is to bypass the problem I haven’t bothered to do any further research on the issue – is it only related to insert and update trigger, and do they have to be after row for the update, and what about before row delete triggers (with materialized view logs in mind).

 

January 13, 2020

Collections

Filed under: Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 2:31 pm GMT Jan 13,2020

This is a note I drafted in September 2015 and only rediscovered a couple of days ago while searching for something I was sure I’d written about collections and/or table functions. The intention of collections and table functions is that they should behave like tables when you use them in a query – but there are cases where a real table and something cast to a table() aren’t treated the same way by the optimizer – and this 4-year old note (which is still valid in 2020 for 19c) is one of those cases.

 

There was a question – with test case – on Oracle-L recently [ed: now more than 4 years ago] about the behaviour of a query that changed plans as you switched from using a global temporary table to a collection – why was Oracle doing something inefficient with the collection. The answer was: “Bad luck, it’s a limitation in the optimizer”.  (Sub-text: collections are a pain).

The test case was short and simple so I thought I’d post it – with an h/t to Patrick Jolliffe who presented the probem and Timur Akhmadeev and Stefan Koehler who explained the problems.

Here’s the script (with a little cosmetic editing) to create the necessary objects and data:

rem
rem     Script:         collections.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2015
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem

create or replace type number_table is table of number;
/

create table test_objects as select * from all_objects;
create /* unique */ index test_objects_idx on test_objects(object_id);

exec dbms_stats.gather_table_stats(null, 'test_objects');

create global temporary table gtt_test_objects (object_id number);
insert into gtt_test_objects values (1);


In this example I’ve created a type which is a simple table of number. In a more general case you might create a simple object type, and then a type that was a table of that object type, then you might create a function that returned a variable of that table type, or a function that was declared to return the table type “pipelined” and uses the “pipe row” instruction in the code to return one value of the simple object type at a time. Whichever variation you used you could then use the table() operator to tell Oracle to treat the content of the table type as if it were a relational table. (In recent versions of Oracle the table() operator is redundant).

Here’s the first query, which uses the global temporary table in an “IN” subquery, followed by its execution plan – again with a little cosmetic editing and the addition of query block names across the board:


prompt  ==================================
prompt  Query using global temporary table
prompt  ==================================

select  
        /*+ qb_name(main) */ 
        null 
from    (
        select
                /*+ qb_name(inline) */
                distinct object_id 
        from    test_objects
        ) 
where   object_id in (
                select 
                        /*+
                                qb_name(subq)
                                cardinality(gtt_test_objects 1) 
                        */ 
                        gtt_test_objects.object_id
                from
                        gtt_test_objects 
        )
;


-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                  |      1 |        |      0 |00:00:00.01 |       5 |       |       |          |
|   1 |  VIEW                 | VM_NWVW_1        |      1 |      1 |      0 |00:00:00.01 |       5 |       |       |          |
|   2 |   SORT UNIQUE NOSORT  |                  |      1 |      1 |      0 |00:00:00.01 |       5 |       |       |          |
|   3 |    NESTED LOOPS       |                  |      1 |      1 |      0 |00:00:00.01 |       5 |       |       |          |
|   4 |     SORT UNIQUE       |                  |      1 |      1 |      1 |00:00:00.01 |       3 |  2048 |  2048 | 2048  (0)|
|   5 |      TABLE ACCESS FULL| GTT_TEST_OBJECTS |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|*  6 |     INDEX RANGE SCAN  | TEST_OBJECTS_IDX |      1 |      1 |      0 |00:00:00.01 |       2 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("OBJECT_ID"="GTT_TEST_OBJECTS"."OBJECT_ID")

As you can see I’ve set statistics_level to all, and used dbms_xplan.display_cursor() to pull the actual execution plan from memory. This plan tells us that the optimizer unnested the IN subquery to generate a unique set of values and used that unique set to drive a nested loop join into the test_objects table (with an index-only probe). Moreover, before this step, the optimizer used complex view merging and cost-based query transformation to postpone the “distinct” from the original query to do the join before distinct. The E-rows at operation 5 also tells us that the optimizer “knew” that there was only one row in the GTT – it took note of my cardinality() hint.

Now we replace with gtt_test_objects table with the collection – casting it to a table() and giving Oracle the same cardinality() hint – as follows:


select 
        /*+ 
                qb_name(main)
--              no_use_hash_aggregation(@sel$1)
        */ 
        null
from    (
        select  
                /*+ inline */
                distinct object_id 
        from    test_objects
        )
where   object_id in (
                select 
                        /*+ 
                                qb_name(subq)
                                cardinality(gtt_test_objects 1) 
                        */ 
                        column_value object_id
                from
                        table(number_table(1)) gtt_test_objects
        )
;

-------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                  |      1 |        |      0 |00:00:00.08 |     132 |       |       |          |
|   1 |  MERGE JOIN SEMI                        |                  |      1 |      1 |      0 |00:00:00.08 |     132 |       |       |          |
|   2 |   SORT JOIN                             |                  |      1 |  56762 |      1 |00:00:00.08 |     132 |  1470K|   606K| 1306K (0)|
|   3 |    VIEW                                 |                  |      1 |  56762 |  56762 |00:00:00.03 |     132 |       |       |          |
|   4 |     HASH UNIQUE                         |                  |      1 |  56762 |  56762 |00:00:00.03 |     132 |  4122K|  2749K| 3418K (0)|
|   5 |      INDEX FAST FULL SCAN               | TEST_OBJECTS_IDX |      1 |  56762 |  56762 |00:00:00.01 |     132 |       |       |          |
|*  6 |   SORT UNIQUE                           |                  |      1 |      1 |      0 |00:00:00.01 |       0 |  2048 |  2048 | 2048  (0)|
|   7 |    COLLECTION ITERATOR CONSTRUCTOR FETCH|                  |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("OBJECT_ID"=VALUE(KOKBF$))
       filter("OBJECT_ID"=VALUE(KOKBF$))

The second plan is completely different. The optimizer has unnested the subquery to produce a join, but instead of using the unique set of values that it generated from the collection to drive a nested loop it’s decide to do a merge semi-join, which has entailed an expensive fast full scan of the test_objects_idx index to acquire all the key values first.

I tried to make the optimizer use the collection to drive a nested loop, adding some carefully targeted hints to force the join order and dictate a nested loop join with pushed predicate: but the optimizer wouldn’t push the “obvious” join predicate and continued to do an index fast full scan and sort of the text_object_idx. If you’re interested here are the hints and the resulting plan:

/*+
        qb_name(main)
        leading( @sel$8969f1c9 kokbf$0@sel$2 "from$_subquery$_001"@main)
        use_nl( @sel$8969f1c9 "from$_subquery$_001"@main)
        push_pred(@sel$8969f1c9 "from$_subquery$_001"@main)
*/

-------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                  |      1 |        |      0 |00:00:00.03 |     132 |       |       |          |
|   1 |  NESTED LOOPS                           |                  |      1 |      1 |      0 |00:00:00.03 |     132 |       |       |          |
|   2 |   SORT UNIQUE                           |                  |      1 |      1 |      1 |00:00:00.01 |       0 |  2048 |  2048 | 2048  (0)|
|   3 |    COLLECTION ITERATOR CONSTRUCTOR FETCH|                  |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|*  4 |   VIEW                                  |                  |      1 |      1 |      0 |00:00:00.03 |     132 |       |       |          |
|   5 |    SORT UNIQUE                          |                  |      1 |  56762 |  56762 |00:00:00.03 |     132 |  2604K|   728K| 2314K (0)|
|   6 |     INDEX FAST FULL SCAN                | TEST_OBJECTS_IDX |      1 |  56762 |  56762 |00:00:00.01 |     132 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("OBJECT_ID"=VALUE(KOKBF$))

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 6 (U - Unused (1))
---------------------------------------------------------------------------
0 - SEL$102722C0
- qb_name(subq)

1 - SEL$8969F1C9
- leading( @sel$8969f1c9 kokbf$0@sel$2 "from$_subquery$_001"@main)
- qb_name(main)

1 - SEL$8969F1C9 / from$_subquery$_001@MAIN
U - push_pred(@sel$8969f1c9 "from$_subquery$_001"@main)
- use_nl( @sel$8969f1c9 "from$_subquery$_001"@main)

5 - SEL$1
- inline

In the previous post we had a “NOT IN” subquery against a collection/pipelined table function that couldn’t even be unnested (even in 19c); in this example we have an IN subquery that does unnest but then can’t drive a nested loop efficiently because the optimizer won’t push the collection values into the distinct view, and won’t do complex view merging to avoid having to do that join predicate pushdown. Collections and table functions() just don’t play nicely with the optimizer!

In fact this plan also shows one of those “generic” approaches in the optimizer that allows a human operator to see a special case that could have been further optimized: if the optimizer had used a sort unique rather than a hash unique at operation 4 then the sort join at operation 2 would have been redundant – with an overall reduction in memory and CPU usage that I managed to get in a separate test by adding the hint /*+ no_use_hash_aggregation(@sel$1) */ to the query. (Since operation 6 is also a sort unique the merge join semi could, in principle, have become a merge join with no risk of producing duplicates – but the semi-join code path is probably a little more efficient, anyway, and a balance has to be struck between the risk of introducing complexity for a special case and the potential frequency and scale of the benefit it might produce.)

Conclusion

You can often see collections and table functions behaving very like tables when you use them in the from clause of queries – but there are some restrictions on the transformations that the optimizer can use when your query isn’t using “real” tables.

Footnote

There are many ways that you can play around with this starting model to investigate where the boundaries might be. For example, if I make the index on test_objects unique the plan changes to a simple nested loop driven by the unnested collection (there’s no longer a non-mergeable view in the way). If I eliminate the distinct from the original query the same thing happens (for the same reason). If I force the join order to start with the collection (using the leading() hint) but don’t hint a nested loop Oracle produces (at least in my case) a hash join with a Bloom filter that minimised the memory and and CPU requirement.

I mentioned at the start that Timur Akhmadeev and Stefan Koehler supplied explanations for what was going on behind the scenes. Critically Stefan also referenced one of two posts from the Oracle blog on complex view merging and its restrictions: part 1, part 2.

The related problem that led me to re-discover and complete this note is at this URL (published a couple of days ago).

January 12, 2020

Thank You

Filed under: Non-technical — Jonathan Lewis @ 9:16 am GMT Jan 12,2020

Regular readers will have noticed that for the last few weeks I’ve been adding a footnote (now deleted) to any new blog posts asking for donations to the Nepal Youth Foundation (UK), with an offer to match (up to a limit) any donations made by my readers.  The page I had set up on JustGiving to collect donations and describe the work of the foundation reached my limit last week, so I sent off my matching cheque on 6th Jan. I got a very prompt email telling me the cheque had arrived the following day, and thanking me for the effort; then yesterday a follow-up thank you letter arrived in the morning post.

I thought I’d share this with you so that you could see that your generosity has been appreciated by an organisation that really cares about the change it can make to the lives of the people who most need their help:

My JustGiving page is still live, but if you feel at any time that you’d like to donate to a worthwhile cause the NYF(UK) has a JustGiving page of its own that you could donate to directly, and for further information about what they do, here’s a link to their main (UK) website. For other countries there’s a link to the US site with information about other international offices.

January 10, 2020

Collection limitation

Filed under: Execution plans,Hints,Oracle,Troubleshooting — Jonathan Lewis @ 5:43 pm GMT Jan 10,2020

The ODC SQL and PL/SQL forum came up with an example a couple of days ago that highlighted an annoying limitation in the optimizer’s handling of table functions. The requirement was for a piece of SQL that would generate “installments” information from a table of contract agreements and insert into another table any installments that were not yet recorded there.

The mechanism to turn a single row of contract data into a set of installments was a (optionally pipelined) table function that involved some business logic that (presumably) dealt with the timing and size of the installments. The final SQL to create the data that needed to be inserted was reported as follows (though it had clearly been somewhat modified):

select 
        y.*,
        trunc(sysdate,'mm'),
        user
from 
        table_a a
join 
        table(
                my_function(
                        a.loan_acct_nbr, 
                        a.start_dt,
                        a.maturity_dt,
                        a.num_of_terms
                )
        ) y
on 
        a.loan_acct_nbr = y.loan_acct_nbr
where 
        a.yr_mnth_cd = to_char(add_months(sysdate,-1),'yyyymm')       -- last month
and     a.loan_typ   = 'ABC'
and     not exists ( 
                select  1 
                from
                        final_load_table l
                where
                        l.loan_acct_nbr = y.loan_acct_nbr
                and     l.yr_mnth_cd    = y.yr_mnth_cd 
        )
;

In this query, table_a is the set of contracts, final_load_table is the set of installments, and my_function() is the pipelined function returning a table of installments derived from the start date, maturity date, and number of installments for a contract. The query needs a “not exists” subquery to eliminate any installments that are already known to the database. Once this query is operating efficiently it could be used either to drive a PL/SQL loop or (generally the better strategy) to do a simple “insert as select”.

We were told that the function would return around 60 rows per contract; that the starting size of the final_load_table would be about 60M rows, and the size of the result set from the initial join would be about 60M or 70M rows (which suggests about 1M rows acquired from table_a).

The owner of this problem seemed to be saying that the query had returned no data after 4 or 5 hours – which suggests (a) the optimizer chose a bad plan and/or (b) the PL/SQL function is working very inefficiently and/or (c) the join had generated a vast amount of data but the effect of the subquery was to discard all of it .

Clearly we needed to see an execution plan (preferably from SQL Monitor) and be given have some idea of how much of the 60M/70M rows predicted for the join would be discarded because it already existed.

The owner did send us an execution plan – which included a very obvious threat and suggested a couple of optimizer errors – but supplied it as a picture rather than a flat text.

You’ll notice, of course, that the tables and columns have changed from the original statement. More significantly, though, there are serious problems with the numbers – the estimated row count for the basic join is only 5,326 rather than 50 Million which, at first sight, is probably why the optimizer has decided that a filter subquery (operation 1) to do an index-only probe (operation 5) is a good way of handling the subquery. Perhaps if the estimates had been a little more accurate (e.g. through the addition of a couple of column groups or, if necessary, by opt_estimate() or cardinality() hints) the subquery would have been unnested and turned into a hash anti-join.

I suggested a quick test of a suitable cardinality() hint – but ran up a little model to check that I’d got the hint right – and found that I had but it wasn’t going to help. So I decided to write up the model (and a possible solution for the owner of the problem) in this blog note.

Here’s the code to create the set of objects I’m going to work with. The naming follows the naming in the original statement of the problem suggested by the owner:


rem
rem     Script:         table_function_plan.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table table_a (
        loan_acct_nbr   number,
        yr_mnth_cd      varchar2(6),
        start_dt        date,
        maturity_dt     date    ,
        number_of_terms number,
        constraint ta_pk primary key(loan_acct_nbr, yr_mnth_cd)
)
;

execute dbms_random.seed(0)

insert /*+ append */
into    table_a
with generator as (
        select rownum id
        from    dual
        connect by
                level <= 4000 -- > comment to avoid wordpress issue
)
select
        trunc(dbms_random.value(1e9, 2e9)),
        to_char(sysdate-(365-mod(rownum,365)),'yyyymm'),
        sysdate-(365-mod(rownum,365)),
        sysdate+(1500+mod(rownum,365)),
        60
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

create table final_load_table_l(
        loan_acct_nbr   number,
        yr_mnth_cd      varchar2(6),
        v1              varchar2(10),
        padding         varchar2(200),
        constraint lt_pk primary key(loan_acct_nbr, yr_mnth_cd)
)
;

insert /*+ append */ into final_load_table_l
with generator as (
        select rownum id
        from    dual
        connect by
                level <= 4000 -- > comment to avoid wordpress issue
)
select
        trunc(dbms_random.value(1e9, 2e9)),
        to_char(sysdate-(365-mod(rownum,365)),'yyyymm'),
        lpad(rownum,10),
        lpad('x',200,'x')
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;


begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'table_a',
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'final_load_table_l',
                method_opt  => 'for all columns size 1'
        );
end;
/

create type my_row_type as object (
        loan_acct_nbr   number,
        yr_mnth_cd      varchar2(6),
        v1              varchar2(10),
        padding         varchar2(200)
);
/

create type my_table_type as table of my_row_type;
/

create  or replace function my_function (
        i_loan_acct_nbr         in      number,
        i_yr_mnth_cd            in      varchar2,
        i_start_dt              in      date,
        i_maturity_dt           in      date,
        i_number_of_terms       in      number
)       return  my_table_type pipelined
as
begin
        for i in 1..i_number_of_terms loop
                pipe row (
                        my_row_type(
                                i_loan_acct_nbr,
                                to_char(i_start_dt+32*i,'yyyymm'),
                                i,
                                lpad('x',200,'x')
                        )
                );
        end loop;
        return;
end;
/

I was planning to create some large tables – hence the option to generate 16M rows from my generator CTEs – but I discovered the critical problem almost as soon as I had some data and code in place, so I didn’t need to go large.

I’ve had to create an object type and table type in order to create a pipelined function that returns the table type by piping rows of the object type. The data I’ve created, and the way the function generates data probably doesn’t bear much resemblance to the real system of course, but I don’t think it needs to be very close to make the crucial point.

Here’s the text of the select statement the OP wants to run, with the execution plan I got from my data set after running the query and pulling the plan from memory:

alter session set statistics_level = all;

select 
        /*+ find this 1 */
        y.* 
from 
        table_a a, 
        table(my_function(
                a.loan_acct_nbr,
                a.yr_mnth_cd,
                a.start_dt,
                a.maturity_dt,
                a.number_of_terms
        )) y
where 
        a.yr_mnth_cd = '202001'
and     not exists (
                select  
                        /*+ unnest */
                        null
                from    final_load_table_l l
                where   l.loan_acct_nbr = y.loan_acct_nbr
                and     l.yr_mnth_cd    = y.yr_mnth_cd
        )
;

--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |      1 |        |  5059K(100)|  14580 |00:00:00.15 |   16330 |
|*  1 |  FILTER                             |             |      1 |        |            |  14580 |00:00:00.15 |   16330 |
|   2 |   NESTED LOOPS                      |             |      1 |   6283K| 10421   (8)|  14580 |00:00:00.11 |     335 |
|*  3 |    TABLE ACCESS FULL                | TABLE_A     |      1 |    769 |    10  (10)|    243 |00:00:00.01 |     297 |
|   4 |    COLLECTION ITERATOR PICKLER FETCH| MY_FUNCTION |    243 |   8168 |    14   (8)|  14580 |00:00:00.10 |      38 |
|*  5 |   INDEX UNIQUE SCAN                 | LT_PK       |  14580 |      1 |     1   (0)|      0 |00:00:00.02 |   15995 |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NULL)
   3 - filter("A"."YR_MNTH_CD"='202001')
   5 - access("L"."LOAN_ACCT_NBR"=:B1 AND "L"."YR_MNTH_CD"=:B2)


I’ve put in a hint to tell the optimizer to unnest the subquery – and it didn’t. Oracle does not ignore hints (unless they’re illegal, or out of context, or the optimizer never got to them, or you’ve found a bug) so why did Oracle appear to ignore this hint? There’s a really nice feature in 19.3 execution plans – you can request a hint report for a plan, and here’s the hint report for this query (ignore the bits about “find this” being an error):


Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (U - Unused (1), E - Syntax error (2))
---------------------------------------------------------------------------
   0 -  SEL$1
         E -  find
         E -  this

   5 -  SEL$3
         U -  unnest / Invalid correlated predicates


I put in an /*+ unnest */ hint to unnest the subquery, and I’ve been told that the predicates are not valid. The only thing about them that could be invalid is that they come from a pipelined function that has returned an object type. The pipelined function does not behave exactly like a table. But this gives me a clue about forcing the unnest to happen – hide the pipelined function inside a non-mergeable view.


select
        /*+ find this 2 */
        v.*
from    (
        select  /*+ no_merge */
                y.* 
        from 
                table_a a, 
                table(my_function(
                        a.loan_acct_nbr,
                        a.yr_mnth_cd,
                        a.start_dt,
                        a.maturity_dt,
                        a.number_of_terms
                )) y
        where 
                a.yr_mnth_cd = '202001'
        )       v
where   not exists (
                select
                        /*+ unnest */
                        null
                from    final_load_table_l l
                where   l.loan_acct_nbr = v.loan_acct_nbr
                and     l.yr_mnth_cd    = v.yr_mnth_cd
        )
/


------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |             |      1 |        | 10628 (100)|  14580 |00:00:00.12 |     387 |       |       |          |
|*  1 |  HASH JOIN RIGHT ANTI                |             |      1 |   6283K| 10628  (10)|  14580 |00:00:00.12 |     387 |  1878K|  1878K| 2156K (0)|
|   2 |   INDEX FAST FULL SCAN               | LT_PK       |      1 |  10000 |     6  (17)|  10000 |00:00:00.01 |      91 |       |       |          |
|   3 |   VIEW                               |             |      1 |   6283K| 10371   (8)|  14580 |00:00:00.11 |     296 |       |       |          |
|   4 |    NESTED LOOPS                      |             |      1 |   6283K| 10371   (8)|  14580 |00:00:00.10 |     296 |       |       |          |
|*  5 |     TABLE ACCESS FULL                | TABLE_A     |      1 |    769 |    10  (10)|    243 |00:00:00.01 |     296 |       |       |          |
|   6 |     COLLECTION ITERATOR PICKLER FETCH| MY_FUNCTION |    243 |   8168 |    13   (0)|  14580 |00:00:00.10 |       0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("L"."LOAN_ACCT_NBR"="V"."LOAN_ACCT_NBR" AND "L"."YR_MNTH_CD"="V"."YR_MNTH_CD")
   5 - filter("A"."YR_MNTH_CD"='202001')

I’ve wrapped the basic join between table_a and the table function in an inline view called (unimaginatively) v, and I’ve added a /*+ no_merge */ hint to that inline view. So the main query becomes a select from a non-mergeable view with a “not exists” subquery applied to a couple of view columns. And Oracle thinks that that’s okay – and my unnest hint encouraged the optimizer to use a hash anti-join.

So here’s an option for the owner of the problem – but with a few outstanding questions: will a rewrite of their query in this form do the same, will the performance of the select be good enough, and will the subsequent “insert as select” keep the same driving plan.

There’s one significant detail to worry about: the build table in this hash (anti-)join is going to be constructed from 50M (load_acct_bfr, yr_mnth_cd) pairs – which means the workarea requirement is likely to be about 1.2GB for an optimial (i.e. in-memory) hash join; otherwise the join may spill to disc and do a lot of I/O – probably as a one-pass hash join.

(Did you notice,by the way, that the word storage appeared at operation 3 in the original plan?  That suggests a nice big Exadata box; however, there’s no storage predicate in the Predicate Information section for that operation and you would have thought that lease_type = ‘US_SSP’ would be pushed to storage, so maybe this is a ZFS Pillar backing a less exotic server.)

Conclusion

Some (if not all) types of correlated subqueries behave badly if the correlation predicates involve table functions. But you may be able to work around the issue by hiding part of the query, including the table function, inside a non-mergeable inline view before applying the subquery to the view.

Footnote

When I realised that the table function was causing a problem unnesting I remembered that I had written about a similar problem a few years ago – after searching for a little while I discovered a draft note that I had started in September 2015 but had not quite finished; so I’ll be finishing it off and publishing it some time in the next few days. (Update: now published)

 

 

January 3, 2020

push_having_to_gby() – 2

Filed under: 18c,Execution plans,Hints,Oracle — Jonathan Lewis @ 11:31 am GMT Jan 3,2020

The problem with finding something new and fiddling with it and checking to see how you can best use it to advantage is that you sometimes manage to “break” it very quickly. In yesterday’s blog note I introduced the /*+ push_having_to_gby(@qbname) */ hint and explained why it was a useful little enhancement. I also showed a funny little glitch with a missing predicate in the execution plan.

Today I thought I’d do something a little more complex with the example I produced yesterday, and I’ve ended up with a little note that’s not actually about the hint, it’s about something that appeared in my initial testing of the hint, and then broke when I pushed it a little further. Here’s a script to create data for the new test:

rem
rem     Script:         push_having_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem

create table t1
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'0')             v1,
        lpad('x',50,'x')                padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
;

insert into t1 values (2, lpad(2,10,'0'), lpad('x',50,'x'));
commit;

alter table t1 modify id not null;
create index t1_i1 on t1(id) nologging;

create table t2 as select * from t1;
create index t2_i1 on t2(id) nologging;

I’ve created two tables here, one a clone of the other, with one id value out of 1 million having two rows. As we saw yesterday it’s quite simple to write some SQL that uses an index full scan on the t1_i1 index to check for duplicate id values without doing a massive sort or hash aggregation:


set serveroutput off
alter session set statistics_level = all;

select
        /*+
                qb_name(driver)
                index(@driver t1@driver)
        */
        id 
from
        t1
where   id is not null
group by 
        id 
having  
        count(1) > 1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));


-------------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |      1 |        |      1 |00:00:00.87 |    2229 |   2228 |
|   1 |  SORT GROUP BY NOSORT|       |      1 |  50000 |      1 |00:00:00.87 |    2229 |   2228 |
|   2 |   INDEX FULL SCAN    | T1_I1 |      1 |   1000K|   1000K|00:00:00.40 |    2229 |   2228 |
-------------------------------------------------------------------------------------------------

As we saw yesterday this plan simply walks the index in order keeping track of a “running count” and doesn’t allocate a large PGA to sort a million rows of data, but there’s no asterisk by any operation telling us that there’s a predicate being checked, and no Predicate Information section to report the “count(1) > 1” predicate that we know exists (and is used, since the query produces the right answer).

Having ascertained that there is one duplicated id in the table, let’s join to the (clone) t2 table to list the rows for that id – and lets use the initial query as an inline view:

select
        /*+ 
                qb_name(main)
        */
        t2.v1
from    (
        select
                /*+
                        qb_name(driver)
                        index(@driver t1@driver)
                        no_use_hash_aggregation(@driver)
                */
                id 
        from
                t1
        where   id is not null
        group by 
                id 
        having  
                count(1) > 1
        )                       v1,
        t2
where
        t2.id = v1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |      2 |00:00:00.76 |    2234 |     87 |       |       |          |
|   1 |  NESTED LOOPS                |       |      1 |  50000 |      2 |00:00:00.76 |    2234 |     87 |       |       |          |
|   2 |   NESTED LOOPS               |       |      1 |        |      2 |00:00:00.75 |    2232 |     28 |       |       |          |
|   3 |    VIEW                      |       |      1 |  50000 |      1 |00:00:00.75 |    2228 |      0 |       |       |          |
|*  4 |     SORT GROUP BY            |       |      1 |  50000 |      1 |00:00:00.75 |    2228 |      0 |    53M|  2539K|   47M (0)|
|   5 |      INDEX FULL SCAN         | T1_I1 |      1 |   1000K|   1000K|00:00:00.26 |    2228 |      0 |       |       |          |
|*  6 |    INDEX RANGE SCAN          | T2_I1 |      1 |        |      2 |00:00:00.01 |       4 |     28 |       |       |          |
|   7 |   TABLE ACCESS BY INDEX ROWID| T2    |      2 |      1 |      2 |00:00:00.01 |       2 |     59 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(COUNT(*)>1)
   6 - access("T2"."ID"="V1"."ID")

As you can see from this plan, I didn’t get the “sort group by nosort” that I wanted – even though the inline view was not merged. In fact, you’ll notice the /*+ no_use_hash_aggregation() */ hint I had to include to get a sort group by rather than a hash group by. The logic behind resolving this query block changed significantly when it went into a more complex query.

Having tried adding several other hints (blocking nlj_prefetch, nlj_batching, batched index access, setting cardinality to 1, first_rows(1) optimisation) I finally came down to using a materialized CTE (common table expression / “with” subquery):

with v1 as (
        select
                /*+
                        qb_name(driver)
                        index(@driver t1@driver)
                        materialize
                */
                id 
        from
                t1
        where
                id is not null
        group by 
                id 
        having  
                count(1) > 1
)
select
        /*+ 
                qb_name(main)
        */
        t2.v1
from    
        v1,
        t2
where
        t2.id = v1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |      1 |        |      2 |00:00:00.86 |    2236 |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |      2 |00:00:00.86 |    2236 |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D66F8_E3B235A |      1 |        |      0 |00:00:00.86 |    2229 |
|   3 |    SORT GROUP BY NOSORT                  |                            |      1 |  50000 |      1 |00:00:00.86 |    2228 |
|   4 |     INDEX FULL SCAN                      | T1_I1                      |      1 |   1000K|   1000K|00:00:00.39 |    2228 |
|   5 |   NESTED LOOPS                           |                            |      1 |  50000 |      2 |00:00:00.01 |       6 |
|   6 |    NESTED LOOPS                          |                            |      1 |        |      2 |00:00:00.01 |       4 |
|   7 |     VIEW                                 |                            |      1 |  50000 |      1 |00:00:00.01 |       0 |
|   8 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D66F8_E3B235A |      1 |  50000 |      1 |00:00:00.01 |       0 |
|*  9 |     INDEX RANGE SCAN                     | T2_I1                      |      1 |        |      2 |00:00:00.01 |       4 |
|  10 |    TABLE ACCESS BY INDEX ROWID           | T2                         |      2 |      1 |      2 |00:00:00.01 |       2 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   9 - access("T2"."ID"="V1"."ID")

You’ll notice that the hinting is back to the bare minimum – with only the addition of the /*+ materialize */ hint in the CTE. You’ll also notice that the “count(1) > 1” predicate is still missing. But critically we do have the index full scan leading into a sort group by nosort and no huge memory allocation.

The price we have to pay is that we do direct path writes to the temporary tablespace to materialize the CTE and db file scattered reads to read the data back. But since this example is aimed at a large data set returning a small result set this may be a highly appropriate trade off.

It’s possible that a detailed examination of the 10053 trace file would give us a clue about why Oracle can find the sort group by nosort when the query block is a materialized CTE but not when it’s an inline view – but I’m happy to leave that investigation to someone else and just leave this here as a warning that sometimes (even in 19c) there’s a difference between a non-merged view path and a materizlied subquery path.

 

January 2, 2020

push_having_to_gby()

Filed under: 18c,Execution plans,Oracle — Jonathan Lewis @ 3:36 pm GMT Jan 2,2020

I came across an interesting new hint recently when checking the Outline Data for an execution plan: /*+ push_having_to_gby() */  It’s an example of a “small” change designed to reduce CPU usage by reducing the volume of data that passes through the layers of calls that an execution plan represents. The hint appeared in 18.3 but I’ve run the following on 19.3 as a demonstration of what it does and why it’s a good thing:

rem
rem     Script:         push_having.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem
rem     Notes:
rem     New (18c) push_having_to_gby() hint
rem     Avoids one pipeline (group by to filter) in
rem     execution plans.
rem

create table t1
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'0')             v1,
        lpad('x',50,'x')                padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
;

alter table t1 modify id not null;
create index t1_i1 on t1(id) nologging;

set serveroutput off
alter session set statistics_level = all;

select
        /*+
                qb_name(driver)
        */
        id 
from
        t1
where   id is not null
group by 
        id 
having  
        count(1) > 1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

There aren’t very many options for the execution path for this query, and the default path taken on my database was an imdex fast full scan with hash aggregation:


-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |      1 |        |      0 |00:00:00.68 |    2238 |   2230 |       |       |          |
|*  1 |  HASH GROUP BY        |       |      1 |  50000 |      0 |00:00:00.68 |    2238 |   2230 |    55M|  7913K|   57M (0)|
|   2 |   INDEX FAST FULL SCAN| T1_I1 |      1 |   1000K|   1000K|00:00:00.20 |    2238 |   2230 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(COUNT(*)>1)

You’ll notice that the query should return no rows – the way I’ve generated the id means it’s unique even though I haven’t declared a unique constraint/index. DId you also notice the common guess (5%) that the optimizer has used for the selectivity of the having clause ? But have you spotted the 18c enhancement yet ? If not, we’ll get to it in a moment.

It just so happens that I know there is a better execution path than this for this specific query with my specific data set, so I’m going to put in a minimalist hint to tell the optimizer about it, just to see what happens. The data is very well organized, so using an index scan with running total will be significantly more efficient than a big hash group by:


select
        /*+
                qb_name(driver)
                index(@driver t1@driver)
        */
        id 
from
        t1
where   id is not null
group by 
        id 
having  
        count(1) > 1
;

----------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |      1 |        |      0 |00:00:00.80 |    2228 |
|   1 |  SORT GROUP BY NOSORT|       |      1 |  50000 |      0 |00:00:00.80 |    2228 |
|   2 |   INDEX FULL SCAN    | T1_I1 |      1 |   1000K|   1000K|00:00:00.40 |    2228 |
----------------------------------------------------------------------------------------

Notice how the optimizer has obeyed my /*+ index(t1) */ hint and used an index full scan to walk through the t1_i1 index in order, doing a “sort group by” which doesn’t need to do any sorting, so its effectively using a simple running total to count repetitions. The timing (A-time) difference isn’t really something to trust closely when dealing with brief time periods and rowsource_execution_statistics, but eliminating 57M of PGA allocation for the hash join SQL workarea might be a significant benefit. But there’s something else to be seen in this plan – if you can manage to see the things that aren’t there.

So let’s push this query back to its 12c plan:

select
        /*+
                qb_name(driver)
                index(@driver t1@driver)
                optimizer_features_enable('12.2.0.1')
        */
        id 
from
        t1
where   id is not null
group by 
        id 
having  
        count(1) > 1
;

-----------------------------------------------------------------------------------------
| Id  | Operation             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |      1 |        |      0 |00:00:01.02 |    2228 |
|*  1 |  FILTER               |       |      1 |        |      0 |00:00:01.02 |    2228 |
|   2 |   SORT GROUP BY NOSORT|       |      1 |  50000 |   1000K|00:00:00.93 |    2228 |
|   3 |    INDEX FULL SCAN    | T1_I1 |      1 |   1000K|   1000K|00:00:00.45 |    2228 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(COUNT(*)>1)

Notice the FILTER that appears as operation 1. Oracle generates the aggregate data (which happens to total 1M rows) at the Sort Group By (whether it’s the Nosort option from the index full scan, or the “proper” hash group by from the index fast full scan) and passes the 1M row result set (estimated at 50K rows) up to the parent operation where the filter takes place. In 18c onwards the separate filter operation disappears and the filtering takes place as part of the aggregation. This is probably a good thing but if you ever want to disable it without switching everything back to the 12c optimizer features then there’s a dedicated hint: (no_)push_having_to_gby():


select
        /*+
                qb_name(driver)
                index(@driver t1@driver)
                no_push_having_to_gby(@driver)
        */
        id 
from
        t1
where   id is not null
group by 
        id 
having  
        count(1) > 1
;

-----------------------------------------------------------------------------------------
| Id  | Operation             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |      1 |        |      0 |00:00:01.05 |    2228 |
|*  1 |  FILTER               |       |      1 |        |      0 |00:00:01.05 |    2228 |
|   2 |   SORT GROUP BY NOSORT|       |      1 |  50000 |   1000K|00:00:00.91 |    2228 |
|   3 |    INDEX FULL SCAN    | T1_I1 |      1 |   1000K|   1000K|00:00:00.36 |    2228 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(COUNT(*)>1)

If you were looking very carefully (especially after my comment about “seeing things that aren’t there”) you may have noticed that there’s an odd detail in the example where I’ve hinted the index without blocking the push. There’s no Predicate Information section in that execution plan – and that wasn’t a mistake on my part – it simply doesn’t exist – and you’ll note that there’s no asterisk by any of the Operation lines to remind you that there should be should be some Predicate Information! The “count(1) > 1” just doesn’t get reported (even though it does get reported if you use dbms_xplan.display() after a call to explain plan). Fortunately, however, when I modified the model to duplicate one of the rows I did get the correct result – so even though the predicate is not reported it is still applied.  [Side Note: did you notice that my original count(1) changes to count(*) when it gets to the Predicate Information. People still ask which is faster, count(1) or count(*) – the argument should have died back in Oracle 7 days.]

Summary

18c introduced a new optimisation that pushes a “having” predicate down into a group by operation. This reduces CPU usage by eliminating the need to pass a potentially large result from a child operation up to a parent filter operation. Unfortunately you may find that the predicate becomes invisible when you pull the execution plan from memory.

In the unlikely event that you manage to find a case where this optimisation is used when it would have been better to bypass it then there is a hint /*+ no_push_having_to_gby(@qbname) */ to block it, and if it doesn’t appear when you think it should then the opposite hint /*+ push_having_to_gby(@qbname) */ is available.

 

 

December 30, 2019

Scalar Subq Bug

Filed under: CBO,dbms_xplan,Execution plans,Oracle,subqueries — Jonathan Lewis @ 9:30 am GMT Dec 30,2019

This is an observation that came up on the Oracle Developer Forum a couple of days ago, starting life as the fairly common problem:

I have a “select” that runs quickly  but when I use in a “create as select” it runs very slowly.

In many cases this simply means that the query was a distributed query and the plan changed because the driving site changed from the remote to the local server. There are a couple of other reasons, but distributed DML is the one most commonly seen.

In this example, though, the query was not a distributed query, it was a fully local query. There were three features to the query that were possibly suspect, though:

  • “ANSI” syntax
  • scalar subqueries in the select list
  • redundant “order by” clauses in inline views

The OP had supplied the (horrible) SQL in a text format along with images from the Enterprise Manager SQL Monitor screen showing the two execution plans and two things were  obvious from the plans – first that the simple select had eliminated the scalar subqueries (which were redundant) while the CTAS had kept them in the plan, and secondly most of the elapsed time for the CTAS was spent in lots of executions of the scalar subqueries.

My first thought was that the problem was probably a quirk of how the optimizer translates “ANSI” SQL to Oracle-standard SQL, so I created a model that captured the key features of the problem – starting with 3 tables:

rem
rem     Script:         ctas_scalar_subq.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             11.2.0.4
rem

create table t1 as
select * from all_objects
where rownum <= 10000 -- > comment to avoid wordpress format issue
;

alter table t1 add constraint t1_pk primary key(object_id);

create table t2 as
select * from t1
;

alter table t2 add constraint t2_pk primary key(object_id);

create table t3 as
select * from all_objects
where rownum <= 500 -- > comment to avoid wordpress format issue
;

alter table t3 add constraint t3_pk primary key(object_id);

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T3',
                method_opt  => 'for all columns size 1'
        );
end;
/

I’m going to use the small t3 table as the target for a simple scalar subquery in the select list of a query that selects some columns from t2; then I’m going to use that query as an inline view in a join to t1 and select some columns from the result. Here’s the starting query that’s going to become an inline view:


select 
        t2.*,
        (
        select  t3.object_type 
        from    t3 
        where   t3.object_id = t2.object_id
        )       t3_type
from
        t2
order by
        t2.object_id
;

And here’s how I join the result to t1:


explain plan for
        select
                v2.*
        from    (
                select
                        t1.object_id,
                        t1.object_name  t1_name,
                        v1.object_name  t2_name,
                        t1.object_type  t1_type,
                        v1.object_type  t2_type
                from
                        t1
                join (
                        select 
                                t2.*,
                                (
                                select  t3.object_type 
                                from    t3 
                                where   t3.object_id = t2.object_id
                                )       t3_type
                        from
                                t2
                        order by
                                t2.object_id
                )       v1
                on
                        v1.object_id = t1.object_id
                and     v1.object_type = 'TABLE'
                )       v2
;

select * from table(dbms_xplan.display(null,null,'outline alias'));

The initial t2 query becomes an inline view called v1, and that becomes the second table in a join with t1. I’ve got the table and view in this order because initially the OP had an outer (left) join preserving t1 and I thought that that might be significant, but it turned out that it wasn’t.

Having joined t1 and v1 I’ve selected a small number of columns from the t1 and t2 tables and ignored the column that was generated by the inline scalar subquery. (This may seem a little stupid – but the same problem appears when the inline view is replaced with a stored view, which is a more realistic possibility.) Here’s the resulting execution plan (taken from 11.2.0.4 in this case):


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   476 | 31416 |    45  (12)| 00:00:01 |
|*  1 |  HASH JOIN           |      |   476 | 31416 |    45  (12)| 00:00:01 |
|   2 |   VIEW               |      |   476 | 15708 |    23  (14)| 00:00:01 |
|   3 |    SORT ORDER BY     |      |   476 | 41888 |    23  (14)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T2   |   476 | 41888 |    22  (10)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T1   | 10000 |   322K|    21   (5)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   4 - filter("T2"."OBJECT_TYPE"='TABLE')

I was a little surprised by this plan as I had expected the optimizer to eliminate the in-line “order by” in view v1 – but even when I changed the code to traditional Oracle join syntax the redundant and wasteful sort at operaton 3 still took place. (You might note that the data will be reported in an order dictated by the order of the data arriving from the t1 tablescan thanks to the mechanism of the hash join, so the sort is a total waste of effort.)

The plus point, of course, is that the optimizer had been smart enough to eliminate the scalar subquery referencing t3. The value returned from t3 is not needed anywhere in the course of the execution, so it simply disappears.

Now we change from a simple select to a Create as Select which I’ve run, with rowsource execution stats enabled, using Oracle 19.3 for this output:

set serveroutput off
set linesize 156
set trimspool on
set pagesize 60

alter session set statistics_level = all;

create table t4 as
        select  
                v2.*
        from    (
                select
                        t1.object_id,
                        t1.object_name  t1_name,
                        v1.object_name  t2_name,
                        t1.object_type  t1_type,
                        v1.object_type  t2_type
                from
                        t1
                join (
                        select 
                                t2.*,
                                (
                                select  t3.object_type 
                                from    t3 
                                where   t3.object_id = t2.object_id
                                )       t3_type
                        from
                                t2
                        order by 
                                t2.object_id
                )       v1
                on
                        v1.object_id = t1.object_id
                and     v1.object_type = 'TABLE'
                )       v2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;

And here’s the run-time execution plan – showing the critical error and statistics to prove that it really happened:

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Writes |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |       |      1 |        |      0 |00:00:00.01 |     471 |      3 |       |       |          |
|   1 |  LOAD AS SELECT                  | T4    |      1 |        |      0 |00:00:00.01 |     471 |      3 |  1042K|  1042K| 1042K (0)|
|   2 |   OPTIMIZER STATISTICS GATHERING |       |      1 |    435 |    294 |00:00:00.01 |     414 |      0 |   256K|   256K|  640K (0)|
|*  3 |    HASH JOIN                     |       |      1 |    435 |    294 |00:00:00.01 |     414 |      0 |  1265K|  1265K| 1375K (0)|
|   4 |     VIEW                         |       |      1 |    435 |    294 |00:00:00.01 |     234 |      0 |       |       |          |
|   5 |      TABLE ACCESS BY INDEX ROWID | T3    |    294 |      1 |     50 |00:00:00.01 |      54 |      0 |       |       |          |
|*  6 |       INDEX UNIQUE SCAN          | T3_PK |    294 |      1 |     50 |00:00:00.01 |       4 |      0 |       |       |          |
|   7 |      SORT ORDER BY               |       |      1 |    435 |    294 |00:00:00.01 |     234 |      0 | 80896 | 80896 |71680  (0)|
|*  8 |       TABLE ACCESS FULL          | T2    |      1 |    435 |    294 |00:00:00.01 |     180 |      0 |       |       |          |
|   9 |     TABLE ACCESS FULL            | T1    |      1 |  10000 |  10000 |00:00:00.01 |     180 |      0 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("V1"."OBJECT_ID"="T1"."OBJECT_ID")
   6 - access("T3"."OBJECT_ID"=:B1)
   8 - filter("T2"."OBJECT_TYPE"='TABLE')

You’ll notice that the VIEW at operation 4 reports the inline scalar subquery as operations 5 and 6, and the Starts column show that the scalar subquery executes 294 times – which is the number of rows returned by the scan of table t2. Although my first thought was that this was an artefact of the transformation from ANSI to Oracle syntax it turned out that when I modified the two statements to use traditional Oracle syntax the same difference appeared. Finally I re-ran the CTAS after removing the order by clause in the in-line view and the redundant subquery disappeared from the execution plan.

Tiny Geek bit

It’s not immediately obvious why there should be such a difference between the select and the CTAS in this case, but the 10053 trace files do give a couple of tiny clues the CTAS trace file includes the lines:

ORE: bypassed - Top query block of a DML.
TE: Bypassed: Top query block of a DML.
SQT:    SQT bypassed: in a transaction.

The first two suggest that we should expect some cases where DML statement optimise differently from simple queries. The last one is a further indication that differences may appear. (SQT – might this be subquery transformation, it doesn’t appear in the list of abbreviations in the trace file).

Unfortunately the SELECT trace file also included the line:


SQT:     SQT bypassed: Disabled by parameter.

So “SQT” – whatever that is – being in or out of a transaction may not have anything to do with the difference.

Summary

There are cases where optimising a select statement is not sufficient as a strategy for optimising a CTAS statement. In this case it looks as if an inline view which was non-mergable (thanks to a redundant order by clause) produced the unexpected side-effect that a completely redundant scalar subquery in the select list of the inline view was executed during the CTAS even though it was transformed out of existence for the simple select.

There are some unexpected performance threats in “cut-and-paste” coding and in re-using stored views if you haven’t checked carefully what they do and how they’re supposed to be used.

 

 

December 24, 2019

Flashback Archive

Filed under: Bugs,Flashback,Oracle — Jonathan Lewis @ 8:33 pm GMT Dec 24,2019

A classic example of Oracle’s “mix and match” problem showed up on the Oracle Developer Forum a few days ago. Sometimes you see two features that are going to be really helpful in your application – and when you combine them something breaks. In this case it was the combination of Virtual Private Database (VPD/FGAC/RLS) and Flashback Data Archive (FDA/FBA) that resulted in the security predicate not being applied the way you would expect, hence allowing users to see data they were not supposed to see.

The OP supplied us with a model (based in part on Tim Hall’s FDA article) to demonstrate the issue on 11.2.0.4, and I’ve hacked it about a bit to explain it here, and to test it on 12.2.0.1 and 19.3.0.0 where the same failure occurs.

I’m going to start with just the VPD part of the setup before adding in the FDA. Most of the code has been written to run as the SYS user and it creates a new tablespace and a couple of users so you may want to do some editing before you try any tests. There’s also a short script at the end of the blog to remove the flashback data archive, tablespace, and users – again, something to be run by SYS.

You’ll note that this script assumes you already have a tablespace called test_8k_assm, and a temporary tablespace called temp. The latter may well be a valid assumption, the former probably isn’t.

rem
rem     Script:         vpd_fda_bug.sql
rem     Author:         Duncs (ODC)
rem     Dated:          Dec 2019
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem             11.2.0.4
rem
rem     Notes
rem     Has to be run as SYS
rem
rem     See also
rem     https://community.oracle.com/thread/4307453
rem


create user vpd_test_data_owner identified by Password_1234 
        default tablespace test_8k_assm
        temporary tablespace temp 
        quota unlimited on test_8k_assm
;
 
grant 
        connect,
        resource, 
        create any context
to
        vpd_test_data_owner
;

grant
        execute on dbms_rls
to
        vpd_test_data_owner
;
 
 
create table vpd_test_data_owner.person (
        person_id       number, 
        surname         varchar2(30), 
        unit_id         number
);

insert into  vpd_test_data_owner.person values (-1, 'One',  -1);
insert into  vpd_test_data_owner.person values (.2, 'Two',  -2);
insert into  vpd_test_data_owner.person values (.3, 'Three',-3);
insert into  vpd_test_data_owner.person values (-4, 'Four', -4);
insert into  vpd_test_data_owner.person values (-5, 'Five', -5);

commit;

create user vpd_test_function_owner identified by Password_1234
        default tablespace test_8k_assm 
        temporary tablespace temp 
        quota unlimited on test_8k_assm
;
 
grant 
        connect, 
        resource
to 
        vpd_test_function_owner
;
 
prompt  ============================================
prompt  Create a packaged function to set a context
prompt  that we will use in a VPD security predicate
prompt  ============================================

create or replace package vpd_test_function_owner.context_api_pkg AS

procedure set_parameter(
        p_name  in  varchar2,
        p_value in  varchar2
);

end context_api_pkg;
/
 
create or replace package body vpd_test_function_owner.context_api_pkg IS
 
procedure set_parameter (
        p_name  in  varchar2,
        p_value in  varchar2
) is
begin
        dbms_session.set_context('my_vpd_context', p_name, p_value);
end set_parameter;

end context_api_pkg;
/

prompt  ======================================================
prompt  Allow public to set the context value.  (Not sensible)
prompt  ======================================================

grant execute on vpd_test_function_owner.context_api_pkg to public;

prompt  ===============================================================
prompt  Create a context that can only be set by our packaged procedure
prompt  ===============================================================

create or replace context my_vpd_context 
        using vpd_test_function_owner.context_api_pkg
;

prompt  =====================================================
prompt  Create a security function that generates a predicate
prompt  based on our context, then create a policy to connect
prompt  the function to the test table for select statements.
prompt  =====================================================
 
create or replace function vpd_test_function_owner.test_vpd_function (
    p_schema  in varchar2 default null
  , p_object  in varchar2 default null
)
return varchar2
as
    lv_unit_id number := nvl(sys_context('my_vpd_context','unit_id'), -1);
begin
    return 'unit_id = ' || lv_unit_id;
end test_vpd_function;
/

begin
      dbms_rls.add_policy (
               object_schema    => 'vpd_test_data_owner'
             , object_name      => 'person'
             , policy_name      => 'test_vpd_policy'
             , function_schema  => 'vpd_test_function_owner'
             , policy_function  => 'test_vpd_function'
             , statement_types  => 'select'
      );
end;
/


There are several quick and dirty bits to the script – you shouldn’t be using the connect and resoruce roles, for example; they exist only for backwards compatibility and don’t even manage that very well any more. Any grants made should be carefully chosen to be the minimum necessary to achieve the required functionality, and you should be defining roles of your own rather than using pre-existing ones.

Generally you don’t expect to set up a security policy that stops the owner of the data from seeing all the data – and I’ve left the policy to default to dynamic which means the function will execute on every parse and execute of a statement accessing the table (and that’s somethin to avoid if you can). For convenience I’ve also alloweed the owner of the data to execute the function that changes the context that is used by the predicate function – and you don’t really want to allow anyone who is constrained by a security policy to be able to modify their own access rights like this.

Since the code allows a deliberately lax setup on VPD you could at this point do something like the following to check that VPD is actually working before moving on to test the effect of FDA:

connect vpd_test_data_owner/Password_1234
select * from person;

execute vpd_test_function_owner.context_api_pkg.set_parameter('unit_id',-2)
select * from person;

The first execution of the query should show you only the row where unit_id = -1 as “unit_id = -1” is the default return value from the security function. The second execution should return only the row where unit_id = -2 as the call to set_parameter() changes the context value so that when the security function re-executes it generate a new security predicate “unit_it = -2”. (It’s worth noting that one of the options for security policies is to make them context-dependent so that they re-execute only when the relevant context is changed – but in this case the policy defaults to “re-execute the function on every parse and execute”.)  [NOTE: for some clues on the possible performance impact of badly defined VPD, check the comments made in response to this blog note]

Once you’re satisfied that the security policy is working correctly you can move on to the second feature – flashback data archive. Logging on as SYS once again, execute the following code – which, amongst other things, creates a new tablespace. You’ll notice that I’ve got three lines in the “create tablespace” statement naming a datafile (though one of them doesn’t actually supply a name). The names (or absence thereof) correspond to the default naming conventions I have for my sandbox 11g, 12c, and 19c instances in that order. You will want to adjust according to your file-naming conventions.


prompt  ============================
prompt  Setting up Flashback Archive
prompt  ============================

create tablespace fda_ts 
        datafile        
--              no name needed if OMF
--              '/u01/app/oracle/oradata/orcl12c/orcl/fda_ts.dbf'
--              '/u02/data/OR19/orclpdb/fda_ts.dbf'
        size 1m autoextend on next 1m
;

alter user vpd_test_data_owner quota unlimited on fda_ts;

create flashback archive default fda_1year tablespace fda_ts
quota 1g retention 1 year;
 
grant flashback archive on fda_1year to vpd_test_data_owner;
grant flashback archive administer to vpd_test_data_owner;
grant execute on dbms_flashback_archive to vpd_test_data_owner;
 
prompt  Sleeping for 1 minute before adding table to flashback archive
execute dbms_lock.sleep(60);
alter table vpd_test_data_owner.person flashback archive fda_1year;

prompt  Sleeping for 1 minute before updating the date
execute dbms_lock.sleep(60);
update vpd_test_data_owner.person set surname = upper(surname);

commit;

prompt  Sleeping for 5 minutes to give FDA a chance to do its thing.
execute dbms_lock.sleep(300);
alter system flush shared_pool;

prompt  ==================================================
prompt  Now connect to the data owner schema and run the 
prompt  original query then a couple of flashback queries, 
prompt  pulling their plans from memory
prompt  ==================================================

connect vpd_test_data_owner/Password_1234

set linesize 120
set pagesize 50
set trimspool on
set serveroutput off

spool vpd_fda_bug.lst
-- set autotrace on explain

select * from vpd_test_data_owner.person;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-1/1440;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-2/1440;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-3/1440;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-4/1440;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-5/1440;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-6/1440;
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-7/1440;  
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-8/1440;  
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-9/1440;  
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-10/1440; 
select * from vpd_test_data_owner.person AS OF TIMESTAMP SYSDATE-15/1440; 
set autotrace off
spool off 

I’ve created a tablespace that I’m going to reserve for the flashback archive and given my data owner a quota on that tablespace; then I’ve created a flashback archive in that tablespace and granted various privileges relating to flashback archive to my data owner.

The next few lines of code include a couple of calls to dbms_lock.sleep() because I want to avoid the risk of getting an Oracle error ORA-01466: unable to read data – table definition has changed, but all I’ve done otherwise is modify the person table to be archiving and then made a little change that will eventually be recorded as part of the archive.

I’ve then introduced a 5 minute wait as it seems to take about 5 minutes before the flashback process takes any action to capture the original table data and copy any related undo; but after that 5 minutes is up I’ve queried the person table directly (which should show you the one row where unit_id = -1, then gradually gone backwards in time re-querying the data.

You should see the same result being produced for a few minutes, then a version of the “pre-update” data (upper case ‘ONE’ changing to mixed case ‘One’), and then you will (I hope) see the entire original data set appearing and finally you should see Oracle raising error “ORA-01466: unable to read data – table definition has changed” when your “as of timestamp” goes back beyond the moment you created the archive. (Except that that doesn’t happen with 11.2.0.4, which manages to report the data as if it had existed long before you created it).

I’ve commented out the “set autotrace on explain” in the above, but if you leave it in, or introduce it for one of the queries, you’ll see what’s going on that allows flashback data archive show you data that should have been hidden by the security predicate. Here’s the execution plan for one run:

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                     |     2 |    86 |    17  (12)| 00:00:01 |       |       |
|   1 |  VIEW                     |                     |     2 |    86 |    17  (12)| 00:00:01 |       |       |
|   2 |   UNION-ALL               |                     |       |       |            |          |       |       |
|*  3 |    FILTER                 |                     |       |       |            |          |       |       |
|   4 |     PARTITION RANGE SINGLE|                     |     1 |    71 |     7   (0)| 00:00:01 |   KEY |     1 |
|*  5 |      TABLE ACCESS FULL    | SYS_FBA_HIST_353151 |     1 |    71 |     7   (0)| 00:00:01 |   KEY |     1 |
|*  6 |    FILTER                 |                     |       |       |            |          |       |       |
|   7 |     MERGE JOIN OUTER      |                     |     1 |  2083 |    10  (20)| 00:00:01 |       |       |
|   8 |      SORT JOIN            |                     |     1 |    55 |     7  (15)| 00:00:01 |       |       |
|*  9 |       TABLE ACCESS FULL   | PERSON              |     1 |    55 |     6   (0)| 00:00:01 |       |       |
|* 10 |      SORT JOIN            |                     |     5 | 10140 |     3  (34)| 00:00:01 |       |       |
|* 11 |       TABLE ACCESS FULL   | SYS_FBA_TCRV_353151 |     5 | 10140 |     2   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TIMESTAMP_TO_SCN"(SYSDATE@!-.004861111111111111111111111111111111111111)<12670390363943)
   5 - filter("ENDSCN" .le. 12670390363943 AND ("OPERATION" IS NULL OR "OPERATION"<>'D') AND
              "ENDSCN">"TIMESTAMP_TO_SCN"(SYSDATE@!-.004861111111111111111111111111111111111111) AND ("STARTSCN" IS
              NULL OR "STARTSCN" .le. "TIMESTAMP_TO_SCN"(SYSDATE@!-.004861111111111111111111111111111111111111)))
   6 - filter("STARTSCN"<="TIMESTAMP_TO_SCN"(SYSDATE@!-.004861111111111111111111111111111111111111) OR
              "STARTSCN" IS NULL)
   9 - filter("UNIT_ID"=(-1) AND ("VERSIONS_OPERATION" IS NULL OR "VERSIONS_OPERATION"<>'D') 
             AND ("VERSIONS_STARTSCN" IS NULL OR "VERSIONS_STARTSCN".le."TIMESTAMP_TO_SCN(SYSDATE@!-.004861111111111111111111111111111111111111))
             AND ("VERSIONS_ENDSCN" IS NULL OR "VERSIONS_ENDSCN">"TIMESTAMP_TO_SCN"(SYSDATE@!-.004861111111111111111111111111111111111111)))
  10 - access("RID"(+)=ROWIDTOCHAR("T".ROWID))
       filter("RID"(+)=ROWIDTOCHAR("T".ROWID))
  11 - filter(("ENDSCN"(+) IS NULL OR "ENDSCN"(+)>12670390363943) AND ("STARTSCN"(+) IS NULL OR
              "STARTSCN"(+)<12670390363943))

Note
-----
   - dynamic sampling used for this statement (level=2)

Notice that the predicate “unit_id = -1″ appears on the full table scan of person at operation 9 – that’s Oracle applying the security predicate to the person table. But the flashback code has replaced the person table with a union all of (some partititions of) the SYS_FBA_HIST_353151 and a join between the person table and the SYS_FBA_TCRV_353151 table. And the code path that attaches the security predicate fails to attach it to the history table.

tl;dr

VPD (virtual private database) does not seem to be aware of the query rewrite that takes place if a table has an assocated FDA (flashback data archive), so a flashback query may report rows from the “history” table that should have been blocked by the VPD security policy.

Lagniappe

There is another little problem with FDA that might affect you if you try to optimizer flashback queries by creating SQL Plan Baselines. If you create a baseline on a test system (that isn’t a backup copy of the production system) and use the export/import facility to move the baseline to production then the baseline won’t work because the sys_fba_hist_nnnnn and sys_dba_tcrv_nnnnn table names are constructed from the object_id of the base table – which means the archive table names (and associated baseline hints) in the test system are probably going to have different names from the production system.

Housekeeping

To clean up the database after you’ve done all this testing, run the following script (modified to match any changes you’ve made in the test) after logging on as SYS:


alter table vpd_test_data_owner.person no flashback archive;

drop flashback archive fda_1year;

drop USER VPD_TEST_FUNCTION_OWNER cascade;
drop USER VPD_TEST_DATA_OWNER cascade;

drop tablespace fda_ts including contents and datafiles;

 

 

 

 

December 20, 2019

Purge Cursor

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 1:54 pm GMT Dec 20,2019

This is a note I first drafted about 5 years ago (the date stamp says March 2014) and rediscovered a few days ago when the question came up on a Twitter thread.

How do you purge a single SQL statement from the library cache without having to execute “alter system flush shared_pool”?

The answer is in the package dbms_shared_pool, specfically the purge() procedure. This package changes significantly in the upgrade from 11.2 (manual page here) to 12.1 (manual page here) so it’s best to check the reference manual for the version you’re using in case it changes again.  In 11.2 (and earlier) there’s just one option for the purge() procedure but in 12.1 the package gets 3 overloaded versions of the procedure – and one of the operloads gets an extra parameter (edition) by 19c.

Side note: In very early versions of Oracle the package wasn’t installed automatically, so you may have to execute $ORACLE_HOME/admin/rdbms/dbmspool.sql (possibly followed by prvtpool.plb) to install it. The facility to purge a cursor appeared in 11.1 and was then back-ported to 10.2.0.4. The manual pages for the procedure are, however, not up to date and don’t list all the possible flags that tell the procedure what type of object it is supposed to be purging.

The only use I’ve made of the purge() procedure is to purge a cursor from memory, though you can purge other types of object if you want to. Technically you could flush the execution plan from memory without eliminating the cursor, though you would still have to re-optimize the statement so there may be no benefit (or very little benefit) in doing so.

To demonstrate the mechanism I’m going to use three sessions – two to run a query with different optimizer environments, then a third to find and purge the cursors. Here’s the code for the first two sessions:


rem
rem     Script: purge_cursor.sql
rem     Dated:  March 2014
rem     Author: Jonathan Lewis
rem
rem     Last tested
rem             12.2.0.1
rem             11.2.0.4
rem             10.2.0.4  -- with variations to get  two child cursors
rem

/*    To be run by session 1    */

create table t1 as select * from all_objects where rownum <= 10000;
create index t1_i1 on t1(object_id) invisible;

alter session set optimizer_use_invisible_indexes = true;

set serveroutput off
select object_name from t1 where object_id = 250;
select * from table(dbms_xplan.display_cursor);

/*    To be run by session 2    */

set serveroutput off
select object_name from t1 where object_id = 250;
select * from table(dbms_xplan.display_cursor);

You’ll see that I’ve created an invisible index on the table then allowed one session to use invisible indexes while the other session isn’t allowed to. As a consequence session 1 will produce an execution plan that shows an index range scan for child cursor 0, and session 2 will produce an execution plan that shows a table scan for child cursor 1. I won’t show the output from these two sessions, but my calls to dbms_xplan reported the sql_id as ‘ab08hg3s62rpq’ and I’ve used that as the value for a substituion variable in the code to be run by session 3.

The final demo srcipt is a little messy because it’s going to attempt to report all the execution plans for that sql_id three times, but it will also write and execute a script to call the purge() procedure twice – once to eliminate the plans but leave the cursors in place, then a second time to purge the cursors.

define m_sql_id = 'ab08hg3s62rpq'

spool purge_cursor

prompt  =============================
prompt  Before purge- 2 child cursors
prompt  =============================

select * from table(dbms_xplan.display_cursor('&m_sql_id', null));

spool off

set verify off
set feedback off
set heading off

spool temp_purge.sql

select 
        q'{ execute dbms_shared_pool.purge('}' ||
                address || ',' || hash_value || 
                q'{', 'C', 64) }'
from 
        V$sqlarea 
where 
        sql_id = '&m_sql_id'
;

spool off

@temp_purge

set heading on
set feedback on
set verify on

spool purge_cursor append

prompt  =======================================
prompt  After heap 6 purge- 2 cursors, no plans
prompt  =======================================

select * from table(dbms_xplan.display_cursor('&m_sql_id', null));

spool off

set verify off
set feedback off
set heading off

spool temp_purge.sql

select 
        q'{ execute dbms_shared_pool.purge('}' ||
                address || ',' || hash_value || 
                q'{', 'C') }'
from 
        V$sqlarea 
where 
        sql_id = '&m_sql_id'
;

spool off

@temp_purge

set heading on
set verify on
set feedback on

spool purge_cursor append

prompt  =================================
prompt  After complete purge - no cursors
prompt  =================================

select * from table(dbms_xplan.display_cursor('&m_sql_id', null));

On the first pass the select from v$sqlarea produces a script with the following line:

 execute dbms_shared_pool.purge('000000008D3B4FD0,4032913078', 'C', 64)

On the second pass the select produces a script with the following line:

 execute dbms_shared_pool.purge('000000008D3B4FD0,4032913078', 'C')

The effect of the first call is to purge heap 6 (power(2,6) = 64) for any child cursors that exist for the sql_id. The effect of the second call is to purge the entire set of child cursors. The purge_cursor.lst file ends up with the following results:


=============================
Before purge- 2 child cursors
=============================

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
SQL_ID  ab08hg3s62rpq, child number 0
-------------------------------------
select object_name from t1 where object_id = 250

Plan hash value: 3320414027

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    25 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=250)

SQL_ID  ab08hg3s62rpq, child number 1
-------------------------------------
select object_name from t1 where object_id = 250

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    27 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    25 |    27   (8)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"=250)


37 rows selected.

=======================================
After heap 6 purge- 2 cursors, no plans
=======================================
old   1: select * from table(dbms_xplan.display_cursor('&m_sql_id', null))
new   1: select * from table(dbms_xplan.display_cursor('ab08hg3s62rpq', null))

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
SQL_ID  ab08hg3s62rpq, child number 0

select object_name from t1 where object_id = 250

NOTE: cannot fetch plan for SQL_ID: ab08hg3s62rpq, CHILD_NUMBER: 0
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)

SQL_ID  ab08hg3s62rpq, child number 1

select object_name from t1 where object_id = 250

NOTE: cannot fetch plan for SQL_ID: ab08hg3s62rpq, CHILD_NUMBER: 1
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)


16 rows selected.

=================================
After complete purge - no cursors
=================================
old   1: select * from table(dbms_xplan.display_cursor('&m_sql_id', null))
new   1: select * from table(dbms_xplan.display_cursor('ab08hg3s62rpq', null))

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
SQL_ID: ab08hg3s62rpq cannot be found


2 rows selected.

As you can see we start with two child cursors and two different execution plans, then the two child cursors “lose” their execution plans, and finally the cursors disappear completely. (And the parent disappears at the same time.)

Presenting the procedure with a little more formality – the formal declaration of the procedure that I’ve been using reads:


procedure purge(
        name    varchar2, 
        flag    char    DEFAULT 'P', 
        heaps   number  DEFAULT 1
)

To purge a cursor we set the flag to ‘C’ (or ‘c’)  and the name to ‘{address},{hash_value}’ (make sure you don’t get extra spaces in that expression). If we want to purge just the execution plan we need to target heap 6 which means setting the heaps value to power(2,6). Different types of object use difference (sub)heaps so if you want to delete multiple heaps you need to add together the appropriate power(2,N); the default value for heaps is 1, which equates to heap 0, which means the whole object.

For other types of object there is a reference list (under the keep() procedure) that expands on the manuals to give us the following possible values of flag:


  --        Value        Kind of Object to {keep}
  --        -----        ------------------------
  --          P          package/procedure/function
  --          Q          sequence
  --          R          trigger
  --          T          type
  --          JS         java source
  --          JC         java class
  --          JR         java resource
  --          JD         java shared data
  --          C          cursor

You will note, of course, that I’ve used v$sqlarea rather than v$sql when I was searching for the address and hash value. When you purge a cursor you purge every child cursor you can’t identify an individual child. Even if you query v$sql instead of v$sqlarea, and use a single child_address instead of the (parent) address the purge() procedure will still purge every child for the parent.

Warning

There is a comment in my original notes about a bug that was fixed by 11.2.0.4 – if any of the child cursors is currently executing then the purge() procedure will go into a loop waiting on “cursor: pin X”, timing out every 1/100 second. Unfortunately my notes didn’t make any explicit comment about what impact this had on any session trying to parse or execute any of the children for that parent cursor – but I suspect that you’d end up with a nasty race condition and an apparent hang.

 

December 18, 2019

Wait for Java

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 9:59 am GMT Dec 18,2019

This is a note courtesy of Jack can Zanen on the Oracle-L list server who asked a question about “wait for CPU” and then produced the answer a couple of days later. It’s a simple demonstration of how Java in the database can be very deceptive in terms of indicating CPU usage that isn’t really CPU usage.

Bottom line – when you call Java Oracle knows you’re about to start doing some work on the CPU, but once you’re inside the java engine Oracle has no way of knowing whether the java code is on the CPU or waiting. So if the java starts to wait (e.g. for some slow file I/O) Oracle will still be reporting your session as using CPU.

To demonstrate the principle, I’m going to create little java procedure that simply goes to sleep – and see what I find in the active session history (ASH) after I’ve been sleeping in java for 10 seconds.

rem
rem     Script:         java_wait_for_cpu.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem
rem     Based on an email from Jack van Zanen to Oracle-L
rem

set time on

create or replace procedure milli_sleep(i_milliseconds in number) 
as 
        language java
        name 'java.lang.Thread.sleep(int)';
/

set pagesize 60
set linesize 132
set trimspool on

column sample_time format a32
column event       format a32
column sql_text    format a60
column sql_id      new_value m_sql_id

set echo on
execute milli_sleep(1e4)

select 
        sample_time, sample_id, session_state, sql_id, event 
from 
        v$active_session_history
where 
        session_id = sys_context('userenv','sid')
and     sample_time > sysdate - 1/1440 
order by 
        sample_time
;

select  sql_id, round(cpu_time/1e6,3) cpu_time, round(elapsed_time/1e6,3) elapsed, sql_text 
from    v$sql 
where   sql_id = '&m_sql_id'
;

I’ve set timing on and set echo on so that you can see when my code starts and finishes and correlate it with the report from v$active_session_history for my session. Since I’ve reported the last minute you may find some other stuff reported before the call to milli_sleep() but you should find that you get a report of about 10 seconds “ON CPU” even though your session is really not consuming any CPU at all. I’ve included a report of the SQL that’s “running” while the session is “ON CPU”.

Here (with a little edit to remove the echoed query against v$active_session_history) are the results from a run on 12.2.0.1 (and the run on 19.3.0.0 was very similar):


Procedure created.

18:51:17 SQL> execute milli_sleep(1e4)

PL/SQL procedure successfully completed.

SAMPLE_TIME                       SAMPLE_ID SESSION SQL_ID        EVENT
-------------------------------- ---------- ------- ------------- --------------------------------
16-DEC-19 06.51.11.983 PM          15577837 ON CPU  8r3xn050z2uqm
16-DEC-19 06.51.12.984 PM          15577838 ON CPU  8r3xn050z2uqm
16-DEC-19 06.51.13.985 PM          15577839 ON CPU  8r3xn050z2uqm
16-DEC-19 06.51.14.985 PM          15577840 ON CPU  8r3xn050z2uqm
16-DEC-19 06.51.15.986 PM          15577841 ON CPU  8r3xn050z2uqm
16-DEC-19 06.51.16.996 PM          15577842 ON CPU  8r3xn050z2uqm
16-DEC-19 06.51.17.995 PM          15577843 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.18.999 PM          15577844 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.20.012 PM          15577845 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.21.018 PM          15577846 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.22.019 PM          15577847 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.23.019 PM          15577848 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.24.033 PM          15577849 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.25.039 PM          15577850 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.26.047 PM          15577851 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.27.058 PM          15577852 ON CPU  4jt6zf4nybawp

16 rows selected.

18:51:27 SQL>

SQL_ID          CPU_TIME    ELAPSED SQL_TEXT
------------- ---------- ---------- ------------------------------------------------------------
4jt6zf4nybawp       .004     10.029 BEGIN milli_sleep(1e4); END;


As you can see I had a statement executing for a few seconds before the call to milli_sleep(), but then we see milli_sleep() “on” the CPU for 10 consecutive samples; but when the sleep ends the query for actual usage shows us that the elapsed time was 10 seconds but the CPU usage was only 4 milliseconds.

December 16, 2019

IOT Bug

Filed under: Bugs,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 3:58 pm GMT Dec 16,2019

Here’s a worrying bug that showed up a couple of days ago on the Oracle-L mailing list. It’s a problem that I’ve tested against 12.2.0.1 and 19.3.0.0 – it may be present on earlier versions of Oracle. One of the nastiest things about it is that you might not notice it until you get an “out of space” error from the operating system. You won’t get any wrong results from it, but it may well be adding an undesirable performance overhead.

Basically it seems that (under some circumstances, at least) Oracle is setting the “block guess” component of the secondary index on Index Organized Tables (IOTs) to point to blocks in the overflow segment instead of blocks in the primary key segment. As a result, when you execute a query that accesses the IOT through the secondary index and has to do reads from disc to satisfy the query – your session goes through the following steps:

  • Identify index entry from secondary index – acquire “block guess”
  • Read indicated block and discover the object number on the block is wrong, and the block type is wrong
  • Write a (silent) ORA-01410 error and do a block dump into the trace file
  • Use the “logical rowid” from the secondary index (i.e. the stored primary key value) to access the primary key index by key value

So your query runs to completion and you get the right result because Oracle eventually gets there using the primary key component stored in the secondary index, but it always starts with the guess[see sidebar] and for every block you read into the cache because of the guess you get a dump to the trace file.

Here’s a little code to demonstrate. The problem with this code is that everything appears to works perfectly, you have to be able to find the trace file for your session to see what’s gone wrong. First we create some data – this code is largely copied from the original posting on Oracle-L, with a few minor changes:


rem
rem     Script:         iot_bug_12c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem
rem     Notes
rem     THe OP had tested on 19.5.0.0 to get the same effect, see:
rem     //www.freelists.org/post/oracle-l/IOT-cannot-get-valid-consensus-bug-or-unexplained-behavio
rem

drop table randomload purge;

create table randomload(
        roll number,
        name varchar2(40),
        mark1 number,
        mark2 number,
        mark3 number,
        mark4 number,
        mark5 number,
        mark6 number,
        primary key (roll)
) 
organization index 
including mark3 overflow
;

create index randomload_idx on randomload(mark6);

insert into randomload 
select 
        rownum, 
        dbms_random.string(0,40) name, 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,10000)) 
from 
        dual 
connect by 
        level < 1e5 -- > comment to avoid wordpress format issue
;

commit;

exec dbms_stats.gather_table_stats(null,'randomload', cascade=>true);

prompt  ==================================================
prompt  pct_direct_access should be 100 for randomload_idx
prompt  ==================================================

select 
        table_name, index_name, num_rows, pct_direct_access, iot_redundant_pkey_elim  
from 
        user_indexes
where
        table_name = 'RANDOMLOAD'
;

It should take just a few seconds to build the data set and you should check that the pct_direct_access is 100 for the index called randomload_idx.

The next step is to run a query that will do an index range scan on the secondary index.

 
column mark6 new_value m_6

select 
        mark6, count(*) 
from
        randomload 
group by 
        mark6
order by 
        count(*)
fetch first 5 rows only
;

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';
set serveroutput off

select avg(mark3) 
from 
        randomload 
where 
        mark6 = &m_6
;

select * from table(dbms_xplan.display_cursor);

alter session set events '10046 trace name context off';
set serveroutput on

I’ve started by selecting one of the least frequencly occuring values of m_6 (a column I know to be in the overflow); then I’ve flushed the buffer cache so that any access I make to the data will have to start with disk reads (the original poster suggested restarting the database at this point, but that’s not necessary).

Then I’ve enabled sql_trace to show wait states (to capture details of what blocks were read and which object they belong to),, and I’ve run a query for m_3 (a column that is in the primary key (TOP) segment of the IOT) and pulled its execution plan from memory to check that the query did use a range scan of the secondary index. Here’s the plan:

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |       |       |    11 (100)|          |
|   1 |  SORT AGGREGATE    |                   |     1 |     7 |            |          |
|*  2 |   INDEX UNIQUE SCAN| SYS_IOT_TOP_77298 |    10 |    70 |    11   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN| RANDOMLOAD_IDX    |    10 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("MARK6"=1316)
   3 - access("MARK6"=1316)

As you can see the plan shows what we are hoping to see – an index range scan of the secondary index that let’s it follow up with a unique scan of the primary key segment. It’s just a little odd that the access predicate reported for operation 2 (unique scan of TOP) suggests that the access is on a column that isn’t in the primary key and isn’t even in the TOP section.

So the query works and gives the right answer. But what do we find in the trace directory ? If you’re running 12c (possibly only 12.2), each time the error occurs the following pattern of information will be written to the alert log (it didn’t appear in 19.3)


ORCL(3):Hex dump of (file 22, block 16747) in trace file /u01/app/oracle/diag/rdbms/orcl12c/orcl12c/trace/orcl12c_ora_7888.trc
ORCL(3):
ORCL(3):Corrupt block relative dba: 0x0580416b (file 22, block 16747)
ORCL(3):Bad header found during multiblock buffer read (logical check)
ORCL(3):Data in bad block:
ORCL(3): type: 6 format: 2 rdba: 0x0580416b
ORCL(3): last change scn: 0x0000.0b86.0e36484c seq: 0x1 flg: 0x06
ORCL(3): spare3: 0x0
ORCL(3): consistency value in tail: 0x484c0601
ORCL(3): check value in block header: 0x4408
ORCL(3): computed block checksum: 0x0
ORCL(3):

And the following pattern of information is written to the trace file [Update: a follow-up test on 11.2.0.4 suggests that the basic “wrong block address” error also happens in that version of Oracle, but doesn’t result in a dump to the trace file]:


kcbzibmlt:: encounter logical error ORA-1410, try re-reading from other mirror..
cursor valid? 1 makecr 0 line 15461 ds_blk (22, 16747) bh_blk (22, 16747)
kcbds 0x7ff1ca8c0b30: pdb 3, tsn 8, rdba 0x0580416b, afn 22, objd 135348, cls 1, tidflg 0x8 0x80 0x0
    dsflg 0x108000, dsflg2 0x0, lobid 0x0:0, cnt 0, addr 0x0, exf 0x10a60af0, dx 0x0, ctx 0
    whr: 'qeilwh03: qeilbk'
env [0x7ff1ca8e3e54]: (scn: 0x00000b860e364893   xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid:  0x0000.000.00000000  st-scn: 0x0000000000000000  hi-scn: 0x0000000000000000  ma-scn: 0x00000b860e364879  flg: 0x00000660)
BH (0xb1fd6278) file#: 22 rdba: 0x0580416b (22/16747) class: 1 ba: 0xb1c34000
  set: 10 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 763,14
  dbwrid: 0 obj: 135348 objn: 135348 tsn: [3/8] afn: 22 hint: f
  hash: [0x9eff0528,0x77cff808] lru: [0xb1fd2578,0x9ff84658]
  ckptq: [NULL] fileq: [NULL]
  objq: [0xb6f654c0,0x9ff84680] objaq: [0xb6f654d0,0x9ff84690]
  use: [0x77b78128,0x77b78128] wait: [NULL]
  st: READING md: EXCL tch: 0
  flags: only_sequential_access
  Printing buffer operation history (latest change first):
  cnt: 5
  01. sid:10 L122:zgb:set:st          02. sid:10 L830:olq1:clr:WRT+CKT
  03. sid:10 L951:zgb:lnk:objq        04. sid:10 L372:zgb:set:MEXCL
  05. sid:10 L123:zgb:no:FEN          06. sid:10 L083:zgb:ent:fn
  07. sid:08 L192:kcbbic2:bic:FBD     08. sid:08 L191:kcbbic2:bic:FBW
  09. sid:08 L604:bic2:bis:REU        10. sid:08 L190:kcbbic2:bic:FAW
  11. sid:08 L602:bic1_int:bis:FWC    12. sid:08 L822:bic1_int:ent:rtn
  13. sid:08 L832:oswmqbg1:clr:WRT    14. sid:08 L930:kubc:sw:mq
  15. sid:08 L913:bxsv:sw:objq        16. sid:08 L608:bxsv:bis:FBW
Hex dump of (file 22, block 16747)

   ... etc.

Corrupt block relative dba: 0x0580416b (file 22, block 16747)
Bad header found during multiblock buffer read (logical check)
Data in bad block:
 type: 6 format: 2 rdba: 0x0580416b
 last change scn: 0x0000.0b86.0e36484c seq: 0x1 flg: 0x06
 spare3: 0x0
 consistency value in tail: 0x484c0601
 check value in block header: 0x4408
 computed block checksum: 0x0
TRCMIR:kcf_reread     :start:  16747:0:/u01/app/oracle/oradata/orcl12c/orcl/test_8k_assm.dbf
TRCMIR:kcf_reread     :done :  16747:0:/u01/app/oracle/oradata/orcl12c/orcl/test_8k_assm.dbf

The nasty bit, of course, is the bit I’ve removed and replaced with just “etc.”: it’s a complete block dump (raw and symbolic) which in my example was somthing like 500 lines and 35KB in size.

It’s not immediately obvious exactly what’s going on and why, but the 10046 trace helps a little. From another run of the test (on 19.3.0.0) I got the following combination of details – which is an extract showing the bit of the wait state trace leading into the start of the first block dump:

WAIT #140478118667016: nam='db file scattered read' ela= 108 file#=13 block#=256 blocks=32 obj#=77313 tim=103574529210
WAIT #140478118667016: nam='db file scattered read' ela= 2236 file#=13 block#=640 blocks=32 obj#=77313 tim=103574531549
WAIT #140478118667016: nam='db file scattered read' ela= 534 file#=13 block#=212 blocks=32 obj#=77312 tim=103574532257
kcbzibmlt: encounter logical error ORA-1410, try re-reading from other mirror..
cursor valid? 1 warm_up abort 0 makecr 0 line 16082 ds_blk (13, 212) bh_blk (13, 212)

Object 77313 is the secondary index, object 77312 is the primary key index (IOT_TOP). It may seem a little odd that Oracle is using db file scattered reads of 32 blocks to read the indexes but this is a side effect of flushing the buffer – Oracle may decide to prefeetch many extra blocks of an object to “warmup” the cache just after instance startup or a flush of the buffer cache. The thing I want to check, though, is what’s wrong with the blocks that Oracle read from object 77312:


alter system dump datafile 13 block min 212 block max 243;

BH (0xc8f68e68) file#: 13 rdba: 0x034000d4 (13/212) class: 1 ba: 0xc8266000
  set: 10 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

BH (0xa7fd6c38) file#: 13 rdba: 0x034000d4 (13/212) class: 1 ba: 0xa7c2a000
  set: 12 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

BH (0xa5f75780) file#: 13 rdba: 0x034000d5 (13/213) class: 0 ba: 0xa5384000
  set: 11 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

BH (0xdafe9220) file#: 13 rdba: 0x034000d5 (13/213) class: 1 ba: 0xdadcc000
  set: 9 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

I’ve reported the first few lines of the symbolic dump for the first few blocks of the resulting trace file. Look at the third line of each group of three BH lines: it’s reporting object 77311 (the overflow segment), not 77312 (the TOP segment). And every single block reported in the db file scattered read of 32 blocks for object 77312 reports itself, when dumped, as being part of object 77311. And that’s possibly the immediate cause of the ORA-01410.

We can take the investigation a little further by dumping a leaf block or two from the secondary index.


alter session set events 'immediate trace name treedump level 77313';

----- begin tree dump
branch: 0x3400104 54526212 (0: nrow: 542, level: 1)
   leaf: 0x340010d 54526221 (-1: row:278.278 avs:2479)
   leaf: 0x340075e 54527838 (0: row:132.132 avs:5372)
   leaf: 0x34005fb 54527483 (1: row:41.41 avs:7185)

alter system dump datafile 13 block 1886   -- leaf: 0x340075e

BH (0xd5f5d090) file#: 13 rdba: 0x0340075e (13/1886) class: 1 ba: 0xd5158000
  set: 9 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77313 objn: 77313 tsn: [3/6] afn: 13 hint: f
...
row#6[5796] flag: K------, lock: 0, len=18
col 0; len 2; (2):  c1 1d
col 1; len 4; (4):  c3 07 41 5c
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  03 40 05 7c

I’ve done a treedump of the secondary index and picked a leaf block address from the treedump and dumped that leaf block, and from that leaf block I’ve extracted one index entry to show you the three components: the key value (c1 1d), the primary key for the row (c3 07 41 5c), and the block guess (03 40 05 75). Read the block guess as a 4 byte hex number, and it translates to file 13, block 1397 – which should belong to the TOP segment. So the exciting question is – what object does block (13, 1397) think it belongs to ?


alter system dump datafile 13 block 1397;

Block header dump:  0x03400575
 Object id on Block? Y
 seg/obj: 0x12dff  csc:  0x00000b860e308c46  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x3400501 ver: 0x01 opc: 0
     inc: 0  exflg: 0

Converting from Hex to Decimal: obj: 0x12dff = 77311 which is the overflow segment. The secondary index block guess is pointing at a block in the overflow segment.

There are two ways to handle this problem – you could simply rebuild the index (alter index rebuild) or, as the original poster did, use the “update block references” command to correct all the block guesses: “alter index randomload_idx update block references;”. Neither is desirable, but if you’re seeing a lot of large trace files following the pattern above then it may be necessary.

There was one particular inconsistency in the tests – which I ran many times – occasionally the pct_direct_access for the secondary index would be reported as zero (which, technically, should always happen given the error).  If it did, of course, Oracle wouldn’t follow the guess but would go straight to the step where it used the primary key “logical rowid” – thus bypassing the error and block dump.

tl;dr

In some circumstances the block guesses in the secondary indexes of IOTs may be pointing to the overflow segment instead of the primary key (TOP) segment. If this happens then queries will still run and give the right answers, but whenever they read a “guessed” block from disc they will report an ORA-01410 error and dump a block trace. This will affect performance and may cause space problems at the O/S level.

Sidebar

An entry in the secondary index of an Index Organized Table (IOT) consists of three parts, which intially we can think in the form:

({key-value}, {logical rowid}, {block guess})

Since IOTs don’t have real rowids the “logical rowid” is actually the primary key of the row where the {key value} will be found. As a short cut for efficient execution Oracle includes the block address (4 bytes) where that primary key value was stored when the row was inserted. Because an IOT is an index “rows” in the IOT can move as new data is inserted and leaf blocks split, so eventually any primary key may move to a different block – this is why we refer to the block address as a guess – a few days, hours, or minutes after you’ve inserted the row the block address may no longer be correct.)

To help the runtime engine do the right thing Oracle collects a statistic called pct_direct_access for secondary indexes of IOTs. This is a measure of what percentage of the block guesses are still correct at the time that the statistics are gathered. If this value is high enough the run-time engine will choose to try using the block guesses while executing a query (falling back to using the logical rowid if it turns out that the guess is invalid), but if the value drops too low the optimizer will ignore the block guesses and only use the logical rowid.

Not relevant to this note – but a final point about secondary indexes and logical rowids – if the definition of the index includes  some of the columns from the primary keys Oracle won’t store those columns twice (in more recent version, that is) – the code is clever enough to use the values stored in the (key value) component when it needs to use the (logical rowid) component.

 

Update (Jan 2020)

I passed this example on to Oracle, and there are now two bugs recorded for it:

  • Bug 30733525 – ALERT LOG ENTRIES RE BLOCK GUESSES IN THE SECONDARY INDEXES OF IOTS POINTING TO OVERFLOW SEGMENT INSTEAD OF INDEX SEGMENT
  • Bug 30733563 – WRONG GUESS DBA IN INDEX

 

 

 

December 6, 2019

Temp space

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 12:18 pm GMT Dec 6,2019

A question about hunting down the source of the error “ORA-01652 unable to extend temp segment by NNN in tablespace XXX” shows up on the Oracle-L mailing list or the Oracle developer community forum from time to time. In most cases the tablespace referenced is the temporary tablespace, which means the session reporting the error was probably trying to allocate some space for sorting, or doing a hash join, or instantiating a GTT (global temporary table) or a CTE (common table expression / “with” subquery). The difficulty in cases like this is that the session reporting the error might be the victim of some other session’s greed – so looking at what the session was doing won’t necessarily point you to the real problem.

Of course you then run into a further problem tracking down the source of the problem. By the time you hear the complaint (even if it’s only seconds after the error appeared) the session that had been hogging the temporary tablespace may have finished what it was doing, leaving a huge amount of free space in the temporary tablespace and suggesting (to the less experienced and cynical user) that there’s something fundamentally wrong with the way Oracle has been accounting for space usage.

If you find yourself in this situation remember that (if you’re licensed to take advantage of it) the active session history may be able to help.  One of the columns in v$active_session_history is called temp_space_allocated with the slightly misleading description: “Amount of TEMP memory (in bytes) consumed by this session at the time this sample was taken”. A simple query against v$active_session_history may be enough to identify the session and SQL  statement that had been holding the temporary space when the error was raised, for example:


column pga_allocated        format 999,999,999,999
column temp_space_allocated format 999,999,999,999

break on session_id skip 1 on session_serial#

select
        session_id, session_serial#, 
        sample_id, 
        sql_id, 
        pga_allocated,
        temp_space_allocated
from
        v$active_session_history
where
        sample_time between sysdate - 5/1440 and sysdate
and     nvl(temp_space_allocated,0) != 0
order by
        session_id, sample_id
/

All I’ve done for this example is query v$active_session_history for the last 5 minutes reporting a minimum of information from any rows that show temp space usage. As a minor variation on the theme you can obviously change the time range, and you might want to limit the output to rows reporting more than 1MB (say) of temp space usage.

You’ll notice that I’ve also reported the pga_allocated (Description: Amount of PGA memory (in bytes) consumed by this session at the time this sample was taken) in this query; this is just a little convenience – a query that’s taking a lot of temp space will probably start by acquiring a lot of memory so it’s nice to be able to see the two figures together.

There are plenty of limitations and flaws in the usefulness of this report and I’ll say something about that after showing an example of usage. Let’s start with a script to build some data before running a space-consuming query:


rem
rem     Script:         allocate_tempspace.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1 as 
select * from all_objects
;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

execute dbms_stats.gather_table_stats(null,'t1')

execute dbms_lock.sleep(20)

set pagesize  60
set linesize 255
set trimspool on
set serveroutput off
alter session set statistics_level = all;

with ttemp as (
        select /*+ materialize */ * from t1
)
select 
        /*+ no_partial_join(@sel$2 t1b) no_place_group_by(@sel$2) */ 
        t1a.object_type,
        max(t1a.object_name)
from
        ttemp t1a, ttemp t1b
where
        t1a.object_id = t1b.object_id
group by
        t1a.object_type
order by
        t1a.object_type
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

My working table t1 consists of 16 copies of the view all_objects – so close to 1 million rows in my case – and the query is hinted to avoid any of the clever transformations that the optimizer could use to reduce the workload so it’s going to do a massive hash join and aggregation to report a summary of a couple of dozen rows. Here’s the execution plan (in this case from 12.2.0.1, though the plan is the same for 19.3 with some variations in the numbers).


SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------

Plan hash value: 1682228242

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |      1 |        |     29 |00:00:10.03 |   47413 |  21345 |  12127 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |     29 |00:00:10.03 |   47413 |  21345 |  12127 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D665B_E2772D3 |      1 |        |      0 |00:00:01.51 |   28915 |      0 |   9217 |  2068K|  2068K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                         |      1 |    989K|    989K|00:00:00.24 |   19551 |      0 |      0 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                            |      1 |     29 |     29 |00:00:08.51 |   18493 |  21345 |   2910 |  6144 |  6144 | 6144  (0)|         |
|*  5 |    HASH JOIN                             |                            |      1 |     15M|     15M|00:00:03.93 |   18493 |  21345 |   2910 |    48M|  6400K|   65M (1)|   25600 |
|   6 |     VIEW                                 |                            |      1 |    989K|    989K|00:00:00.36 |    9233 |   9218 |      0 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D665B_E2772D3 |      1 |    989K|    989K|00:00:00.35 |    9233 |   9218 |      0 |       |       |          |         |
|   8 |     VIEW                                 |                            |      1 |    989K|    989K|00:00:00.40 |    9257 |   9217 |      0 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D665B_E2772D3 |      1 |    989K|    989K|00:00:00.39 |    9257 |   9217 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")

Critically this plan shows us two uses of the temp space but only reports one of them as Used-Tmp. The “hash join” at operation 5 tells us that it reached 65MB of (tunable PGA) memory before going “1-pass”, eventually dumping 25,600 KB to disc. This space usage is corroborated by the 2,910 writes (which, at an 8KB block size, would be 23,280 KB). The missing Used-Tmp, however, is the space taken up by the materialized CTE. We can see that operation 2 is a “load as select” that writes 9,217 blocks to disc (subsequently read back twice – the tablescans shown in operations 7 and 9). That’s  74,000 KB of temp space that doesn’t get reported Used-Tmp.

If we take a look at the plan from 19.3 we see different numbers, but the same “error of omission”:

SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------

Plan hash value: 1682228242

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |      1 |        |     25 |00:00:08.15 |   34905 |  13843 |   8248 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |     25 |00:00:08.15 |   34905 |  13843 |   8248 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6624_E259E68 |      1 |        |      0 |00:00:01.26 |   23706 |      0 |   5593 |  2070K|  2070K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                         |      1 |    907K|    907K|00:00:00.21 |   18024 |      0 |      0 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                            |      1 |     25 |     25 |00:00:06.89 |   11193 |  13843 |   2655 |  6144 |  6144 | 6144  (0)|         |
|*  5 |    HASH JOIN                             |                            |      1 |     14M|     14M|00:00:03.55 |   11193 |  13843 |   2655 |    44M|  6400K|   64M (1)|      23M|
|   6 |     VIEW                                 |                            |      1 |    907K|    907K|00:00:00.26 |    5598 |   5594 |      0 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6624_E259E68 |      1 |    907K|    907K|00:00:00.25 |    5598 |   5594 |      0 |       |       |          |         |
|   8 |     VIEW                                 |                            |      1 |    907K|    907K|00:00:00.34 |    5595 |   5594 |      0 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6624_E259E68 |      1 |    907K|    907K|00:00:00.33 |    5595 |   5594 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")


With slightly fewer rows in t1 (907K vs. 989K) we write 5,593 blocks for the materialized CTE  (instead of 9,217) and spill 2,655 blocks during the hash join (instead of 2,910). But again it’s only the hash join spill that is reported under Used-Tmp. Note, by the way, that the Used-Tmp in 12.2 was reported in KB when it’s reported in MB in 19.3.0.0.

Side note: comparing the number of rows created and blocks written for the CTE, it looks as if 19.3 is using the data blocks much more efficiently than 12.2. There’s no obvious reason for this (though a first guess would be that the older mechanism is to write a GTT with pctfree=10 while the new avoid any free space and transactional details) so, as ever, I now have another draft for a blog note reminding me to investigate (eventually) what differences there are in CTE storage on the upgrade. It’s something that might make a difference in a few special cases.

With the figures from the execution plans in mind we can now look at the results of the query against v$active_session_history. Conveniently the queries took a few seconds to complete, so we’re going to see several rows for each execution.

First the results from 12.2.0.1

SESSION_ID SESSION_SERIAL#  SAMPLE_ID SQL_ID           PGA_ALLOCATED TEMP_SPACE_ALLOCATED
---------- --------------- ---------- ------------- ---------------- --------------------
        14           22234   15306218 1cwabt12zq6zb       95,962,112            1,048,576
                             15306219 1cwabt12zq6zb       97,731,584           37,748,736
                             15306220 1cwabt12zq6zb      148,194,304           77,594,624
                             15306221 1cwabt12zq6zb      168,117,248           85,983,232
                             15306222 1cwabt12zq6zb      168,117,248           90,177,536
                             15306223 1cwabt12zq6zb      168,117,248           95,420,416
                             15306224 1cwabt12zq6zb      168,117,248           98,566,144
                             15306225 1cwabt12zq6zb      168,117,248          102,760,448
                             15306226 1cwabt12zq6zb      116,933,632          103,809,024
                             15306227 1cwabt12zq6zb      116,933,632          103,809,024
                             15306228 b66ycurnwpgud        8,602,624            1,048,576

I pointed out that we had 25,600 KB reported as Used-Tmp and roughly 74,000 KB unreported – a total of nearly 100,000 KB that is reasonably close to the 103,800,000 bytes reported by ASH. Moreover the timing of the plan (loading the CTE in the first 2 seconds) seems to agree with the growth to 77,590,000 of temp_space_allocated by the time we get to sample_id 15306220 in ASH. Then we have several seconds of slow growth as the hash join takes place and feeds its resulte up to the sort group by. At the end of the query we happen to have been lucky enough to catch one last sample just before the session had released all its temp space and ceased to be active.  (Note: however, that the sql_id at that sample point was not the sql_id of our big query – and that’s a clue about one of the limitations of using ASH to find the greedy SQL.)

We see the same pattern of behaviour in 19.3.0.0:


SESSION_ID SESSION_SERIAL#  SAMPLE_ID SQL_ID           PGA_ALLOCATED TEMP_SPACE_ALLOCATED
---------- --------------- ---------- ------------- ---------------- --------------------
       136           42767    2217500 1cwabt12zq6zb      143,982,592           46,137,344
                              2217501 1cwabt12zq6zb      193,527,808           54,525,952
                              2217502 1cwabt12zq6zb      193,527,808           57,671,680
                              2217503 1cwabt12zq6zb      193,527,808           61,865,984
                              2217504 1cwabt12zq6zb      197,722,112           67,108,864
                              2217505 1cwabt12zq6zb      150,601,728           70,254,592
                              2217506 1cwabt12zq6zb      150,601,728           70,254,592

We start with an almost instantaneous jump to 46MB of temp_space_allocated in the first second of the query – that’s the 5,593 blocks of the CTE being materialized, then the slow growth of temp space as the hash join runs, spills to disc, and passes its data up to the sort group by. Again we can see that the peak usage was the CTE (46MB) plus the reported spill of 23MB (plus rounding errors and odd bits).

Preliminary Observations

Queries against ASH (v$active_session_history) can show us sessions that were holding space in the temporary tablespace at the moment a sample of active sessions was taken. This may allow us to identify greedy sessions that were causing other sessions to fail with ORA-01652 (unable to allocate temp segment).

We have seen that there is at least one case where we get better information about temp space allocation from ASH than we do from the variants on v$sql_plan that include the SQL Workarea information (v$sql_workarea, v$sql_workarea_active) because the space acquired during materialization of CTEs is not reported as a “tunable SQL workarea” but does appear in the ASH temp_space_allocated.

At first sight it looks as if we may be able to use the query against ASH to identify the statement (by sql_id) that was the one being run by the greedy session when it consumed all the space. As we shall see in a further article, there are various reasons why this may over-optimistic, however in many cases there’s a fair chance that when you see the same sql_id appearing in a number of consecutive rows of the report then that statement may be the thing that is responsible for the recent growth in temp space usage – and you can query v$sql to find the text and call dbms_xplan.display_cursor() to get as much execution plan information as possible.

Further questions

  • When does a session release the temp_space_allocated? Will the space be held (locked) as long as the cursor is open, or can it be released when it is no longer needed? Will it be held, but releasable, even after the cursor has (from the client program’s perspective) been closed?
  • Could we be fooled by a report that said a session was holding a lot of space when it didn’t need it and would have released it if the demand had appeared?
  • Under what conditions might the temp_space_allocated in an ASH sample have nothing to do with the sql_id reported in the same sample?
  • Are there any other reasons why ASH might report temp_space_allocated when an execution plan doesn’t?
  • Is temp_space_allocated only about the temporary tablespace, or could it include informatiom about other (“permanent”) tablespaces ?

Stay tuned for further analysis of the limitations of using v$active_session_history.temp_space_allocated to help identify the srouce of a space management ORA-01652 issue.

 

 

December 4, 2019

E-rows / A-rows

Filed under: Execution plans,Oracle — Jonathan Lewis @ 1:17 pm GMT Dec 4,2019

This note was prompted by an error I made at the UKOUG TechFest19 yesterday. It’s fairly well-known that when you read an execution plan that includes the rowsource execution stats – so you get the E-rows (estimated) and A-rows (Actual) reported – then a sensible check of the quality of the optimizer’s calculations is to compare the estimates and actuals allowing for the fact that the E-rows is “per start” and the A-rows is “cumulative”, so A-rows = E-rows * Starts.

The error I made yesterday was to forget that this relationship isn’t always true. In particular partitioning and parallel query introduced the need to be a little flexibility in reading the numbers – which I’ll demonstrate with a coupld of simple examples running under 12.2.0.1


rem
rem     Script:         estimate_actual.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem

create table pt_composite_1 (
        id,
        grp,
        small_vc,
        padding
)
nologging
partition by range(id) 
subpartition by hash (grp)
subpartitions 4
(
        partition p1 values less than (  4000),
        partition p2 values less than (  8000),
        partition p3 values less than ( 16000),
        partition p4 values less than ( 32000),
        partition p5 values less than ( 64000),
        partition p6 values less than (128000)
)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                            id,
        trunc(rownum/100)                                 grp,
        cast(to_char(trunc(rownum/20)) as varchar2(10))   small_vc,
        cast(rpad('x',100) as varchar2(100))              padding
from
        generator       g1,
        generator       g2
where 
        rownum <= 1e5 -- > comment to avoid WordPress format issue
/

create table t3 
nologging pctfree 80
storage (initial 1M next 1M)
as
select * from pt_composite_1
/

All I’ve done is create a couple of tables with 100,000 rows each – and now I’m going to count the rows and see what I get from the execution plans with rowsource execution stats enabled:


set serveroutput off
alter session set statistics_level = all;

prompt  =================
prompt  Partition effects
prompt  =================

select count(id) from pt_composite_1;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  ================
prompt  Parallel effects
prompt  ================

select /*+ parallel (t3 4) */ count(id) from t3;
select * from table(dbms_xplan.display_cursor(null,null,'allstats'));

With a little cosmetic tidying, here are the two execution plans (note that I haven’t used the “last” format option when reporting the parallel plan:


=================
Partition effects
=================

select count(id) from pt_composite_1

-------------------------------------------------------------------------------------------------
| Id  | Operation            | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                |      1 |        |      1 |00:00:00.04 |    1866 |
|   1 |  SORT AGGREGATE      |                |      1 |      1 |      1 |00:00:00.04 |    1866 |
|   2 |   PARTITION RANGE ALL|                |      1 |    100K|    100K|00:00:00.04 |    1866 |
|   3 |    PARTITION HASH ALL|                |      6 |    100K|    100K|00:00:00.04 |    1866 |
|   4 |     TABLE ACCESS FULL| PT_COMPOSITE_1 |     24 |    100K|    100K|00:00:00.04 |    1866 |
-------------------------------------------------------------------------------------------------


================
Parallel effects
================

select /*+ parallel (t3 4) */ count(id) from t3

------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |      1 |        |      1 |00:00:00.04 |      20 |      0 |
|   1 |  SORT AGGREGATE        |          |      1 |      1 |      1 |00:00:00.04 |      20 |      0 |
|   2 |   PX COORDINATOR       |          |      1 |        |      4 |00:00:00.04 |      20 |      0 |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
|   4 |     SORT AGGREGATE     |          |      4 |      1 |      4 |00:00:00.11 |    8424 |   7692 |
|   5 |      PX BLOCK ITERATOR |          |      4 |    100K|    100K|00:00:00.11 |    8424 |   7692 |
|*  6 |       TABLE ACCESS FULL| T3       |     61 |    100K|    100K|00:00:00.06 |    8424 |   7692 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)

As you can see, the lines specifying partition selection report E-Rows for the whole table, not for any partition-level approximation, so for operations 3 and 4 we shouldn’t multiply Starts by E-rows to compare with A-row. (Starts = 6 for operation 3 because we have 6 partitions, and Start = 24 for operation 4 because at the lowest level we have a total of 24 data segments).

For the parallel query we see the same pattern – every parallel slave reports the expected total number of rows, and every “block iterator” (rowid range) reports the expected total number of rows. Again we see that using multiplication to compare E-rows and A-rows would not be valid.

In fact it’s not just partitioning and parallelism that can cause confusion. Even something as simple as a serial nested loop join has a couple of surprises (largely thanks to the evolution of the mechanics – without a matching adjustment to the execution plans – over time). Here’s a script to generate a couple of tables, which we will then join – hinting various mechanisms for the nested loop.


create table t1
as
select
        rownum           id,
        mod(rownum,100)  n1,
        cast(lpad(rownum,20) as varchar2(20)) v1 
from
        dual
connect by
        level <= 1000 -- > comment to avoid WordPress format issue
;

create table t2
as
select  * from t1
union all
select  * from t1
union all
select  * from t1
;

create index t2_i1 on t2(id);

It’s not a subtle test – as you can see we have 3 rows in table t2 for every row in t1. So let’s pick some t1 rows and join to t2 on id. Again it’s 12.2.0.1:

set serveroutput off
alter session set statistics_level = all;

prompt  ==============================
prompt  Nested loop join (traditional)
prompt  ==============================

select
        /*+ 
                leading(t1 t2) use_nl_with_index(t2 t2_i1) 
                opt_param('_nlj_batching_enabled', 0)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 4
and
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  ==============================
prompt  Nested loop join with prefetch
prompt  ==============================

select
        /*+ 
                leading(t1 t2) use_nl_with_index(t2 t2_i1) 
                nlj_prefetch(t2) 
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 4
and
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  ==============================
prompt  Nested loop join with batching
prompt  ==============================

select
        /*+ 
                leading(t1 t2) use_nl_with_index(t2 t2_i1) 
                nlj_batching(t2) 
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 4
and
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

Here are the three plans, with a couple of comments after each – all three queries returned the same 30 *- 10 * 3) rows.


==============================
Nested loop join (traditional)
==============================

--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |     30 |00:00:00.01 |      60 |
|   1 |  NESTED LOOPS                        |       |      1 |     30 |     30 |00:00:00.01 |      60 |
|*  2 |   TABLE ACCESS FULL                  | T1    |      1 |     10 |     10 |00:00:00.01 |      15 |
|   3 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     10 |      3 |     30 |00:00:00.01 |      45 |
|*  4 |    INDEX RANGE SCAN                  | T2_I1 |     10 |      3 |     30 |00:00:00.01 |      15 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."N1"=4)
   4 - access("T2"."ID"="T1"."ID")

This is the original nested loop structured (apart from the “batched” option that appeared in 12c) and follows the rule/guideline:

  • Operation 2 operates once and returns the 10 rows predicted.
  • Operation 3 is started 10 times by operation 1, with a prediction of 3 rows per start – and the actual comes out at 30 rows.
  • Operation 4 is started 10 times (once for each start of operation 3), with a predication of 3 rowids per start – and the actual comes out at 30 rows
==============================
Nested loop join with prefetch
==============================

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |      3 |     30 |00:00:00.01 |      60 |
|   2 |   NESTED LOOPS                      |       |      1 |     30 |     30 |00:00:00.01 |      30 |
|*  3 |    TABLE ACCESS FULL                | T1    |      1 |     10 |     10 |00:00:00.01 |      15 |
|*  4 |    INDEX RANGE SCAN                 | T2_I1 |     10 |      3 |     30 |00:00:00.01 |      15 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T1"."N1"=4)
   4 - access("T2"."ID"="T1"."ID")
 

Again in the order of rowsource generation

  • Operation 3 starts once with a prediction of 10 rows and the rule works.
  • Operation 4 is started 10 times by operation 2, with a prediction of 3 rows (rowids) per start, and the rule works.
  • Operation 2 was started once by operation 1, with a predication of 30 rows (rowids), and the rule works.
  • Operation 1 starts once, but the prediction is reported as the value you would have got from the original NLJ shape – and breaks the rule.
==============================
Nested loop join with batching
==============================

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |     30 |00:00:00.01 |      60 |
|   1 |  NESTED LOOPS                |       |      1 |     30 |     30 |00:00:00.01 |      60 |
|   2 |   NESTED LOOPS               |       |      1 |     30 |     30 |00:00:00.01 |      30 |
|*  3 |    TABLE ACCESS FULL         | T1    |      1 |     10 |     10 |00:00:00.01 |      15 |
|*  4 |    INDEX RANGE SCAN          | T2_I1 |     10 |      3 |     30 |00:00:00.01 |      15 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T2    |     30 |      3 |     30 |00:00:00.01 |      30 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T1"."N1"=4)
   4 - access("T2"."ID"="T1"."ID")


In the order in which rowsources are created

  • Operation 3 starts once with a prediction of 10 rows – and the A-rows matches the rule
  • Operation 4 is started 10 times by opreation 2, with a prediction of 3 rows per start – and the A-rows matches the rule.
  • Operation 5 is started 30 times by operation 1, with a prediction of 3 rows per start – again reporting the value that you would have seen from the original representation of the NLJ, the prediction obviously should be 1 – so the rule is broken again

tl;dr

It is important to remember that the basic rule of “A-rows = starts * E-rows” does not hold for the partition-related lines or the PX related lines of partitioned and parallel execution plans.

You may also find a few other cases where you need be a little cautious about trusting the rule without first thinking carefully about the mechanics of what the shape of the plan is telling you.

November 17, 2019

Parse Time

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 7:37 pm GMT Nov 17,2019

This is a note I started drafting In October 2012. It’s a case study from an optimizer (10053) trace file someone emailed to me, and it describes some of the high-level steps I went through to see if I could pinpoint what the optimizer was doing that fooled it into spending a huge amount of time optimising a statement that ultimately executed very quickly.

Unfortunately I never finished my notes and I can no longer find the trace file that the article was based on, so I don’t really know what I was planning to say to complete the last observation I had recorded.

I was prompted a  couple of days ago to publish the notes so far becuase I was reminded in a conversation with members of the Oak Table Network about an article that Franck Pachot wrote a couple of years ago. In 12c Oracle Corp. introduced a time-reporting mechanism for the optimizer trace. If some optimisation step takes “too long” (1 second, by default) then then optimizer will write a “TIMER:” line into the trace file telling you what the operation was and how long it took to complete and how much CPU time it used.  The default for “too long” can be adjusted by setting a “fix control”.  This makes it a lot easier to find out where the time went if you see a very long parse time.

But let’s get back to the original trace file and drafted blog note. It started with a question on OTN and an extract from a tkprof output to back up a nasty  performance issue.

=============================================================================================

 

What do you do about a parse time of 46 seconds ? That was the question that came up on OTN a few days ago – and here’s the tkprof output to demonstrate it.

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1     46.27      46.53          0          5          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.33       0.63        129      30331          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     46.60      47.17        129      30336          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 144  
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=30331 pr=129 pw=0 time=637272 us)
       863        863        863   VIEW  VM_NWVW_1 (cr=30331 pr=129 pw=0 time=637378 us cost=1331 size=10 card=1)
       ... and lots more lines of plan

According to tkprof, it takes 46 seconds – virtually all CPU time – to optimise this statement, then 0.63 seconds to run it. You might spot that this is 11gR2 (in fact it’s 11.2.0.3) from the fact that the second line of the “Row Source Operation” includes a report of the estimated cost of the query, which is only 1,331.

Things were actually worse than they seem at first sight; when we saw more of tkprof output the following also showed up:

SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE 
  NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') 
  NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00"), 
  NVL(SUM(C2),:"SYS_B_01") 
FROM
 (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("VAL_000002") FULL("VAL_000002") 
  NO_PARALLEL_INDEX("VAL_000002") */ :"SYS_B_02" AS C1, 
  CASE WHEN
    ...
  END AS C2 FROM "BISWEBB"."RECORDTEXTVALUE" 
  SAMPLE BLOCK (:"SYS_B_21" , :"SYS_B_22") SEED (:"SYS_B_23") "VAL_000002" 
  WHERE ... 
 ) SAMPLESUB
 
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        5      0.00       0.00          0          0          0           0
Execute      5      0.00       0.00          0          0          0           0
Fetch        5     21.41      24.14      11108      37331          0           5
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       15     21.41      24.15      11108      37331          0           5
 
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 144     (recursive depth: 1)
Number of plan statistics captured: 3
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=7466 pr=3703 pw=0 time=5230126 us)
   3137126    3137126    3137126   PARTITION HASH ALL PARTITION: 1 128 (cr=7466 pr=3703 pw=0 time=2547843 us cost=18758 size=131597088 card=3133264)
   3137126    3137126    3137126    TABLE ACCESS SAMPLE RECORDTEXTVALUE PARTITION: 1 128 (cr=7466 pr=3703 pw=0 time=2372509 us cost=18758 size=131597088 card=3133264)

This piece of SQL executed five times as the query was optimised, adding a further 24 seconds elapsed time and 21 CPU seconds which, surprisingly, weren’t included in the headline 46 seconds. The total time spent in optimising the statement was around 70 seconds, of which about 68 seconds were spent on (or waiting for) the CPU.

This is unusual – I don’t often see SQL statements taking more than a few seconds to parse – not since 8i, and not without complex partition views – and I certainly don’t expect to see a low cost query in 11.2.0.3 taking anything like 70 (or even 46) seconds to optimise.

The OP had enabled the 10046 and the 10053 traces at the same time – and since the parse time was sufficiently unusual I asked him to email me the raw trace file – all 200MB of it.

Since it’s not easy to process 200MB of trace the first thing to do is extract a few headline details, and I thought you might be interested to hear about some of the methods I use on the rare occasions when I decide to look at a 10053.

My aim is to investigate a very long parse time and the tkprof output had already shown me that there were a lot of tables in the query, so I had the feeling that the problem would relate to the amount of work done testing possible join orders; I’ve also noticed that the dynamic sampling code ran five times – so I’m expecting to see some critical stage of the optimisation run 5 times (although I don’t know why it should).

Step 1: Use grep (or find if you’re on Windows) to do a quick check for the number of join orders considered. I’m just searching for the text “Join order[” appearing at the start of line and then counting how many times I find it:

[jonathan@linux01 big_trace]$ grep "^Join order\[" orcl_ora_25306.trc  | wc -l
6266

That’s 6,266 join orders considered – let’s take a slightly closer look:

[jonathan@linux01 big_trace]$ grep -n "^Join order\[" orcl_ora_25306.trc >temp.txt
[jonathan@linux01 big_trace]$ tail -2 temp.txt
4458394:Join order[581]:  RECORDTYPEMEMBER[RTM]#9  RECORD_[VAL_000049]#13  ...... from$_subquery$_008[TBL_000020]#2
4458825:Join order[1]:  VM_NWVW_1[VM_NWVW_1]#0

The line of dots represents another 11 tables (or similar objects) in the join order. But there are only 581 join orders (apparently) before the last one in the file (which is a single view transformation). I’ve used the “-n” option with grep, so if I wanted to look at the right bit of the file I could tail the last few thousand lines, but my machine is happy to use vi on a 200MB file, and a quick search (backwards) through the file finds the number 581 in the following text (which does not appear in all versions of the trace file):

Number of join permutations tried: 581

So a quick grep for “join permutations” might be a good idea. (In the absence of this line I’d have got to the same result by directing the earlier grep for “^Join order\[“ to a file and playing around with the contents of the file.

[jonathan@linux01 big_trace]$ grep -n "join permutations" orcl_ora_25306.trc
11495:Number of join permutations tried: 2
11849:Number of join permutations tried: 1
12439:Number of join permutations tried: 2
13826:Number of join permutations tried: 2
14180:Number of join permutations tried: 1
14552:Number of join permutations tried: 2
15938:Number of join permutations tried: 2
16292:Number of join permutations tried: 1
16665:Number of join permutations tried: 2
18141:Number of join permutations tried: 2
18550:Number of join permutations tried: 2
18959:Number of join permutations tried: 2
622799:Number of join permutations tried: 374
624183:Number of join permutations tried: 2
624592:Number of join permutations tried: 2
624919:Number of join permutations tried: 1
625211:Number of join permutations tried: 2
1759817:Number of join permutations tried: 673
1760302:Number of join permutations tried: 1
1760593:Number of join permutations tried: 2
1760910:Number of join permutations tried: 1
1761202:Number of join permutations tried: 2
2750475:Number of join permutations tried: 674
2751325:Number of join permutations tried: 2
2751642:Number of join permutations tried: 1
2751933:Number of join permutations tried: 2
2752250:Number of join permutations tried: 1
2752542:Number of join permutations tried: 2
3586276:Number of join permutations tried: 571
3587133:Number of join permutations tried: 2
3587461:Number of join permutations tried: 1
3587755:Number of join permutations tried: 2
3588079:Number of join permutations tried: 1
3588374:Number of join permutations tried: 2
4458608:Number of join permutations tried: 581
4458832:Number of join permutations tried: 1

The key thing we see here is that there are five sections of long searches, and a few very small searches. Examination of the small search lists shows that they relate to some inline views which simply join a couple of tables. For each of the long searches we can see that the first join order in each set is for 14 “tables”. This is where the work is going. But if you add up the number of permutations in the long searches you get a total of 2,873, which is a long way off the 6,266 that we found with our grep for “^Join order[“ – so where do the extra join orders come from ? Let’s take a closer look at the file where we dumped all the Join order lines – the last 10 lines look like this:

4452004:Join order[577]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4452086:Join order[577]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4453254:Join order[578]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4453382:Join order[578]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4454573:Join order[579]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4454655:Join order[579]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4455823:Join order[580]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4455905:Join order[580]:  RECORD_[VAL_000033]#10  from$_subquery$_017[TBL_000029]#1 ...
4457051:Join order[581]:  RECORDTYPEMEMBER[RTM]#9  RECORD_[VAL_000049]#13  ...
4458394:Join order[581]:  RECORDTYPEMEMBER[RTM]#9  RECORD_[VAL_000049]#13  ...
4458825:Join order[1]:  VM_NWVW_1[VM_NWVW_1]#0

Every single join order seems to have appeared twice, and doubling the counts we got for the sum of the permutations gets us close to the total we got for the join order search. Again, we could zoom in a little closer, does the text near the start of the two occurrences of join order 581 give us any clues ? We see the following just before the second one:

****** Recost for ORDER BY (using join row order) *******

The optimizer has tried to find a way of eliminating some of the cost by letting the table join order affect the order of the final output. Let’s do another grep to see how many join orders have been recosted:

[jonathan@linux01 big_trace]$ grep "Recost for ORDER BY" orcl_ora_25306.trc | sort | uniq -c
    452 ****** Recost for ORDER BY (using index) ************
   2896 ****** Recost for ORDER BY (using join row order) *******

So we’ve done a huge amount recosting. Let’s check arithmetic: 452 + 2,896 + 2,873 = 6,221, which is remarkably close to the 6,266 we needed (and we have ignored a few dozen join orders that were needed for the inline views, and the final error is too small for me to worry about).

We can conclude, therefore, that we did a huge amount of work costing a 14 table join a little over 6,000 times. It’s possible, of course, that we discarded lots of join orders very early on in the cost stage, so we could count the number of times we see a “Now joining” message – to complete a single pass on a 14 table join the optimizer will have to report “Now joining” 13 times.

[jonathan@linux01 big_trace]$ grep -n "Now joining" orcl_ora_25306.trc | wc -l
43989

Since the message appeared 44,000 times from 6,200 join orders we have an average of 7 steps evaluated per join order. Because of the way that the optimizer takes short-cuts I think this is a fairly strong clue that most of the join order calculations actually completed, or get very close to completing, over the whole 14 tables. (The optimizer remembers “partial results” from previous join order calculations, so doesn’t have to do 13 “Now joining” steps on every single join order.)

We still need to know why the optimizer tried so hard before supplying a plan – so let’s look for the “Best so far” lines, which the trace file reports each time the optimizer finds a better plan than the previous best. Here’s an example of what we’re looking for:

       Cost: 206984.61  Degree: 1  Resp: 206984.61  Card: 0.00 Bytes: 632
***********************
Best so far:  Table#: 0  cost: 56.9744  card: 1.0000  bytes: 30
              Table#: 3  cost: 59.9853  card: 0.0000  bytes: 83
              Table#: 6  cost: 60.9869  card: 0.0000  bytes: 151
              Table#:10  cost: 61.9909  card: 0.0000  bytes: 185
              Table#: 5  cost: 62.9928  card: 0.0000  bytes: 253
              Table#: 2  cost: 65.0004  card: 0.0000  bytes: 306
              Table#: 1  cost: 122.4741  card: 0.0000  bytes: 336
              Table#: 8  cost: 123.4760  card: 0.0000  bytes: 387
              Table#: 4  cost: 125.4836  card: 0.0000  bytes: 440
              Table#: 7  cost: 343.2625  card: 0.0000  bytes: 470
              Table#: 9  cost: 345.2659  card: 0.0000  bytes: 530
              Table#:11  cost: 206981.5979  card: 0.0000  bytes: 564
              Table#:12  cost: 206982.6017  card: 0.0000  bytes: 598
              Table#:13  cost: 206984.6055  card: 0.0000  bytes: 632
***********************

As you can see, we get a list of the tables (identified by their position in the first join order examined) with details of accumulated cost. But just above this tabular display there’s a repeat of the cost that we end up with. So let’s write, and apply, a little awk script to find all the “Best so far” lines and then print the line two above. Here’s a suitable script, followed by a call to use it:

{
        if (index($0,"Best so far") != 0) {print NR m2}
        m2 = m1; m1 = $0;
}

awk -f cost.awk orcl_ora_25306.trc >temp.txt

There was a bit of a mess in the output – there are a couple of special cases (relating, in our trace file, to the inline views and the appearance of a “group by placement”) that cause irregular patterns to appear, but the script was effective for the critical 14 table join. And looking through the list of costs for the various permutations we find that almost all the options show a cost of about 206,000 – except for the last few in two of the five “permutation sets” that suddenly drop to costs of around 1,500 and 1,300. The very high starting cost explains why the optimizer was prepared to spend so much time trying to find a good path and why it kept working so hard until the cost dropped very sharply.

Side bar: I have an old note from OCIS (the precursor or the precursor of the precursor of MOS) that the optimizer will stop searching when the number of join orders tests * the number of “non-single-row” tables (according to the single table access path) * 0.3 is greater than the best cost so far.  I even have a test script (run against 8.1.7.4, dated September 2002) that seems to demonstrate the formula.  The formula may be terribly out of date by now and the rules of exactly how and when it applies may have changed – the model didn’t seem to work when I ran it against 19.3 – but the principle probably still holds true.

At this point we might decide that we ought to look at the initial join order and at the join order where the cost dropped dramatically, and try to work out why Oracle picked such a bad starting join order, and what it was about the better join order that the optimizer had missed. This might allow us to recognise some error in the statistics for either the “bad” starting order or the “good” starting order and allow us to solve the problem by (e.g.) creating a column group or gather some specific statistics. We might simply decide that we’ll take a good join order and pass it to the optimizer through a /*+ leading() */ hint, or simply take the entire outline and attach it to the query through a faked SQL Profile (or embedded set of hints).

However, for the purposes of this exercise (and because sometimes you have to find a strategic solution rather than a “single statement” solution) I’m going to carry on working through mechanisms for dissecting the trace file without looking too closely at any of the fine detail.

The final “high-level” target I picked was to pin down why there were 5 sets of join orders. I had noticed something particular about the execution plan supplied – it showed several occurrences of the operation “VIEW PUSHED PREDICATE” so I wondered if this might be relevant. So I did a quick check near the start of the main body of the trace file for anything that might be a clue, and found the following just after the “QUERY BLOCK SIGNATURE”.

QUERY BLOCK SIGNATURE
---------------------
  signature(): NULL
***********************************
Cost-Based Join Predicate Push-down
***********************************
JPPD: Checking validity of push-down in query block SEL$6E5D879B (#4)
JPPD:   Checking validity of push-down from query block SEL$6E5D879B (#4) to query block SEL$C20BB4FE (#6)
Check Basic Validity for Non-Union View for query block SEL$C20BB4FE (#6)
JPPD:     JPPD bypassed: View has non-standard group by.
JPPD:   No valid views found to push predicate into.
JPPD: Checking validity of push-down in query block SEL$799AD133 (#3)
JPPD:   Checking validity of push-down from query block SEL$799AD133 (#3) to query block SEL$EFE55ECA (#7)
Check Basic Validity for Non-Union View for query block SEL$EFE55ECA (#7)
JPPD:     JPPD bypassed: View has non-standard group by.
JPPD:   No valid views found to push predicate into.
JPPD: Checking validity of push-down in query block SEL$C2AA4F6A (#2)
JPPD:   Checking validity of push-down from query block SEL$C2AA4F6A (#2) to query block SEL$799AD133 (#3)
Check Basic Validity for Non-Union View for query block SEL$799AD133 (#3)
JPPD:     Passed validity checks
JPPD:   Checking validity of push-down from query block SEL$C2AA4F6A (#2) to query block SEL$6E5D879B (#4)
Check Basic Validity for Non-Union View for query block SEL$6E5D879B (#4)
JPPD:     Passed validity checks
JPPD:   Checking validity of push-down from query block SEL$C2AA4F6A (#2) to query block SEL$FC56C448 (#5)
Check Basic Validity for Non-Union View for query block SEL$FC56C448 (#5)
JPPD:     Passed validity checks
JPPD: JPPD:   Pushdown from query block SEL$C2AA4F6A (#2) passed validity checks.
Join-Predicate push-down on query block SEL$C2AA4F6A (#2)
JPPD: Using search type: linear
JPPD: Considering join predicate push-down
JPPD: Starting iteration 1, state space = (3,4,5) : (0,0,0)

As you can see we are doing cost-based join-predicate pushdown, and there are three targets which are valid for the operation. Notice the line that says “using search type: linear”, and the suggestive “starting iteration 1” – let’s look for more lines with “Starting iteration”

[jonathan@linux01 big_trace]$ grep -n "Starting iteration" orcl_ora_25306.trc
9934:GBP: Starting iteration 1, state space = (20,21) : (0,0)
11529:GBP: Starting iteration 2, state space = (20,21) : (0,C)
11562:GBP: Starting iteration 3, state space = (20,21) : (F,0)
12479:GBP: Starting iteration 4, state space = (20,21) : (F,C)
12517:GBP: Starting iteration 1, state space = (18,19) : (0,0)
13860:GBP: Starting iteration 2, state space = (18,19) : (0,C)
13893:GBP: Starting iteration 3, state space = (18,19) : (F,0)
14587:GBP: Starting iteration 4, state space = (18,19) : (F,C)
14628:GBP: Starting iteration 1, state space = (16,17) : (0,0)
15972:GBP: Starting iteration 2, state space = (16,17) : (0,C)
16005:GBP: Starting iteration 3, state space = (16,17) : (F,0)
16700:GBP: Starting iteration 4, state space = (16,17) : (F,C)
16877:JPPD: Starting iteration 1, state space = (3,4,5) : (0,0,0)
622904:JPPD: Starting iteration 2, state space = (3,4,5) : (1,0,0)
1759914:JPPD: Starting iteration 3, state space = (3,4,5) : (1,1,0)
2750592:JPPD: Starting iteration 4, state space = (3,4,5) : (1,1,1)

There are four iterations for state space (3,4,5) – and look at the huge gaps between their “Starting iteration” lines. In fact, let’s go a little closer and combine their starting lines with the lines above where I searched for “Number of join permutations tried:”


16877:JPPD: Starting iteration 1, state space = (3,4,5) : (0,0,0)
622799:Number of join permutations tried: 374

622904:JPPD: Starting iteration 2, state space = (3,4,5) : (1,0,0)
1759817:Number of join permutations tried: 673

1759914:JPPD: Starting iteration 3, state space = (3,4,5) : (1,1,0)
2750475:Number of join permutations tried: 674

2750592:JPPD: Starting iteration 4, state space = (3,4,5) : (1,1,1)
3586276:Number of join permutations tried: 571

4458608:Number of join permutations tried: 581

At this point my notes end and I don’t know where I was going with the investigation. I know that I suggested to the OP that the cost-based join predicate pushdown was having a huge impact on the optimization time and suggested he experiment with disabling the feature. (Parse time dropped dramatically, but query run-time went through the roof – so that proved a point, but wasn’t a useful strategy). I don’t know, however, what the fifth long series of permutations was for, so if I could find the trace file one of the things I’d do next would be to look at the detail a few lines before line 4,458,608 to see what triggered that part of the re-optimization. I’d also want to know whether the final execution plan came from the fifth series and could be reached without involving all the join predicate pushdown work, or whether it was a plan that was only going to appear after the optimizer had worked through all 4 iterations.

The final plan did involve all 3 pushed predicates (which looksl like it might have been from iteration 4), so it might have been possible to find a generic strategy for forcing unconditional predicate pushing without doing all the expensive intermediate work.

Version 12c and beyond

That was then, and this is now. And something completely different might have appeared in 12c (or 19c) – but the one thing that is particularly helpful is that you can bet that every iteration of the JPPD state spaces would have produced a “TIMER:” line in the trace file, making it very easy to run grep -n “TIMER:” (or -nT as I recently discovered) against the trace file to pinpoint the issue very quickly.

Here’s an example from my “killer_parse.sql” query after setting “_fix_control”=’16923858:4′ (1e4 microseconds = 1/100th second) in an instance of 19c:


$ grep -nT TIMER or19_ora_21051.trc

16426  :TIMER:      bitmap access paths cpu: 0.104006 sec elapsed: 0.105076 sec
252758 :TIMER:     costing general plans cpu: 0.040666 sec elapsed: 0.040471 sec
309460 :TIMER:      bitmap access paths cpu: 0.079509 sec elapsed: 0.079074 sec
312584 :TIMER: CBQT OR expansion SEL$765CDFAA cpu: 10.474142 sec elapsed: 10.508788 sec
313974 :TIMER: Complex View Merging SEL$765CDFAA cpu: 1.475173 sec elapsed: 1.475418 sec
315716 :TIMER: Table Expansion SEL$765CDFAA cpu: 0.046262 sec elapsed: 0.046647 sec
316036 :TIMER: Star Transformation SEL$765CDFAA cpu: 0.029077 sec elapsed: 0.026912 sec
318207 :TIMER: Type Checking after CBQT SEL$765CDFAA cpu: 0.220506 sec elapsed: 0.219273 sec
318208 :TIMER: Cost-Based Transformations (Overall) SEL$765CDFAA cpu: 13.632516 sec elapsed: 13.666360 sec
328948 :TIMER:      bitmap access paths cpu: 0.093973 sec elapsed: 0.095008 sec
632935 :TIMER: Access Path Analysis (Final) SEL$765CDFAA cpu: 7.703016 sec elapsed: 7.755957 sec
633092 :TIMER: SQL Optimization (Overall) SEL$765CDFAA cpu: 21.539010 sec elapsed: 21.632012 sec

The closing 21.63 seconds (line 633092) is largely 7.7559 seconds (632,935) plus 13.666 seconds (line 318208) Cost-Based Transformation time, and that 13.666 seconds is mostly the 1.475 seconds (line 313,974) plus 10.508 seconds (line 312,584) for CBQT OR expansion – so let’s try disabling OR expansion (alter session set “_no_or_expansion”=true;) and try again:


$ grep -nT TIMER or19_ora_22205.trc
14884  :TIMER:      bitmap access paths cpu: 0.062453 sec elapsed: 0.064501 sec
15228  :TIMER: Access Path Analysis (Final) SEL$1 cpu: 0.256751 sec elapsed: 0.262467 sec
15234  :TIMER: SQL Optimization (Overall) SEL$1 cpu: 0.264099 sec elapsed: 0.268183 sec

Not only was optimisation faster, the runtime was quicker too.

Warning – it’s not always that easy.

 

« Previous PageNext Page »

Powered by WordPress.com.