Oracle Scratchpad

October 2, 2019

_cursor_obsolete_threshold

Filed under: 12c,Infrastructure,Oracle,Performance — Jonathan Lewis @ 2:39 pm BST Oct 2,2019

At the recent Trivadis Performance Days in Zurich, Chris Antognini answered a question that had been bugging me for some time. Why would Oracle want to set the default value of _cursor_obsolete_threshold to a value like 8192 in 12.2 ?

In 11.2.0.3 the parameter was introduced with the default value 100; then in 11.2.0.4, continuing into 12.1, the default value increased to 1,024 – what possible reason could anyone have for thinking that 8192 was a good idea ?

The answer is PDBs – specifically the much larger number of PDBs a single CDB can (theoretically) support in 12.2.

In fact a few comments, and the following specific explanation, are available on MOS in Doc ID 2431353.1 “High Version Counts For SQL Statements (>1024) Post Upgrade To 12.2 and Above Causing Database Slow Performance”:

The default value of _cursor_obsolete_threshold is increased heavily (8192 from 1024) from 12.2 onwards in order to support 4096 PDBs which was only 252 PDBs till 12.1. This parameter value is the maximum limit for obsoleting the parent cursors in an multitenant environment and cannot be increased beyond 8192.

Having said, this is NOT applicable for non-CDB environment and hence for those databases, this parameter should be set to 12.1 default value manually i.e. 1024. The default value of 1024 holds good for non-CDB environment and the same parameter can be adjusted case-to-case basis should there be a problem.

It’s all about PDBs – more precisely, it’s all about CDBs running a huge number of PDBs, which is not necessarily the way that many companies are likely to use PDBs. So if you’re a fairly typical company running a handful of PDBs in a single CDB then it’s probably a good idea to set the parameter down to the 12.1 value of 1024 (and for bad applications I’d consider going even lower) – and this MOS note actually makes that an official recommendation.
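For example (a sketch only: changing an underscore parameter is something you would normally clear with Oracle Support first, and the new value takes effect only after an instance restart):

alter system set "_cursor_obsolete_threshold" = 1024 scope = spfile;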

Impact analysis

What’s the worst that could happen if you actually have many PDBs all executing the same application and that application has a few very popular and frequently executed statements? Chris Antognini described a model he’d constructed and some tests he’d done to show the effects. The following code is a variation on his work. It addresses the following question:

If you have an application that repeatedly issues parse calls (explicitly or implicitly) but doesn’t take advantage of the session cursor cache, each parse call has to search the library cache by hash_value / sql_id for the parent cursor, then has to walk the chain of child cursors looking for the right child. What’s the difference in the work done if this “soft parse” has to walk the list to child number 8,191 instead of finding the right cursor at child number 0?

Here’s the complete code for the test:


create table t1
as
select 1 id from dual
/

alter table t1 add constraint t1_pk primary key (id)
/

spool cursor_obsolete.lst

alter system flush shared_pool;
alter system flush shared_pool;

set serveroutput off
select /*+ index(t1) */ id from t1 where id > 0;
select * from table(dbms_xplan.display_cursor);

execute snap_my_stats.start_snap
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+8192 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

column sql_text format a60
select sql_id, child_number, loaded_versions, executions, sql_text from v$sql where sql_text like 'SELECT%T1%' order by child_number;

prompt  ===============
prompt  Low child reuse
prompt  ===============

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 100+1..100+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

prompt  ================
prompt  High child reuse
prompt  ================

set serveroutput off
execute snap_my_stats.start_snap

declare
        m_id number;
begin
        for i in 7168+1..7168+1024 loop
                execute immediate 'alter session set optimizer_index_cost_adj = ' || i ;
                select /*+ index(t1) */ id into m_id from t1 where id > 0;
        end loop;
end;
/

set serveroutput on
execute snap_my_stats.end_snap

spool off

I’ve created a table with just one row and given it a primary key. My testing query is going to be very short and simple: a query hinted to return that one row by primary key index range scan.

I’ve flushed the shared pool (twice) to minimise interference from pre-existing information, then executed the statement to populate the dictionary cache and some library cache information and to check the execution plan.

The call to the package snap_my_stats is my standard method for reporting changes in v$mystat across the test. I’ve called the start_snap procedure twice in a row to make sure that its first load doesn’t add some noise to the statistics that we’re trying to capture.
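If you don’t have a copy of the package, a minimal substitute along the following lines (using a scratch table that I’ve called my_stats_snap, and assuming you can query the dynamic performance views directly) captures the same idea:

create table my_stats_snap as
select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
;

rem     run the test, then report the deltas

select
        sn.name, ms.value - ss.value    delta
from
        v$mystat        ms,
        v$statname      sn,
        my_stats_snap   ss
where
        sn.statistic# = ms.statistic#
and     ss.name = sn.name
and     ms.value - ss.value != 0
order by
        sn.name
;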

The test runs in three parts.

  • First I loop 8192 times executing the same statement, but with a different value for the optimizer_index_cost_adj for each execution – this gives me the limit of 8192 child cursors, each reporting “Optimizer Mismatch” as the reason for not sharing. I’ve run a query against v$sql after this to check that I have 8192 child cursors – you’ll need to make sure your shared pool is a few hundred megabytes if you want to be sure of keeping them all in memory.
  • The second part of the test simply repeats the loop, but only for the first 1,024 child cursors. At this point the child cursors exist, so the optimizer should be doing “soft” parses rather than hard parses.
  • The final part of the test repeats the loop again, but only for the last 1,024 child cursors. Again they should exist and be usable, so the optimizer should again be doing “soft” parses rather than hard parses.

What I’m looking for is the extra work it takes for Oracle to find the right child cursor when there’s a very long chain of child cursors. From my memory of dumping the library cache in older versions of Oracle, the parent will point to a “segmented array” of pointers to child cursors, and each segment of the array will consist of 16 pointers, plus a pointer to the next segment. So if you have to find child cursor 8,191 you will have to follow 512 segment pointers and check 16 child pointers per segment (totalling 8,704 pointers) before you find the child you want – and you’re probably holding a mutex (or latch) while doing so.

One peripheral question to ask, of course, is whether Oracle keeps appending to the segmented array, or whether it uses a “pushdown” approach when allocating a new segment so that newer child cursors are near the start of the array (i.e. will searching for child cursor 0 be the cheapest search or the most expensive one).

And the results, limited to just the second and third parts, with just a couple of small edits are as follows:


host sdiff -w 120 -s temp1.txt temp2.txt >temp.txt

===============                                            |    ================
Low child reuse                                            |    High child reuse
===============                                            |    ================

Interval:-  0 seconds                                      |    Interval:-  6 seconds

opened cursors cumulative                      2,084       |    opened cursors cumulative                      2,054
recursive calls                                6,263       |    recursive calls                                6,151
recursive cpu usage                               33       |    recursive cpu usage                              570
session logical reads                          1,069       |    session logical reads                          1,027
CPU used when call started                        33       |    CPU used when call started                       579
CPU used by this session                          37       |    CPU used by this session                         579
DB time                                           34       |    DB time                                          580
non-idle wait count                               16       |    non-idle wait count                                5
process last non-idle time                         1       |    process last non-idle time                         6
session pga memory                           524,288       |    session pga memory                            65,536
enqueue requests                                  10       |    enqueue requests                                   3
enqueue releases                                  10       |    enqueue releases                                   3
consistent gets                                1,069       |    consistent gets                                1,027
consistent gets from cache                     1,069       |    consistent gets from cache                     1,027
consistent gets pin                            1,039       |    consistent gets pin                            1,024
consistent gets pin (fastpath)                 1,039       |    consistent gets pin (fastpath)                 1,024
consistent gets examination                       30       |    consistent gets examination                        3
consistent gets examination (fastpath)            30       |    consistent gets examination (fastpath)             3
logical read bytes from cache              8,757,248       |    logical read bytes from cache              8,413,184
calls to kcmgcs                                    5       |    calls to kcmgcs                                    3
calls to get snapshot scn: kcmgss              1,056       |    calls to get snapshot scn: kcmgss              1,026
table fetch by rowid                              13       |    table fetch by rowid                               1
rows fetched via callback                          6       |    rows fetched via callback                          1
index fetch by key                                 9       |    index fetch by key                                 1
index scans kdiixs1                            1,032       |    index scans kdiixs1                            1,024
session cursor cache hits                         14       |    session cursor cache hits                          0
cursor authentications                         1,030       |    cursor authentications                         1,025
buffer is not pinned count                     1,066       |    buffer is not pinned count                     1,026
parse time cpu                                    23       |    parse time cpu                                   558
parse time elapsed                                29       |    parse time elapsed                               556
parse count (total)                            2,076       |    parse count (total)                            2,052
parse count (hard)                                11       |    parse count (hard)                                 3
execute count                                  1,050       |    execute count                                  1,028
bytes received via SQL*Net from client         1,484       |    bytes received via SQL*Net from client         1,486

Two important points to note:

  • the CPU utilisation goes up from 0.33 seconds to 5.7 seconds.
  • the number of hard parses is zero: this is all about searching for the right child cursor, not creating new ones.

You might question the 2,048-ish parse count (total) – don’t forget that we do an “execute immediate” to change the optimizer_index_cost_adj on each pass through the loop. That’s probably why the parse count doubles, although the “alter session” doesn’t then report as an “execute count”.

The third call to a statement is often an important one – it’s often the first one that doesn’t need “cursor authentication”, so I ran a similar test executing the last two loops a second time – there was no significant change in the CPU or parse activity between the 2nd and 3rd executions of each cursor. For completeness I also ran a test in which the loop for the last 1,024 child cursors ran before the loop for the first 1,024. Again this made no significant difference to the results – the low-numbered child cursors take less CPU to find than the high-numbered child cursors.

Bottom line

The longer the chain of child cursors the more time (elapsed and CPU) you spend searching for the correct child; and when a parent is allowed 8,192 child cursors the extra time can become significant. I would claim that the roughly 5 seconds difference in CPU time appearing in this test corresponds purely to an extra 5 milliseconds per execution spent walking an extra 7,000 steps down the chain.

If you have a well-behaved application that uses the session cursor cache effectively, or uses “held cursors”, then you may not be worried by very long chains of child cursors. But I have seen many applications where cursor caching is not used and every statement execution from the client turns into a parse call (usually implicit) followed by a hunt through the library cache and walk along the child chain. These applications will not scale well if they are cloned to multiple PDBs sharing the same CDB.
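If you want a quick indication of whether an application is using the session cursor cache effectively you can compare a session’s parse statistics with its cache statistics – a rough check, assuming the usual access to the dynamic performance views:

select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     sn.name in (
                'parse count (total)',
                'session cursor cache count',
                'session cursor cache hits'
        )
;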

Footnote 1

The odd thing about this “cursor obsolete” feature is that I have a distinct memory that when PDBs were introduced at an ACE Director’s meeting a few years ago the first thought that crossed my mind was about the potential for someone running multiple copies of the same application as separate PDBs seeing a lot of library cache latch contention or cursor mutex contention because any popular statement would now be hitting the same parent cursor from multiple PDBs. I think the casual (i.e. neither formal, nor official) response I got when I raised the point was that the calculation of the sql_id in future releases would take the con_id into consideration. It seems that that idea fell by the wayside.

Footnote 2

If you do see a large number of child cursors for a single parent then you will probably end up looking at v$sql_shared_cursor for the sql_id to see if that gives you some good ideas about why a particular statement has generated so many child cursors. For a list of explanations of the different reasons captured in this view MOS Doc ID 296377.1 “Troubleshooting: High Version Count Issues” is a useful reference.
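As a starting point, a couple of simple queries will identify the worst offenders and then list the sharing-failure flags for one of them (the threshold of 100 and the substitution variable are just for illustration):

select
        sql_id, count(*)        child_cursors
from
        v$sql
group by
        sql_id
having
        count(*) > 100
order by
        child_cursors desc
;

select  *
from    v$sql_shared_cursor
where   sql_id = '&m_sql_id'
;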

11 Comments

  1. Jonathan,

    Thanks for this article, which answers the same question I had about the reason for increasing the cursor obsolete parameter from 100 in 11gR2 to 1024 in 12cR1. But I have been working for several customers and I have met this situation at only one client site, where a very frequently executed query having 9 predicates and subject to ECS (Extended Cursor Sharing) reached 100 cursors (0-99), as shown in the corresponding blog post:

    https://hourim.wordpress.com/2015/04/06/bind_equiv_failure-or-when-you-will-regret-using-adaptive-cursor-sharing/

    But I have rarely seen a similar case. So I am wondering whether even the old value of 100 was already too high to reach in real-life cases.

    Best regards
    Mohamed

    Comment by hourim — October 2, 2019 @ 4:16 pm BST Oct 2,2019

    • Mohamed,

      I’ve come across several cases of statements running up to several hundred, and occasionally a couple of thousand, child cursors. Sometimes the underlying cause is RLS/VPD/FGAC, sometimes it’s multiple schemas with the same structure. It is fairly rare, though.

      Looking at the linked blog note you’ve got the version of Oracle where the default value of 100 may have caused the problem.

      Your 44,000 child cursors and 16M v$sql_cs_selectivity rows may have appeared because you were marking a parent as obsolete every 100 child cursors but failing to hide them properly – your report of v$sql_shared_cursor shows child number 40 several times with different addresses. It’s possible, for example, that you could have survived on 250 child cursors and the introduction of the 100 limit kept you obsoleting parents because there was always a 101st bind combination appearing.

      There were also some problems with obsolete child cursors not being cleared from the library cache in the earlier implementations. Some valid reasons, some odd behaviour that looked like bugs. It’s possible, as an example of a valid reason, for a front end connection to keep a child cursor open even when the parent has been made obsolete – so a leaking front end could stop Oracle from discarding obsolete child cursors.

      Comment by Jonathan Lewis — October 2, 2019 @ 4:52 pm BST Oct 2,2019

  2. Every step of obtaining a child cursor requires the respective mutexes to be gotten to safeguard consistency.
    Once the processing is at the phase of inspecting the child cursor list in the parent’s heap 0, it will obtain a mutex in shared mode (to safeguard consistency), until it has found a child cursor that is consistent.

    Don’t forget that for finding a compatible child cursor it doesn’t just follow the pointer towards the child cursor, it also needs to validate if the child cursor is consistent. Once it deems the child cursor not to be consistent, it will obtain a pointer to the next child cursor from the list, and validate, until it found a compatible child cursor, or has run out of child cursors, which then means it needs to create a child cursor, which is what is considered hard parsing.

    The heap with the execution information (heap 6) for every child cursor can grow quite big (mostly bigger than the other allocations), and therefore is only kept pinned when it is in active use, even when the cursor is in the session cursor cache. This means that it could be subject to removal/cleanup if there is space pressure in the shared pool. This would be visible by spontaneous hard parsing again. It can be kept pinned using dbms_shared_pool.keep.
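    For example (a sketch, picking up the address and hash_value from v$sqlarea; not something to do casually on a production system):

    select  address, hash_value
    from    v$sqlarea
    where   sql_id = '&m_sql_id'
    ;

    execute dbms_shared_pool.keep('&m_address,&m_hash_value', 'C')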

    Once a cursor is in the session cursor cache, a pointer to the child handle is stored in the session’s execution context, and thus doesn’t have to do all the work to find a valid child. A SQL cursor requires 4 or 3 executions to be considered for the session cursor cache in SQL mode, and one execution in PL/SQL mode to be considered. Both show as separate cached entries, although they can point to the same child cursor.
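    Both types of cached entry are visible in v$open_cursor (assuming a version recent enough to have the cursor_type column):

    select
            cursor_type, count(*)
    from
            v$open_cursor
    where
            sid = sys_context('userenv','sid')
    group by
            cursor_type
    ;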

    Comment by Frits Hoogland — October 2, 2019 @ 5:25 pm BST Oct 2,2019

    • Frits,

      Thanks for the comments. It’s nice to see such a compact description of what has to happen.

      Just one tiny point to clarify – I believe you meant by “execution in PL/SQL” any embedded SQL, not SQL from “execute immediate”, or passed through dbms_sql.

      Is the “4 or 3 execution” a typo for “2 or 3 executions” ? I think I’ve always seen 3 which I’d label “load, authenticate, cache” – which reduces to 2 (authenticate, cache) for a new session if a previous session has already loaded a matching child.

      Your comment may tempt someone to investigate the impact of
      a) more objects in the query.
      b) more bind variables in the query (for bind type/size checking)
      c) other

      I suspect (b) won’t make any difference because Oracle will be checking a hash value of the array of bind types. I’m going to guess that (a) will be handled the same way (hash value of an array of object_ids). But there are lots of reasons for mismatches in v$sql_shared_cursor – so however well the process is optimised and however the sequence of tests has been arranged to improve the chance of failing early there’s bound to be some feature of some SQL that means a lot of steps have to be taken for each child before it fails to match.

      Comment by Jonathan Lewis — October 3, 2019 @ 8:31 am BST Oct 3,2019

      • – I am saying ‘finding a consistent child cursor’, that is not well worded, it should be: finding a child cursor that is compatible with the execution in the session.
        – execution in PL/SQL: yes, I wasn’t verbose enough, this is executing embedded or regular SQL in PL/SQL, not SQL execution via execute immediate.
        – 4 or 3 executions: yes, I was wrong here, indeed, it’s load, authenticate, cache or authenticate, cache if the child cursor already is loaded (it’s in the session cursor cache if you want to execute it a fourth time, which is how my thought process worked)
        – a) more objects in the query: for every child cursor creation, every dependent object for the cursor requires dictionary cache access (I think) which might require recursive execution, and access to the library cache representation of the dependent object (instance, schema, synonym, table, index, etc.) in which the child cursor inserts itself as a dependency for the object. This requires a mutex get (dependency mutex) of the library cache object. It doesn’t change the child cursor caching as far as I have seen; once the child cursor is created, it will be considered during trying to find a suitable, compatible child cursor, which has no direct relation to the number of dependencies of the child cursor. Of course with more dependencies, there is a higher chance of mismatches.
        – b) more bind variables: when bind variables are introduced, part of the child creation/hard parsing is moved to execution time. The parts that are moved are semantic checking and optimisation, the first part, syntax checking, is still done on parse time. Bind variables are one of the things that are considered during finding a compatible child cursor.

        I am not sure what you point at with the last paragraph. I read it as saying that a session that wants to use an existing child cursor needs to test object dependencies (“mismatches in v$sql_shared_cursor”). I do not believe this is how that works. Because every library cache object has a list of objects that are dependent on it, any change or invalidation of a library cache object will be propagated to the dependent objects by the invalidator invalidating dependent objects, so users of an existing child cursor do not have to do that. So all a session has to do to use an existing child cursor is to see if it’s compatible and not invalidated; it does not need to check anything for the cursor itself (like object dependencies and whether the dependencies are still valid, or syntax checking, semantic checking, optimisation).

        Comment by Frits Hoogland — October 3, 2019 @ 9:25 am BST Oct 3,2019

        • Frits,

          I think I read the statement the way you meant it. It wasn’t until you pointed out the use of the word “consistent” that it occurred to me that it was open to misinterpretation.

          Agree with your comments in (a) and (b) about the way bits and pieces are accessed and hang together in the child cursor. The point I was trying to make was that Oracle probably creates a number of checksum values as it’s doing that work and stores the results somewhere in the top level memory of the child cursor so that it can minimise the work needed when checking future matching parse calls.

          E.g. If an existing child cursor has (say) 25 object dependencies I would NOT expect a new parse call for the statement to say, as it walked the list of child cursors, for each child cursor in turn “I will check each of your 25 object dependencies separately and see if they match my 25 object dependencies”; instead I would expect a checksum to have been calculated for both the existing and the new and the code to say “If your stored checksum DOESN’T match my newly calculated checksum I move on to the next child, if your checksum does match my checksum I will check specific dependencies in detail.” (Though it seems likely that there’s a short list of checksums for different aspects of the cursor and Oracle ensures all the checksums match before checking any details; and it’s also likely that there are some things that can be checked very cheaply anyway and don’t need a checksum mechanism to reduce the workload)

          UPDATE – I’ve just realised there’s an example of what I mean built into v$sql. v$sql_optimizer_env reports a fairly large list of values for an SQL_ID and child number; but that’s an expansion of the very dense column v$sql.optimizer_env which is declared as raw(2000) – but there’s also a column v$sql.optimizer_env_hash_value which is reported as type number. So any check for a compatible child cursor would probably check the optimizer_env_hash_value and only check optimizer_env if the hash value resulted in a match.
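          For example (the substitution variable is just for illustration):

          select
                  child_number, optimizer_env_hash_value
          from
                  v$sql
          where
                  sql_id = '&m_sql_id'
          order by
                  child_number
          ;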

          Comment by Jonathan Lewis — October 3, 2019 @ 11:37 am BST Oct 3,2019

        • “E.g. If an existing child cursor has (say) 25 object dependencies I would NOT expect a new parse call for the statement to say, as it walked the list of child cursors, for each child cursor in turn “I will check each of your 25 object …”
          I am not sure we are aligned here.

          For a SQL cursor, the SQL text is hashed, and the hashed value leads to the bucket where the parent handle of namespace SQL AREA, type CURSOR is (going to be) located. Always. Even in the case of Markhot (in which case the parent handle is marked with the flag ‘HOC’, which then leads to a rehash to find the alternative parent handle).

          The first thing that happens once the parent handle is found, is the child list in heap 0 is used, and for each child, the compatibility for the current execution is checked, until a child is found that is compatible. If a compatible child is found, no validation of dependencies is necessary, because an intact version of a child cursor is found, and thus that work is not needed.

          If no compatible child is found, a hard parse follows to create an additional child. I have seen no indication that any specific optimisations take place here, but I wouldn’t rule it out. Based on what I have seen, it will just do a regular hard parse, meaning it will syntax check and semantically check the cursor, then add itself as a dependency to all objects it is dependent on, and generate an optimised version of the execution plan.

          Comment by Frits Hoogland — October 3, 2019 @ 12:27 pm BST Oct 3,2019

        • Frits,

          I think we’re focusing on different things here.

          I’m talking about the things Oracle might have to do 8,192 times as it walks down a chain of child cursors, and the ways in which it might minimise the work done for each of those 8192 steps.

          Comment by Jonathan Lewis — October 3, 2019 @ 12:36 pm BST Oct 3,2019

  3. I first saw this issue on a 12.2.0.1 upgrade, but it was when “cursor: mutex X” wait events became the top wait event for an ETL or report batch run. The solution in the note was the same, drop the parameter down to 1024.

    Oracle Support Document 2298504.1 (Cursor Mutex X Wait Events: After Upgrading To 12.2) can be found at:
    https://support.oracle.com/epmos/faces/DocumentDisplay?id=2298504.1

    I was surprised at how “exponential” it was — on 12.1 and after dropping the parameter down, that wait event was nowhere near 1/8th of what it was — it almost disappeared completely.

    Comment by Bob Bryla — October 2, 2019 @ 9:19 pm BST Oct 2,2019

    • Bob,

      Thanks for the comment.

      Frits’ comment above helps to explain the “exponential” growth of the problem.

      If there are circumstances that result in a large number of child cursors then that probably means there are a lot of sessions wanting to find a particular child – and occasionally creating a new one.

      To create a new child cursor you have to acquire the mutex protecting the child chain in exclusive mode (I think it’s the mutex labelled “Mutex” in the library cache dump). But lots of other users may be holding it in shared mode; and the more interested users you have the longer the chain is likely to grow over time, so (a) you’re more likely to have to wait for the mutex, and (b) the longer it will be held – in share mode – by another session. So it’s an n-squared type of problem. (Okay, so it’s probably “geometric” rather than “exponential” – but both words mean “slow response time” in the vocabulary that matters to the end users.)

      Apart from reducing the _cursor_obsolete_threshold another option (for systems with larger numbers of CPUs) may be to mark individual statements as ‘HOT’ to spread the interest across several library cache hash buckets by introducing another factor into the calculation of the SQL_ID.
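      A sketch of that mechanism (the full hash comes from v$db_object_cache, you need the hidden parameter _kgl_hot_object_copies set to a suitable value before this has any effect, and the substitution variables are just for illustration):

      select  full_hash_value
      from    v$db_object_cache
      where   hash_value = &m_hash_value
      and     namespace = 'SQL AREA'
      ;

      execute dbms_shared_pool.markhot(hash => '&m_full_hash_value', namespace => 0)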

      Comment by Jonathan Lewis — October 3, 2019 @ 8:43 am BST Oct 3,2019

      • Thanks, I learn something new every day. In fact, several new things. We don’t have that unusual of a workload for ETL, and maybe a few PDBs might be active at a time with some overlapping DML that would parse to the same SQL_ID — I would think that this would be one of the basic scenarios Oracle Dev would use in regression testing.

        Comment by Bob Bryla — October 4, 2019 @ 2:37 am BST Oct 4,2019

