Oracle Scratchpad

October 7, 2019

Resumable

Filed under: Infrastructure,Oracle,Tuning — Jonathan Lewis @ 10:31 am BST Oct 7,2019

There are two questions about temporary space that appear fairly regularly on the various Oracle forums. One is of the form:

From time to time my temporary tablespace grows enormously (and has to be shrunk), how do I find what’s making this happen?

The other follows the more basic pattern:

My process sometimes crashes with Oracle error “ORA-01652: unable to extend temp segment by %n in tablespace %s” – how do I stop this happening?

Before moving on to the topic of the blog, it’s worth pointing out two things about the second question:

  • First, it’s too easy to get stuck at the word temp and leap to the conclusion that the problem is about the temporary tablespace without noticing that the error message includes the specific tablespace that’s raised the problem. If, for example, you rebuild an index in a nominated tablespace Oracle first creates the index as a temporary segment (with a name like {starting_file_number}.{starting_block_number}) in that tablespace, then renames it to match the original index name once the rebuild is complete and drops the old index.
  • Secondly, a process that raises ORA-01652 isn’t necessarily the guilty party – it may be the victim of some other process hogging all the available space when it shouldn’t. Moreover, that other process may have completed and released its space by the time you start looking for the problem – causing extra confusion because your process seems to have crashed without a cause. Taking my example of an index rebuild: your index rebuild may fail because someone else was rebuilding a different index at the same time in the same tablespace; but when you check the tablespace all the space from their original index is now free, as their rebuild completed in the interim.

So, before you start chasing something that you think is a problem with your code, pause a moment to double-check the error message and think about whether you could have been the victim of some concurrent, but now complete, activity.

I’ve listed the two questions as variants on the same theme because the workaround to one of them introduces the risk of the other – if you want to avoid ORA-01652 you could make all your data files and temp files “autoextensible”, but then there may be occasions when they extend far too much and you need to shrink them down again (and that’s not necessarily easy if it’s not the temporary tablespace). Conversely, if you think your data or temp files randomly explode to ludicrous sizes you could decide on a maximum size for your files and disable autoextension – then handle the complaints when a user reports an ORA-01652.

There are various ways you could monitor your system in near real time to spot the threat as it builds, of course; and there are various ways to identify potentially guilty SQL after the event. You could keep an eye on various v$ dynamic performance views or dba_ administrative views to try to intercept a problem; you could set event 1652 to dump an errorstack (or even systemstate) for post-crash analysis to see what was reported. Neither is an ideal solution – one requires you to pay excessive attention to the system, the other is designed to let the problem happen and then leave you to clean up afterwards. There is, however, a strategy that may stop the problem from appearing without requiring constant monitoring. The strategy is to enable (selectively) resumable operations.
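
If you wanted to try the event-based approach, here’s a sketch of the sort of command involved (check the syntax on a test system before relying on it):

alter session set events '1652 trace name errorstack level 3';
alter system set events '1652 trace name errorstack level 3';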

If a resumable operation needs to allocate space but is unable to do so – i.e. it would normally be about to raise ORA-01652 – it will suspend itself for a while, going into the wait state “statement suspended, wait error to be cleared”, which will show up as the event in v$session_wait, timing out every 2 seconds. The session will also report its current action in the view v$resumable or, for slightly more information, dba_resumable. As it suspends, the session will write a message to the alert log, and you can also create an “after suspend” database trigger to alert you that a problem has occurred.

If you set the resumable timeout to a suitable value then you may find:

  • the problem goes away of its own accord and the session resumes before the timeout is reached

or

  • you receive a warning and have some time to identify the source of the problem and take the minimum action needed to allow the session to resume

Implementation

The parameter resumable_timeout is the system-wide control for resumable sessions if you don’t handle the feature at a more granular level.

By default this parameter is set to zero, which translates into a default timeout of 7,200 seconds – but that default doesn’t come into effect unless a session declares itself resumable. If you set the parameter to a non-zero value all sessions will automatically be operating as resumable sessions – and you’ll soon hear why you don’t want to do that.
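
If you want to experiment without bouncing the instance, I believe the parameter is dynamically modifiable, so a sketch of the command would be:

alter system set resumable_timeout = 10;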

The second enabling feature for resumable sessions is the resumable privilege – a session can’t control its own resumability unless the schema has been granted the resumable privilege, which may be granted through a role. If a session has the privilege it may set its own resumable timeout, even if the system value is zero.
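
A minimal sketch of the enabling step (assuming the test_user schema that appears in the tests below):

grant resumable to test_user;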

Assume we have set resumable_timeout to 10 (seconds) through the instance parameter file and restarted the instance. If we now issue (for example) the following ‘create table’ statement:


create table t1 (n1, v1 ) 
pctfree 90 pctused 10
tablespace tiny
as
select 
        rownum, cast(lpad('x',800) as varchar2(1000))
from    all_objects
where   rownum <= 20000
/

This will attempt to allocate 1 row per block for 20,000 blocks (plus about 1.5% for bitmap space management blocks) – and tablespace tiny lives up (or down) to its name, consisting of a single file of only 10,000 Oracle blocks. Shortly after starting, the session will hit Oracle error “ORA-01652: unable to extend temp segment by 128 in tablespace TINY”, but it won’t report it; instead it will suspend itself for 10 seconds before failing and reporting the error. This will happen whether or not the session has the resumable privilege – in this case the behaviour is dictated by our setting the system parameter. If you look in the alert log after the session finally errors out you will find text like the following:

2019-10-04T14:01:11.847943+01:00
ORCL(3):ORA-1652: unable to extend temp segment by 128 in tablespace TINY [ORCL] 
ORCL(3):statement in resumable session 'User TEST_USER(138), Session 373, Instance 1' was suspended due to
ORCL(3):    ORA-01652: unable to extend temp segment by 128 in tablespace TINY
2019-10-04T14:01:23.957586+01:00
ORCL(3):statement in resumable session 'User TEST_USER(138), Session 373, Instance 1' was timed out

Note that there’s a 10 (plus a couple) second gap between the point where the session reports that it is suspending itself and the point where it fails with a timeout. The two extra seconds appear because the session polls every 2 seconds to see whether the problem is still present or whether it has spontaneously disappeared, so allowing the session to resume.

Let’s change the game slightly; let’s try to create the table again, but this time execute the following statement first:

alter session enable resumable timeout 60 name 'Help I''m stuck';

The initial response to this will be Oracle error “ORA-01031: insufficient privileges” because the session doesn’t have the resumable privilege, but after granting resumable to the user (or a relevant role) we try again and find we will be allowed a little extra time before the CTAS times out. Our session now overrides the system timeout and will wait 60 seconds (plus a bit) before failing. The “timeout” clause is optional; if we omit it the session will use the system value. Similarly the “name” clause is optional; there’s no default for it, it’s just a message that will get into various views and reports.

There are several things you might check in this 60 second grace period. The session wait history will confirm that your session has been timing out every two seconds (as will the active session history if you’re licensed to use it):


select seq#, event, wait_time from v$session_wait_history where sid = 373

      SEQ# EVENT                                               WAIT_TIME
---------- -------------------------------------------------- ----------
         1 statement suspended, wait error to be cleared             204
         2 statement suspended, wait error to be cleared             201
         3 statement suspended, wait error to be cleared             201
         4 statement suspended, wait error to be cleared             201
         5 statement suspended, wait error to be cleared             200
         6 statement suspended, wait error to be cleared             200
         7 statement suspended, wait error to be cleared             202
         8 statement suspended, wait error to be cleared             200
         9 statement suspended, wait error to be cleared             200
        10 statement suspended, wait error to be cleared             200

Then there’s a special dynamic performance view, v$resumable, which I’ve reported below using a print_table() procedure that Tom Kyte wrote many, many years ago to report rows in a column format:

SQL> set serveroutput on
SQL> execute print_table('select * from v$resumable where sid = 373')

ADDR                          : 0000000074515B10
SID                           : 373
ENABLED                       : YES
STATUS                        : SUSPENDED
TIMEOUT                       : 60
SUSPEND_TIME                  : 10/04/19 14:26:20
RESUME_TIME                   :
NAME                          : Help I'm stuck
ERROR_NUMBER                  : 1652
ERROR_PARAMETER1              : 128
ERROR_PARAMETER2              : TINY
ERROR_PARAMETER3              :
ERROR_PARAMETER4              :
ERROR_PARAMETER5              :
ERROR_MSG                     : ORA-01652: unable to extend temp segment by 128 in tablespace TINY
CON_ID                        : 0
-----------------
1 rows selected

Notice how the name column reports the name I supplied when I enabled the resumable session. The view also tells us when the critical statement was suspended and how long it is prepared to wait (in total) – leaving us to work out from the current time how much time we have left to work around the problem.
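
Since suspend_time is reported as a character string we have to do a little arithmetic to get the number of seconds left. Here’s a sketch, assuming the date format shown above:

select
        name,
        timeout - round((sysdate - to_date(suspend_time,'MM/DD/YY HH24:MI:SS')) * 86400) seconds_left
from    v$resumable
where   status = 'SUSPENDED'
;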

There’s also a dba_resumable variant of the view which is slightly more informative (though the sample below is not consistent with the one above because I ran the CTAS several times, editing the blog as I did so):

SQL> execute print_table('select * from dba_resumable where session_id = 373')

USER_ID                       : 138
SESSION_ID                    : 373
INSTANCE_ID                   : 1
COORD_INSTANCE_ID             :
COORD_SESSION_ID              :
STATUS                        : SUSPENDED
TIMEOUT                       : 60
START_TIME                    : 10/04/19 14:21:14
SUSPEND_TIME                  : 10/04/19 14:21:16
RESUME_TIME                   :
NAME                          : Help I'm stuck
SQL_TEXT                      : create table t1 (n1, v1 ) pctfree 90 pctused 10 tablespace tiny as  select rownum, 
                                cast(lpad('x',800) as varchar2(1000)) from all_objects where rownum <= 20000
ERROR_NUMBER                  : 1652
ERROR_PARAMETER1              : 128
ERROR_PARAMETER2              : TINY
ERROR_PARAMETER3              :
ERROR_PARAMETER4              :
ERROR_PARAMETER5              :
ERROR_MSG                     : ORA-01652: unable to extend temp segment by 128 in tablespace TINY
-----------------
1 rows selected

This view includes the text of the statement that has been suspended and shows us when it started running (so that we can decide whether we really want to rescue it, or might be happy to kill it to allow some other suspended session to resume).

If you look at the alert log in this case you’ll see that the name has been reported there instead of the user, session and instance – which means you might want to think carefully about how you use the name option:


2019-10-04T14:21:16.151839+01:00
ORCL(3):statement in resumable session 'Help I'm stuck' was suspended due to
ORCL(3):    ORA-01652: unable to extend temp segment by 128 in tablespace TINY
2019-10-04T14:22:18.655808+01:00
ORCL(3):statement in resumable session 'Help I'm stuck' was timed out
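
If you want a meaningful name without losing the user and session details from the alert log, one option (just a sketch) is to build those details into the name yourself before you enable resumability:

declare
        v_name varchar2(100);
begin
        v_name := 'User ' || user || ', Session ' || sys_context('userenv','sid');
        execute immediate
                'alter session enable resumable timeout 60 name ''' || v_name || '''';
end;
/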

Once your resumable task has completed (or timed out and failed) you can stop the session from being resumable with the command:

alter session disable resumable;

It’s important that every time you enable resumability you disable it again as soon as the capability is no longer needed. Also, be careful about when you enable it; don’t be tempted to make every session resumable – use it only for really important cases. Once a session is resumable virtually everything that goes on in that session is deemed to be resumable, and this has side effects.

The first side effect that may spring to mind is the impact of the view v$resumable – it’s a memory structure in the SGA so that everyone can see it and all the resumable sessions can populate and update it. That means there’s got to be some latch (or mutex) protection going on – and if you look at v$latch you’ll discover that there’s just a single (child) latch doing the job, so resumability can introduce a point of contention. Here’s a simple script (using my “start_XXX” strategy) to “select 1 from dual;” one thousand times, with calls to check the latch activity:

set termout off
set serveroutput off
execute snap_latch.start_snap

@start_1000

set termout on
set serveroutput on
execute snap_latch.end_snap(750)

And here are the results of running the script – reporting only the latches with more than 750 gets in the interval – first without and then with a resumable session:

---------------------------------
Latch waits:-   04-Oct 15:04:31
Lower limit:-  750
---------------------------------
Latch                              Gets      Misses     Sp_Get     Sleeps     Im_Gets   Im_Miss Holding Woken Time ms
-----                              ----      ------     ------     ------     -------   ------- ------- ----- -------
session idle bit                  6,011           0          0          0           0         0       0     0      .0
enqueue hash chains               2,453           0          0          0           0         0       0     0      .0
enqueue freelist latch                1           0          0          0       2,420         0       0     0      .0
JS queue state obj latch          1,176           0          0          0           0         0       0     0      .0

SQL> alter session enable resumable;

SQL> @test
---------------------------------
Latch waits:-   04-Oct 15:04:46
Lower limit:-  750
---------------------------------
Latch                              Gets      Misses     Sp_Get     Sleeps     Im_Gets   Im_Miss Holding Woken Time ms
-----                              ----      ------     ------     ------     -------   ------- ------- ----- -------
session idle bit                  6,011           0          0          0           0         0       0     0      .0
enqueue hash chains               2,623           0          0          0           0         0       0     0      .0
enqueue freelist latch                1           0          0          0       2,588         0       0     0      .0
resumable state object            3,005           0          0          0           0         0       0     0      .0
JS queue state obj latch          1,260           0          0          0           0         0       0     0      .0

PL/SQL procedure successfully completed.

SQL> alter session disable resumable;

That’s 1,000 selects from dual – 3,000 latch gets on a single child latch. It looks like every call to the database results in a latch get and an update to the memory structure. (Note: You wouldn’t see the same effect if you ran a loop inside an anonymous PL/SQL block since the block would be the single database call).
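
For comparison, here’s a sketch of the PL/SQL version – the whole block executes as a single database call, so you shouldn’t see the per-call latch activity:

declare
        v1      number;
begin
        for i in 1..1000 loop
                select 1 into v1 from dual;
        end loop;
end;
/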

For other side effects of resumability think about what else is going on around your session. If you allow a session to suspend for (say) 3,600 seconds and it manages to resume just in time to avoid a timeout, it now has 3,600 seconds of database changes to unwind if it’s trying to produce a read-consistent result. So not only do you have to allow for increasing the size of the undo tablespace and increasing the undo retention time, you have to allow for the fact that when the process resumes it may run much more slowly than usual because it spends more of its time trying to see the data as it was before it suspended – which may require far more single block reads of the undo tablespace – and the session may then crash anyway with Oracle error ORA-01555 (which is so well-known that I won’t quote the text).

In the same vein – if a process acquires a huge amount of space in the temporary tablespace (in particular) and fails instantly because it can’t get any more space it normally crashes and releases the space. If you allow that process to suspend for an hour it’s going to hold onto that space – which means other processes that used to run safely may now crash because they find there’s no free space left for them in the temporary tablespace.

Be very cautious when you introduce resumable sessions – you need to understand the global impact, not just the potential benefit to your session.

Getting Alerts

Apart from the (passive) views telling you that a session has suspended it’s also possible to get some form of (active) alert when the event happens. There’s an “after suspend” event that you can use to create a database trigger to take some defensive action, e.g.:

create or replace trigger call_for_help
after suspend
on test_user.schema
begin
        if sysdate between trunc(sysdate) and trunc(sysdate) + 3/24 then
                null;
                -- use utl_mail, utl_smtp et. al. to page the DBA
        end if;
end;
/

This trigger is restricted to the test_user schema, and (code not included) would send a message to the DBA’s pager only between the hours of midnight and 3:00 a.m. Apart from the usual functions in dbms_standard that return error codes, names of objects and so on, you might want to take a look at the dbms_resumable package for the “helper” functions and procedures it supplies.
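
As a sketch (not production code) of how those helpers might be used, the trigger could call dbms_resumable.space_error_info() to find out what ran out of space, and set_timeout() to buy some extra time – the calls are documented, but the variable names here are mine:

create or replace trigger call_for_help
after suspend
on test_user.schema
declare
        error_type      varchar2(64);
        object_type     varchar2(64);
        object_owner    varchar2(64);
        ts_name         varchar2(64);
        object_name     varchar2(64);
        sub_obj_name    varchar2(64);
        b_ok            boolean;
begin
        -- identify the object and tablespace that raised the space error
        b_ok := dbms_resumable.space_error_info(
                error_type, object_type, object_owner,
                ts_name, object_name, sub_obj_name
        );
        -- give the suspended statement 30 minutes before it times out
        dbms_resumable.set_timeout(1800);
        -- then use utl_mail, utl_smtp et al. to send the details to the DBA
end;
/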

For further information on resumable sessions here’s a link to the 12.2 manual to get you started.

September 16, 2019

Updatable Join Views

Filed under: Indexing,Oracle,Tuning — Jonathan Lewis @ 12:51 pm BST Sep 16,2019

Here’s a quick “how to”.

If you want to update a column in table A with a column value from table B, then there’s a simple way to check if the required result can be achieved through an updatable join view.

Step 1: write a query that joins table A to table B and reports the rows in table A that you want to update, with the value from table B that should be used to update them, e.g.


select  a.rowid, a.col1, b.col2 
from    
        tableA a,
        tableB b
where
        a.status = 'Needs Update'
and     b.colX   = a.colX
and     b.colY   = a.colY
and     b.colZ   = a.colZ
/

Step 2: If there is a uniqueness constraint (or suitable index) on table B (the table from which you are copying a value) that enforces the restriction that there should be at most one row in B for any combination of the join columns (colX, colY, colZ) then you can take this query, and turn it into an inline-view in an update statement:


update (
        select a.rowid, a.col1, b.col2 
        from    
                tableA a,
                tableB b
        where
                a.status = 'Needs Update'
        and     b.colX   = a.colX
        and     b.colY   = a.colY
        and     b.colZ   = a.colZ
)  v
set     v.col1 = v.col2
/

If there is nothing enforcing the uniqueness of (colX, colY, colZ) this statement will result in Oracle raising error ORA-01779 “cannot modify a column which maps to a non key-preserved table”. This error will appear even if there are currently no actual duplicates in table B that could cause a problem.
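
For example, a suitable declaration might look like the following sketch (the constraint name is hypothetical):

alter table tableB add constraint tableB_uk unique (colX, colY, colZ);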

Footnote

This example ignores the extra bit of mess that is needed to deal with the case where B rows are supposed to match A rows when the columns in the join predicates can be null; but that just means your original query will probably have to include some predicates like (b.colX = a.colX or (a.colX is null and b.colX is null)) or make use of the sys_op_map_nonnull() function.
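
For example, the null-tolerant version of the colX predicate might be written either as:

and     (b.colX = a.colX or (a.colX is null and b.colX is null))

or, equivalently (remembering that sys_op_map_nonnull() is undocumented):

and     sys_op_map_nonnull(b.colX) = sys_op_map_nonnull(a.colX)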

August 26, 2019

Troubleshooting

Filed under: CBO,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 12:19 pm BST Aug 26,2019

A recent thread on the Oracle Developer Community starts with the statement that a query is taking a very long time (with the question “how do I make it go faster?” implied rather than asked). It’s 12.1.0.2 (not that that’s particularly relevant to this blog note), and we have been given a number that quantifies “very long time” (again not particularly relevant to this blog note – but worth mentioning because your “slow” might be my “wow! that was fast”, and far too many people use qualitative adjectives when the important detail is quantitative). The query had already been running for 15 hours – and here it is:


SELECT 
        OWNER, TABLE_NAME 
FROM
        DBA_LOGSTDBY_NOT_UNIQUE 
WHERE
        (OWNER, TABLE_NAME) NOT IN (
                SELECT 
                        DISTINCT OWNER, TABLE_NAME 
                        FROM     DBA_LOGSTDBY_UNSUPPORTED
        ) 
AND     BAD_COLUMN = 'Y'

There are many obvious suggestions anyone could make for things to do to investigate the problem – start with the execution plan, check whether the object statistics are reasonably representative, run a trace with wait state tracing enabled to see where the time goes; but sometimes there are a couple of very simple observations you can make that point you to simple solutions.

Looking at this query we can recognise that it’s (almost certainly) about a couple of Oracle data dictionary views (which means it’s probably very messy under the covers with a horrendous execution plan) and, as I’ve commented from time to time in the past, Oracle Corp. developers create views for their own purposes so you should take great care when you re-purpose them. This query also has the very convenient feature that it looks like two simpler queries stitched together – so a very simple step in trouble-shooting, before going into any fine detail, is to unstitch the query and run the two parts separately to see how much data they return and how long they take to complete:


SELECT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_NOT_UNIQUE WHERE BAD_COLUMN = 'Y'

SELECT DISTINCT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_UNSUPPORTED

It’s quite possible that the worst case scenario for the total run time of the original query could be reduced to the sum of the run time of these two queries. One strategy to achieve this would be a rewrite of the form:

select  * 
from    (
        SELECT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_NOT_UNIQUE WHERE BAD_COLUMN = 'Y'
        minus
        SELECT DISTINCT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_UNSUPPORTED
)

Unfortunately the immediately obvious alternative may be illegal thanks to things like duplicates (which disappear in MINUS operations) or NULLs (which can make ALL the data “disappear” in some cases). In this case the original query might be capable of returning duplicates of (owner, table_name) from dba_logstdby_not_unique which would collapse to a single occurrence each in my rewrite – so my version of the query is not logically equivalent (unless the definition of the view enforces uniqueness); on the other hand, tracking back through the original thread to the MoS article where this query comes from, we can see that even if the query could return duplicates we don’t actually need to see them.

And this is the point of the blog note – it’s a general principle (that happens to be a very obvious strategy in this case): if a query takes too long, see how it compares with a simplified version of the query that might be a couple of steps short of the final target. If it’s easy to spot the options for simplification, and if the simplified version operates efficiently, then isolate it (using a no_merge hint if necessary), and work forwards from there. Just be careful that your rewrite remains logically equivalent to the original (if it really needs to).
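
As a sketch of that strategy applied to this example – the no_merge hint references the inline view by its alias:

select  /*+ no_merge(v) */
        *
from    (
        SELECT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_NOT_UNIQUE WHERE BAD_COLUMN = 'Y'
        minus
        SELECT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_UNSUPPORTED
        ) v
/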

In the case of this query, the two parts took 5 seconds and 9 seconds to complete, returning 209 rows and 815 rows respectively. Combining the two queries with a minus really should get the required result in no more than 14 seconds.

Footnote

The “distinct” in the second query is technically redundant as the minus operation applies a sort unique operation to the two intermediate result sets before comparing them. Similarly, the “distinct” was also redundant when the second query was used for the “in subquery” construction – again there would be an implied uniqueness operation if the optimizer decided to do a simple unnest of the subquery.

August 21, 2019

sql_patch

Filed under: Infrastructure,Oracle,Tuning — Jonathan Lewis @ 4:49 pm BST Aug 21,2019

This note is a short follow-up to a note I wrote some time ago about validating foreign key constraints where I examined the type of SQL Oracle generates internally to do the validation between parent and child tables.  In that article I suggested (before testing) that you could create an SQL patch for the generated SQL to over-ride the plan taken by Oracle – a plan dictated to some extent by hints (including a “deprecated” ordered hint) embedded in the code. I did say that the strategy might not work for SQL optimised by SYS, but it turned out that it did.

Here’s a little script I ran to test a few variations on the theme:


declare
        v1      varchar2(128);
begin
        v1 :=   dbms_sqldiag.create_sql_patch(
                        sql_id  => 'g2z10tbxyz6b0',
                        name    => 'validate_fk',
                        hint_text => 'ignore_optim_embedded_hints'
--                      hint_text => 'parallel(a@sel$1 8)'      -- worked
--                      hint_text => 'parallel(8)'              -- worked
--                      hint_text => q'{opt_param('_fast_full_scan_enabled' 'false')}'  -- worked
                );
        dbms_output.put_line(v1);
end;
/

I’ve tested this on 12.2.0.1 and 19.3.0.0, but for earlier versions of Oracle, and depending what patches you’ve applied, you will need to modify the code.

The SQL_ID represents the query for my specific tables, of course, so you will have to do a test run to find the query and SQL_ID for the validation you want to do. This is what the statement for my parent/child pair looked like (cosmetically adjusted):

select /*+ all_rows ordered dynamic_sampling(2) */ 
        A.rowid, :1, :2, :3
from
        "TEST_USER"."CHILD" A , 
        "TEST_USER"."PARENT" B 
where
        ("A"."OBJECT_ID" is not null) 
and     ("B"."OBJECT_ID"(+) = "A"."OBJECT_ID")
and     ("B"."OBJECT_ID" is null)

The patch that the script creates simply tells Oracle to ignore the embedded hints (in particular I don’t want that ordered hint), but I’ve left a few other options in the text, commenting them out.

Without the patch I got the following plan:


Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  NESTED LOOPS ANTI (cr=399 pr=279 pw=0 time=47801 us starts=1 cost=70 size=22000 card=1000)
    100000     100000     100000   INDEX FAST FULL SCAN CHI_FK_PAR (cr=250 pr=247 pw=0 time=19943 us starts=1 cost=32 size=1700000 card=100000)(object id 73191)
     10000      10000      10000   INDEX UNIQUE SCAN PAR_PK (cr=149 pr=32 pw=0 time=3968 us starts=10000 cost=0 size=49995 card=9999)(object id 73189)

Rerunning the validation test with the patch in place I got the following plan – clearly the patch had had an effect.

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  HASH JOIN RIGHT ANTI (cr=246 pr=242 pw=0 time=96212 us starts=1 cost=39 size=22000 card=1000)
     10000      10000      10000   INDEX FAST FULL SCAN PAR_PK (cr=24 pr=23 pw=0 time=1599 us starts=1 cost=4 size=50000 card=10000)(object id 73235)
    100000     100000     100000   INDEX FAST FULL SCAN CHI_FK_PAR (cr=222 pr=219 pw=0 time=27553 us starts=1 cost=32 size=1700000 card=100000)(object id 73237)

Don’t worry too much about the fact that in my tiny example, with a very new, nicely structured data set, the original plan was a little faster. In a production environment creating a hash table from the parent keys and probing it with the child keys may reduce the CPU usage and random I/O quite dramatically.

Bear in mind that the best possible plan may depend on many factors, such as the number of child rows per parent and the degree to which the parent and child keys arrive in sorted (or random) order; and then you have to remember that Oracle gets a little clever with the original anti-join (note that there are only 10,000 probes for 100,000 child rows – there’s an effect similar to scalar subquery caching going on there), so trying to patch the plan the same way for every parent/child pair may not be the best strategy.

If you want to drop the patch after playing around with this example a call to execute dbms_sqldiag.drop_sql_patch(name=>'validate_fk') will suffice.

August 9, 2019

Split Partition

Filed under: Oracle,Partitioning,Performance,Tuning — Jonathan Lewis @ 1:02 pm BST Aug 9,2019

This is a little case study on “pre-emptive trouble-shooting”, based on a recent question on the ODC database forum asking about splitting a range-based partition into two at a value above the current highest value recorded in a max_value partition.

The general principle for splitting (range-based) partitions is that if the split point is above the current high value Oracle will recognise that it can simply rename the existing partition and create a new, empty partition, leaving all the indexes (including the global and globally partitioned indexes) in a valid state. There are, however, three little wrinkles to this particular request:

  • first is that the question relates to a system running 10g
  • second is that there is a LOB column in the table
  • third is that the target is to have the new (higher value) partition(s) in a different tablespace

It’s quite possible that 10g won’t have all the partition maintenance capabilities of newer versions; if anything is going to go wrong, LOBs are always a highly dependable point of failure; and since the examples in the manuals tend to be very simple, maybe any attempt to introduce complications like a tablespace specification will cause problems.

So, before you risk doing the job in production, what are you going to test?

In Oracle terms we want to check the following

  • Will Oracle have silently copied/rebuilt some segments rather than simply renaming old segments and creating new, empty segments?
  • Will the segments end up where we want them?
  • Will all the indexes stay valid?

To get things going, the OP had supplied a framework for the table and told us about two indexes, and had then given us two possible SQL statements to do the split, stating that he (or she) had tested them and they both worked. Here’s the SQL (with a few tweaks) that creates the table and indexes. I’ve also added some data – inserting one row into each partition.

rem
rem     Script:         split_pt_lob.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2019
rem
rem     Last tested 
rem             12.2.0.1
rem             10.2.0.5
rem

define m_old_ts = 'test_8k'
define m_new_ts = 'assm_2'

drop table part_tab purge;

create table part_tab(
  pt_id            NUMBER,
  pt_name          VARCHAR2(30),
  pt_date          DATE default SYSDATE,
  pt_lob           CLOB,
  pt_status        VARCHAR2(2)
)
tablespace &m_old_ts
lob(pt_lob) store as (tablespace &m_old_ts)
partition by range (pt_date)
(
  partition PRT1 values less than (TO_DATE('2012-01-01', 'YYYY-MM-DD')),
  partition PRT2 values less than (TO_DATE('2014-09-01', 'YYYY-MM-DD')),
  partition PRT_MAX values less than (MAXVALUE)
)
/

alter table part_tab
add constraint pt_pk primary key(pt_id)
/

create index pt_i1 on part_tab(pt_date, pt_name) local
/

insert into part_tab(
    pt_id, pt_name, pt_date, pt_lob, pt_status
)
values(
    1,'one',to_date('01-Jan-2011'),rpad('x',4000),'X'
)
/

insert into part_tab(
    pt_id, pt_name, pt_date, pt_lob, pt_status
)
values(
    2,'two',to_date('01-Jan-2013'),rpad('x',4000),'X'
)
/

insert into part_tab(
    pt_id, pt_name, pt_date, pt_lob, pt_status
)
values(
    3,'three',to_date('01-Jan-2015'),rpad('x',4000),'X'
)
/

commit;

execute dbms_stats.gather_table_stats(null,'part_tab',cascade=>true,granularity=>'ALL')

We were told that

The table has
– Primary Key on pt_id column with unique index (1 Different table has FK constraint that refers to this PK)
– Composite index on pt_date and pt_name columns

This is why I’ve added a primary key constraint (which will generate a global index) and created an index on (pt_date,pt_name) – which I’ve created as a local index since it contains the partitioning column.

The description of the requirement was:

  • The Task is to split partition(PRT_MAX) to a different tablespace
  • New partition(s) won’t have data at the moment of creation

And the two “tested” strategies were:

alter table part_tab split partition PRT_MAX at(TO_DATE('2019-08-01', 'YYYY-MM-DD')) into (
        PARTITION PRT3    tablespace &m_old_ts,
        PARTITION PRT_MAX tablespace &m_new_ts
);

alter table part_tab split partition PRT_MAX at(TO_DATE('2019-08-01', 'YYYY-MM-DD')) into (
        PARTITION PRT3    tablespace &m_old_ts LOB (pt_lob) store as (TABLESPACE &m_old_ts), 
        PARTITION PRT_MAX tablespace &m_new_ts LOB (pt_lob) store as (TABLESPACE &m_new_ts)
)
;

If we’re going to test these strategies properly we will need queries similar to the following:


break on object_name skip 1
select object_name, subobject_name, object_id, data_object_id  from user_objects order by object_name, subobject_name;

break on index_name skip 1
select index_name, status from user_indexes;
select index_name, partition_name, status from user_ind_partitions order by index_name, partition_name;

break on segment_name skip 1
select segment_name, partition_name, tablespace_name from user_segments order by segment_name, partition_name;

First – what are the object_id and data_object_id for each object before and after the split? Have we created new “data objects” while splitting, or has an existing data (physical) object simply changed its name?

Secondly – are there any indexes or index partitions that are no longer valid?

Finally – which tablespaces do the physical objects reside in?

On a test run of the first, simpler, split statement here are the before and after results for the object_id and data_object_id, followed by the post-split results for index and segment details:


Before Split
============

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23677          23677
                                 PRT2                        23678          23678
                                 PRT_MAX                     23679          23679
                                                             23676

PT_I1                            PRT1                        23690          23690
                                 PRT2                        23691          23691
                                 PRT_MAX                     23692          23692
                                                             23689

PT_PK                                                        23688          23688

SYS_IL0000023676C00004$$         SYS_IL_P252                 23685          23685
                                 SYS_IL_P253                 23686          23686
                                 SYS_IL_P254                 23687          23687

SYS_LOB0000023676C00004$$        SYS_LOB_P249                23681          23681
                                 SYS_LOB_P250                23682          23682
                                 SYS_LOB_P251                23683          23683
                                                             23680          23680

After split
===========

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23677          23677
                                 PRT2                        23678          23678
                                 PRT3                        23693          23679
                                 PRT_MAX                     23679          23694
                                                             23676

PT_I1                            PRT1                        23690          23690
                                 PRT2                        23691          23691
                                 PRT3                        23700          23692
                                 PRT_MAX                     23699          23699
                                                             23689

PT_PK                                                        23688          23688

SYS_IL0000023676C00004$$         SYS_IL_P252                 23685          23685
                                 SYS_IL_P253                 23686          23686
                                 SYS_IL_P257                 23697          23687
                                 SYS_IL_P258                 23698          23698

SYS_LOB0000023676C00004$$        SYS_LOB_P249                23681          23681
                                 SYS_LOB_P250                23682          23682
                                 SYS_LOB_P255                23695          23683
                                 SYS_LOB_P256                23696          23696
                                                             23680          23680


INDEX_NAME                       STATUS
-------------------------------- --------
PT_I1                            N/A
PT_PK                            VALID
SYS_IL0000023676C00004$$         N/A


INDEX_NAME                       PARTITION_NAME         STATUS
-------------------------------- ---------------------- --------
PT_I1                            PRT1                   USABLE
                                 PRT2                   USABLE
                                 PRT3                   USABLE
                                 PRT_MAX                USABLE

SYS_IL0000023676C00004$$         SYS_IL_P252            USABLE
                                 SYS_IL_P253            USABLE
                                 SYS_IL_P257            USABLE
                                 SYS_IL_P258            USABLE


SEGMENT_NAME              PARTITION_NAME         TABLESPACE_NAME
------------------------- ---------------------- ------------------------------
PART_TAB                  PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_I1                     PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_PK                                            TEST_8K

SYS_IL0000023676C00004$$  SYS_IL_P252            TEST_8K
                          SYS_IL_P253            TEST_8K
                          SYS_IL_P257            TEST_8K
                          SYS_IL_P258            TEST_8K

SYS_LOB0000023676C00004$$ SYS_LOB_P249           TEST_8K
                          SYS_LOB_P250           TEST_8K
                          SYS_LOB_P255           TEST_8K
                          SYS_LOB_P256           TEST_8K

Before the split partition PRT_MAX – with 4 segments: table, index, LOB, LOBINDEX – has object_id = data_object_id, with the values: 23679 (table), 23692 (index), 23683 (LOB), 23687 (LOBINDEX); and after the split these reappear as the data_object_id values for partition PRT3 (though the object_id values are larger than the data_object_id values) – so we infer that Oracle has simply renamed the various PRT_MAX objects to PRT3 and created new, empty PRT_MAX objects.

We can also see that all the indexes (including the global primary key index) have remained valid. We also note that the data_object_id of the primary key index has not changed, so Oracle didn’t have to rebuild it to ensure that it stayed valid.

There is a problem, though: the LOB and LOBINDEX segments for the new PRT_MAX partition are not in the desired target tablespace. So we need to check the effects of the second version of the split command, where we add the specification of the LOB tablespaces. This is what we get – after rerunning the entire test script from scratch:


OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23727          23727
                                 PRT2                        23728          23728
                                 PRT_MAX                     23729          23729
                                                             23726

PT_I1                            PRT1                        23740          23740
                                 PRT2                        23741          23741
                                 PRT_MAX                     23742          23742
                                                             23739

PT_PK                                                        23738          23738

SYS_IL0000023726C00004$$         SYS_IL_P272                 23735          23735
                                 SYS_IL_P273                 23736          23736
                                 SYS_IL_P274                 23737          23737

SYS_LOB0000023726C00004$$        SYS_LOB_P269                23731          23731
                                 SYS_LOB_P270                23732          23732
                                 SYS_LOB_P271                23733          23733
                                                             23730          23730


OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23727          23727
                                 PRT2                        23728          23728
                                 PRT3                        23743          23743
                                 PRT_MAX                     23729          23744
                                                             23726

PT_I1                            PRT1                        23740          23740
                                 PRT2                        23741          23741
                                 PRT3                        23750          23750
                                 PRT_MAX                     23749          23749
                                                             23739

PT_PK                                                        23738          23738

SYS_IL0000023726C00004$$         SYS_IL_P272                 23735          23735
                                 SYS_IL_P273                 23736          23736
                                 SYS_IL_P277                 23747          23747
                                 SYS_IL_P278                 23748          23748

SYS_LOB0000023726C00004$$        SYS_LOB_P269                23731          23731
                                 SYS_LOB_P270                23732          23732
                                 SYS_LOB_P275                23745          23745
                                 SYS_LOB_P276                23746          23746
                                                             23730          23730

INDEX_NAME                       STATUS
-------------------------------- --------
PT_I1                            N/A
PT_PK                            UNUSABLE
SYS_IL0000023726C00004$$         N/A

INDEX_NAME                       PARTITION_NAME         STATUS
-------------------------------- ---------------------- --------
PT_I1                            PRT1                   USABLE
                                 PRT2                   USABLE
                                 PRT3                   UNUSABLE
                                 PRT_MAX                USABLE

SYS_IL0000023726C00004$$         SYS_IL_P272            USABLE
                                 SYS_IL_P273            USABLE
                                 SYS_IL_P277            USABLE
                                 SYS_IL_P278            USABLE

SEGMENT_NAME              PARTITION_NAME         TABLESPACE_NAME
------------------------- ---------------------- ------------------------------
PART_TAB                  PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_I1                     PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_PK                                            TEST_8K

SYS_IL0000023726C00004$$  SYS_IL_P272            TEST_8K
                          SYS_IL_P273            TEST_8K
                          SYS_IL_P277            TEST_8K
                          SYS_IL_P278            ASSM_2

SYS_LOB0000023726C00004$$ SYS_LOB_P269           TEST_8K
                          SYS_LOB_P270           TEST_8K
                          SYS_LOB_P275           TEST_8K
                          SYS_LOB_P276           ASSM_2


Before looking at the more complex details the first thing that leaps out to hit the eye is the word UNUSABLE – which appears as the status of the (global) primary key index and the PRT3 partition of the local index. The (empty) PRT_MAX LOB and LOBINDEX partitions are where we wanted them, but by specifying the location we seem to have broken two index segments that will need to be rebuilt.

It gets worse, because if we check the data_object_id of the original PRT_MAX partition (23729) and its matching index partition (23742) we see that they don’t correspond to the (new) PRT3 data_object_id values which are 23743 and 23750 respectively – the data has been physically copied from one data object to another completely unnecessarily; moreover the same applies to the LOB and LOBINDEX segments – the data object ids for the PRT_MAX LOB and LOBINDEX partitions were 23733 and 23737, the new PRT3 data object ids are 23746 and 23747.

If you did a test with only a tiny data set you might not notice the implicit threat that these changes in data_object_id tell you about – you’re going to be copying the whole LOB segment when you don’t need to.

Happy Ending (maybe)

A quick check with 12.2 suggested that Oracle had got much better at detecting that it didn’t need to copy LOB data and invalidate indexes with the second form of the code; but the OP was on 10g – so that’s not much help. However, it was the thought that Oracle might misbehave when you specified tablespaces that made me run up this test – in particular I had wondered if specifying a tablespace for the partition that would end up holding the existing data might trigger an accident. So here’s a third variant of the split statement I tested, with the results on the indexes, segments, and data objects. Note that I specify the tablespace only for the new (empty) segments:


alter table part_tab split partition PRT_MAX at(TO_DATE('2019-08-01', 'YYYY-MM-DD')) into (
    PARTITION PRT3,
    PARTITION PRT_MAX tablespace &m_new_ts  LOB (pt_lob) store as (TABLESPACE &m_new_ts)
)
/

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23752          23752
                                 PRT2                        23753          23753
                                 PRT_MAX                     23754          23754
                                                             23751

PT_I1                            PRT1                        23765          23765
                                 PRT2                        23766          23766
                                 PRT_MAX                     23767          23767
                                                             23764

PT_PK                                                        23763          23763

SYS_IL0000023751C00004$$         SYS_IL_P282                 23760          23760
                                 SYS_IL_P283                 23761          23761
                                 SYS_IL_P284                 23762          23762

SYS_LOB0000023751C00004$$        SYS_LOB_P279                23756          23756
                                 SYS_LOB_P280                23757          23757
                                 SYS_LOB_P281                23758          23758
                                                             23755          23755

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23752          23752
                                 PRT2                        23753          23753
                                 PRT3                        23768          23754
                                 PRT_MAX                     23754          23769
                                                             23751

PT_I1                            PRT1                        23765          23765
                                 PRT2                        23766          23766
                                 PRT3                        23775          23767
                                 PRT_MAX                     23774          23774
                                                             23764

PT_PK                                                        23763          23763

SYS_IL0000023751C00004$$         SYS_IL_P282                 23760          23760
                                 SYS_IL_P283                 23761          23761
                                 SYS_IL_P287                 23772          23762
                                 SYS_IL_P288                 23773          23773

SYS_LOB0000023751C00004$$        SYS_LOB_P279                23756          23756
                                 SYS_LOB_P280                23757          23757
                                 SYS_LOB_P285                23770          23758
                                 SYS_LOB_P286                23771          23771
                                                             23755          23755
INDEX_NAME                       STATUS
-------------------------------- --------
PT_I1                            N/A
PT_PK                            VALID
SYS_IL0000023751C00004$$         N/A

INDEX_NAME                       PARTITION_NAME         STATUS
-------------------------------- ---------------------- --------
PT_I1                            PRT1                   USABLE
                                 PRT2                   USABLE
                                 PRT3                   USABLE
                                 PRT_MAX                USABLE

SYS_IL0000023751C00004$$         SYS_IL_P282            USABLE
                                 SYS_IL_P283            USABLE
                                 SYS_IL_P287            USABLE
                                 SYS_IL_P288            USABLE

SEGMENT_NAME              PARTITION_NAME         TABLESPACE_NAME
------------------------- ---------------------- ------------------------------
PART_TAB                  PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_I1                     PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_PK                                            TEST_8K

SYS_IL0000023751C00004$$  SYS_IL_P282            TEST_8K
                          SYS_IL_P283            TEST_8K
                          SYS_IL_P287            TEST_8K
                          SYS_IL_P288            ASSM_2

SYS_LOB0000023751C00004$$ SYS_LOB_P279           TEST_8K
                          SYS_LOB_P280           TEST_8K
                          SYS_LOB_P285           TEST_8K
                          SYS_LOB_P286           ASSM_2

All the indexes and index partitions stay valid; the new empty segments all end up in the target tablespace, and all the data object ids for the old PRT_MAX partitions become the data object ids for the new PRT3 partitions. Everything we want, and no physical rebuilds of any data sets.

Moral:

When you’re testing, especially when you’re doing a small test while anticipating a big data set, don’t rely on the clock; check the data dictionary (and trace files, if necessary) carefully to find out what activity actually took place.

Footnote:

It’s possible that there are ways to fiddle around with the various default attributes of the partitioned table to get the same effect – but since 12.2 is much better behaved anyway there’s no point in me spending more time looking for alternative solutions to a 10g problem.

November 19, 2018

Table order

Filed under: ANSI Standard,Execution plans,Oracle,Tuning — Jonathan Lewis @ 1:30 pm BST Nov 19,2018

Over the last few days I’ve highlighted on Twitter a couple of older posts showing how a change in the order that tables appear in the from clause could affect the execution plan of a query. In one case the note was purely theoretical describing a feature of the way the optimizer works with simple query blocks, in the other case the note was about an anomaly with table elimination that could appear with both “ANSI” and “traditional” Oracle syntax.

Here’s another note that might be more generally useful – an example of an odd side effect of ordering and “ANSI” syntax, with a suggestion for a pattern for writing ANSI SQL. It’s based on a test I wrote to play around with a problem that showed up on the Oracle database forum more than six years ago and shows a strange inconsistency. The setup is a little long-winded as the example involves 4 tables, so I’ll leave the script to create, load and index the tables to the end of the note. Here’s the query that introduced the problem; it’s a fairly straightforward 4 table join with two (left) outer joins:


select
        episode.episode_id , episode.cross_ref_id , episode.date_required ,
        product.number_required,
        request.site_id
from
        episode
left join
        request
on      episode.cross_ref_id = request.cross_ref_id
join
        product
ON      episode.episode_id = product.episode_id
left join
        product_sub_type
ON      product.prod_sub_type_id = product_sub_type.prod_sub_type_id
where
        episode.department_id = 2
and     product.status = 'I'
order by
        episode.date_required
;

And here’s the execution plan:


----------------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         | 33333 |  1725K|       | 17135   (4)| 00:00:01 |
|   1 |  SORT ORDER BY       |         | 33333 |  1725K|  2112K| 17135   (4)| 00:00:01 |
|*  2 |   HASH JOIN OUTER    |         | 33333 |  1725K|  1632K| 16742   (4)| 00:00:01 |
|*  3 |    HASH JOIN         |         | 33333 |  1236K|       |   436   (8)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| PRODUCT | 33333 |   325K|       |    54  (12)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL| EPISODE |   300K|  8203K|       |   375   (6)| 00:00:01 |
|   6 |    TABLE ACCESS FULL | REQUEST |  4000K|    57M|       | 13542   (3)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("EPISODE"."CROSS_REF_ID"="REQUEST"."CROSS_REF_ID"(+))
   3 - access("EPISODE"."EPISODE_ID"="PRODUCT"."EPISODE_ID")
   4 - filter("PRODUCT"."STATUS"='I')
   5 - filter("EPISODE"."DEPARTMENT_ID"=2)

The first thing you’ll notice, of course, is that the plan reports a three table join. Thanks to various referential integrity constraints, the absence of the table in the final select list, and the nature of the join to that table, the optimizer has determined that the product_sub_type table could be eliminated from the join without changing the result set.

What you can’t tell from the plan is that there’s an index on the request table that holds all the columns needed to satisfy the query, and an index fast full scan on the index would be significantly more efficient than the tablescan that appears at operation 6.
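For reference, the index in question – it appears in the setup script at the end of this note – covers both the join column and the only column selected from request:

create index ix4_request on request (cross_ref_id, site_id);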

Having noticed from the plan that product_sub_type is redundant, the obvious thing to do before investigating further is to rewrite the statement to remove the table. The rewrite is simply the original statement with the outer join to product_sub_type removed:
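
select
        episode.episode_id , episode.cross_ref_id , episode.date_required ,
        product.number_required,
        request.site_id
from
        episode
left join
        request
on      episode.cross_ref_id = request.cross_ref_id
join
        product
ON      episode.episode_id = product.episode_id
where
        episode.department_id = 2
and     product.status = 'I'
order by
        episode.date_required
;

And here’s the execution plan: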

----------------------------------------------------------------------------------------------
| Id  | Operation              | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |             | 33333 |  1725K|       |  5525   (6)| 00:00:01 |
|   1 |  SORT ORDER BY         |             | 33333 |  1725K|  2112K|  5525   (6)| 00:00:01 |
|*  2 |   HASH JOIN OUTER      |             | 33333 |  1725K|  1632K|  5132   (7)| 00:00:01 |
|*  3 |    HASH JOIN           |             | 33333 |  1236K|       |   436   (8)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | PRODUCT     | 33333 |   325K|       |    54  (12)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL  | EPISODE     |   300K|  8203K|       |   375   (6)| 00:00:01 |
|   6 |    INDEX FAST FULL SCAN| IX4_REQUEST |  4000K|    57M|       |  1932   (7)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("EPISODE"."CROSS_REF_ID"="REQUEST"."CROSS_REF_ID"(+))
   3 - access("EPISODE"."EPISODE_ID"="PRODUCT"."EPISODE_ID")
   4 - filter("PRODUCT"."STATUS"='I')
   5 - filter("EPISODE"."DEPARTMENT_ID"=2)

So – when the optimizer removes the product_sub_type table from the query the plan reports a tablescan of request; when we remove product_sub_type ourselves the plan reports an index fast full scan of an appropriate index – which, judging by the costs (1,932 vs. 13,542), appears to be roughly one seventh of the size of the table. It’s a little surprising that the optimizer didn’t get it right by itself – but “ANSI” style SQL often displays quirky little side effects because of the way the optimizer transforms it into traditional Oracle style.

We could stop at that point, of course, but then you’d wonder about the significance of the title of the post. So let’s play around with the join order of the original query, without removing the product_sub_type table.

As a general strategy (though not an absolute rule) I tend to arrange code so that outer joins don’t appear before “inner” joins. In this example that means I would have written the original statement as follows:


select
        episode.episode_id, episode.cross_ref_id, episode.date_required,
        product.number_required,
        request.site_id
from
        episode
join
        product
ON      product.episode_id = episode.episode_id
left join
        product_sub_type
ON      product_sub_type.prod_sub_type_id = product.prod_sub_type_id
left join
        request
on      request.cross_ref_id = episode.cross_ref_id
where
        episode.department_id = 2
and     product.status        = 'I'
order by
        episode.date_required
;

All I’ve done is move the join between episode and product up the SQL, following it with the outer join to product_sub_type, finally closing with the outer join between episode and request. Here’s the execution plan – which you might expect to look exactly like the original plan:


----------------------------------------------------------------------------------------------
| Id  | Operation              | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |             | 33333 |  1725K|       |  5525   (6)| 00:00:01 |
|   1 |  SORT ORDER BY         |             | 33333 |  1725K|  2112K|  5525   (6)| 00:00:01 |
|*  2 |   HASH JOIN OUTER      |             | 33333 |  1725K|  1632K|  5132   (7)| 00:00:01 |
|*  3 |    HASH JOIN           |             | 33333 |  1236K|       |   436   (8)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | PRODUCT     | 33333 |   325K|       |    54  (12)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL  | EPISODE     |   300K|  8203K|       |   375   (6)| 00:00:01 |
|   6 |    INDEX FAST FULL SCAN| IX4_REQUEST |  4000K|    57M|       |  1932   (7)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("REQUEST"."CROSS_REF_ID"(+)="EPISODE"."CROSS_REF_ID")
   3 - access("PRODUCT"."EPISODE_ID"="EPISODE"."EPISODE_ID")
   4 - filter("PRODUCT"."STATUS"='I')
   5 - filter("EPISODE"."DEPARTMENT_ID"=2)

The product_sub_type table has been eliminated and we’re doing an index fast full scan of the ix4_request index instead of a tablescan of the much larger request table.

tl;dr

Changing the order of the tables in an ANSI join – especially when there are outer joins involved – could make a significant difference to the way the query is transformed and optimised. While it is nice to write the table ordering so that “chains” of joins are easily visible, bear in mind that re-ordering the join to postpone outer joins may be enough to help the optimizer produce a better execution plan.

Footnote

If you want to play around with the example, here’s the code to create and load the tables. The code doesn’t follow my usual style as most of it is cut-n-pasted from the Oracle forum thread:


rem
rem     script:         Ansi_outer_5.sql
rem     Dated:          July 2012
rem     Author:         Jonathan Lewis
rem
rem     Last tested
rem             18.3.0.0        iffs still not used by default
rem             12.2.0.1        iffs still not used by default
rem

create table episode (
        episode_id number (*,0),
        department_id number (*,0),
        date_required date,
        cross_ref_id varchar2 (11),
        padding varchar2 (80),
        constraint pk_episode primary key (episode_id)
)
;

create table product_sub_type (
        prod_sub_type_id number (*,0),
        sub_type_name varchar2 (20),
        units varchar2 (20),
        padding varchar2 (80),
        constraint pk_product_sub_type primary key (prod_sub_type_id)
)
;

create table product (
        product_id number (*,0),
        prod_type_id number (*,0),
        prod_sub_type_id number (*,0),
        episode_id number (*,0),
        status varchar2 (1),
        number_required number (*,0),
        padding varchar2 (80),
        constraint pk_product primary key (product_id),
        constraint nn_product_episode check (episode_id is not null) 
)
;

alter table product add constraint fk_product 
        foreign key (episode_id) references episode (episode_id)
;

alter table product add constraint fk_prod_sub_type
        foreign key (prod_sub_type_id) references product_sub_type (prod_sub_type_id)
;

create table request (
        request_id number (*,0),
        department_id number (*,0),
        site_id number (*,0),
        cross_ref_id varchar2 (11),
        padding varchar2 (80),
        padding2 varchar2 (80),
        constraint pk_request primary key (request_id),
        constraint nn_request_department check (department_id is not null),
        constraint nn_request_site_id check (site_id is not null)
)
;

prompt  ===================
prompt  Loading episode ...
prompt  ===================

insert /*+ append */ into episode
with generator as 
(select rownum r
          from (select rownum r from dual connect by rownum <= 1000) a,
               (select rownum r from dual connect by rownum <= 1000) b,
               (select rownum r from dual connect by rownum <= 1000) c
         where rownum <= 1e6
       ) 
select r, 2,
    sysdate + mod (r, 14),
    to_char (r, '0000000000'),
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ' || to_char (r, '000000')
  from generator g
where g.r <= 3e5
/ 

commit;

prompt  ============================
prompt  Loading product_sub_type ...
prompt  ============================

insert /*+ append */ into product_sub_type
with generator as 
(select rownum r
          from (select rownum r from dual connect by rownum <= 1000) a,
               (select rownum r from dual connect by rownum <= 1000) b,
               (select rownum r from dual connect by rownum <= 1000) c
         where rownum <= 1e6
       ) 
select r, 
       to_char (r, '000000'),
       to_char (mod (r, 3), '000000'),
       'ABCDE' || to_char (r, '000000')
  from generator g
where g.r <= 15
/ 

commit;

prompt  ===================
prompt  Loading product ...
prompt  ===================

insert /*+ append */ into product
with generator as 
(select rownum r
          from (select rownum r from dual connect by rownum <= 1000) a,
               (select rownum r from dual connect by rownum <= 1000) b,
               (select rownum r from dual connect by rownum <= 1000) c
         where rownum <= 1e6
       ) 
select r, mod (r, 12) + 1, mod (r, 15) + 1, mod (r, 300000) + 1,
       decode (mod (r, 3), 0, 'I', 1, 'C', 2, 'X', 'U'),
       dbms_random.value (1, 100), NULL
  from generator g
where g.r <= 1e5
/ 

commit;

prompt  ===================
prompt  Loading request ...
prompt  ===================

insert /*+ append */ into request
with generator as 
(select rownum r
          from (select rownum r from dual connect by rownum <= 1000) a,
               (select rownum r from dual connect by rownum <= 1000) b,
               (select rownum r from dual connect by rownum <= 1000) c
         where rownum <= 1e7
       ) 
select 
        r, mod (r, 4) + 1, 1, to_char (r, '0000000000'),
        'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890123456789' || to_char (r, '000000'),
        'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789012345678' || to_char (r, '000000')
  from generator g
where g.r <= 4e6
/ 

commit;

create index ix1_episode_cross_ref on episode (cross_ref_id);

create index ix1_product_episode on product (episode_id);
create index ix2_product_type on product (prod_type_id);

create index ix1_request_site on request (site_id);
create index ix2_request_dept on request (department_id);
create index ix3_request_cross_ref on request (cross_ref_id);
create index ix4_request on request (cross_ref_id, site_id);

exec dbms_stats.gather_schema_stats ('test_user')

Note that there is a call to gather_schema_stats() at the end, rather than a set of 4 calls to gather_table_stats(); you may want to change this. The entire data set, including indexes, will need about 1.5GB of free space.
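If you do change it, the equivalent set of table-level calls would be something like:

exec dbms_stats.gather_table_stats('test_user', 'episode')
exec dbms_stats.gather_table_stats('test_user', 'product_sub_type')
exec dbms_stats.gather_table_stats('test_user', 'product')
exec dbms_stats.gather_table_stats('test_user', 'request')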


June 26, 2018

Hacking Profiles

Filed under: Execution plans,Hints,Oracle,Tuning — Jonathan Lewis @ 8:25 am BST Jun 26,2018

Saturday’s posting about setting cursor_sharing to force reminded me about one of the critical limitations of SQL Profiles (which is one of those little reasons why you shouldn’t be hacking SQL Profiles as a substitute for SQL Plan Baselines). Here’s a demo (taking advantage of some code that I think Kerry Osborne published several years ago) of creating an SQL Profile from the current execution plan of a simple statement – first we create some data and find the sql_id and child_number for a simple query:

rem
rem     Script:         sql_profile_restriction.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             12.1.0.2

create table t1
as
select
        rownum            n1,
        rownum            n2,
        lpad(rownum,10)   small_vc,
        rpad('x',100,'x') padding
from dual
connect by
        level <= 1e4 -- > comment to avoid WordPress format issue
;

alter system flush shared_pool;

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;

column sql_id new_value m_sql_id
column child_number new_value m_child_number

select  sql_id , child_number
from    v$sql
where   sql_text like 'selec%find this%'
and     sql_text not like '%v$sql%'
;

Now I can create the SQL Profile for this query using the Kerry Osborne code:


declare
        ar_profile_hints        sys.sqlprof_attr;
        cl_sql_text clob;
begin
        select
                extractvalue(value(d), '/hint') as outline_hints
        bulk collect into 
                ar_profile_hints
        from
                xmltable(
                        '/*/outline_data/hint'
                        passing (
                                select
                                        xmltype(other_xml) as xmlval
                                from
                                        v$sql_plan
                                where
                                        sql_id = '&m_sql_id'
                                and     child_number =  &m_child_number 
                                and     other_xml is not null
                )
        ) d;

        select
                sql_fulltext
        into
                cl_sql_text
        from
                v$sql
        where
                sql_id = '&m_sql_id'
        and     child_number =  &m_child_number
        ;

        dbms_sqltune.import_sql_profile(
                sql_text        => cl_sql_text, 
                profile         => ar_profile_hints, 
                category        => 'DEFAULT',
                name            => 'PROFILE_LITERAL',
                force_match     =>  true
        );
end;
/

Note particularly that I have given the profile a simple name, put it in the DEFAULT category, and set force_match to true (which means that the profile ought to be used even if I change the literal values in the query). So now let’s check that the profile will be used as expected. First I’ll create an index that is a really good index for this query, then I’ll run the query to see if Oracle uses the index or obeys the profile; then I’ll change the query (literals) slightly and check again. I’ll also run a query that won’t be recognised as legally matching (thanks to the changed “hint”) to demonstrate that the index could have been used if the profile hadn’t been there:


alter system flush shared_pool;
create index t1_i1 on t1(n1, n2);
set serveroutput off

prompt  =============================
prompt  Is the SQL Profile used ? Yes
prompt  =============================

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15;
select * from table(dbms_xplan.display_cursor);

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor);

select /*+ Non-match */ count(*) from t1 where n1 = 16 and n2 = 16;
select * from table(dbms_xplan.display_cursor);

Here (with a little cosmetic adjustment) are the three outputs from dbms_xplan.display_cursor():

SQL_ID  ayxnhrqzd38g3, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = 15
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    24 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    24   (5)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("N1"=15 AND "N2"=15))

Note
-----
   - SQL profile PROFILE_LITERAL used for this statement


SQL_ID  gqjb8pp35cnyp, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = 16
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    24 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    24   (5)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("N1"=16 AND "N2"=16))

Note
-----
   - SQL profile PROFILE_LITERAL used for this statement


SQL_ID  3gvaxypny9ry1, child number 0
-------------------------------------
select /*+ Non-match */ count(*) from t1 where n1 = 16 and n2 = 16
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |       |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| T1_I1 |     1 |     8 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=16 AND "N2"=16)

As you can see the SQL Profile is reported as used in the first two queries, and (visibly) seems to have been used. Then in the third query where we wouldn’t expect a match the SQL Profile is not used and we get a plan that shows the index would have been used for the other queries had the SQL Profile not been there. So far, so good – the profile behaves as everyone might expect.

Bind Variable Breaking

Now let’s repeat the entire experiment but first do a global find and replace to change every occurrence of “n2 = 16” to “n2 = :b1”. We’ll also change the name of the SQL Profile when we create it to PROFILE_MIXED, and we’ll put in a couple of lines at the top of the script to declare the variable b1 and set its value; the final test in the script will then look like this:


alter system flush shared_pool;
create index t1_i1 on t1(n1, n2);

exec :b1 := 15

select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = :b1;
select * from table(dbms_xplan.display_cursor);

exec :b1 := 16

select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = :b1;
select * from table(dbms_xplan.display_cursor);

And here are the execution plans from the two queries:


SQL_ID  236f82vmsvjab, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 15 and n2 = :b1
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    24 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |     8 |    24   (5)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("N1"=15 AND "N2"=:B1))

Note
-----
   - SQL profile PROFILE_MIXED used for this statement


SQL_ID  7nakm3tw27z3c, child number 0
-------------------------------------
select /*+ find this */ count(*) from t1 where n1 = 16 and n2 = :b1
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |       |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| T1_I1 |     1 |     8 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=16 AND "N2"=:B1)


As you can see the execution plan for the original query is still doing a full tablescan and reporting the SQL Profile as used; but we’re not using (or reporting) the SQL Profile when we change the literal values – even though a query against dba_sql_profiles will tell us that the profile has force_matching = ‘YES’.
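The check itself is a one-liner – something like:

select  name, category, status, force_matching
from    dba_sql_profiles
where   name like 'PROFILE%'
order by
        name
;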

tl;dr

(Clarified in response to Mohammed Houri’s comment below)
If you use an SQL Profile with force_match => true to “hide” the literals in a statement that includes bind variables (even if they appear only in the select list, in fact) the force-matching mechanism will not be used, and the SQL Profile will apply only to the original statement.

Update

Christian Antognini has an elegant little script that uses the dbms_sqltune.sqltext_to_signature() function to highlight this point (among others).  Bear in mind, before you run the script, that you need to be licensed to use the dbms_sqltune package to do so.
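If you just want to see the heart of the issue, here’s a minimal sketch of the idea (not Christian’s script): with force_match => true two statements that differ only in their literals produce the same signature, but as soon as a bind variable is present the literals are no longer normalised and the signatures differ:

set serveroutput on

declare
        sig1    number;
        sig2    number;
begin
        -- literals only: force_match => true normalises the literals
        sig1 := dbms_sqltune.sqltext_to_signature('select count(*) from t1 where n1 = 15 and n2 = 15', true);
        sig2 := dbms_sqltune.sqltext_to_signature('select count(*) from t1 where n1 = 16 and n2 = 16', true);
        dbms_output.put_line('Literals only: ' || sig1 || ' / ' || sig2);

        -- mixed literals and binds: the literals are no longer normalised
        sig1 := dbms_sqltune.sqltext_to_signature('select count(*) from t1 where n1 = 15 and n2 = :b1', true);
        sig2 := dbms_sqltune.sqltext_to_signature('select count(*) from t1 where n1 = 16 and n2 = :b1', true);
        dbms_output.put_line('Mixed binds  : ' || sig1 || ' / ' || sig2);
end;
/

(The licensing caveat above applies to this sketch as well, of course.)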


March 6, 2018

Match_recognise – 2

Filed under: 12c,Match_recognize,Oracle,Tuning — Jonathan Lewis @ 7:59 am BST Mar 6,2018

In my previous post I presented a warning about the potential cost of sorting and the cost of failing to find a match after each pass of a long search. In a comment on that post Stew Ashton reminded me that the cost of repeatedly trying to find a match starting from “the next row down” could be less of a threat than the cost of “back-tracking” before moving to the next row down.

Taking the example from the previous posting to explain – the requirement was for customers who had executed a transaction in September but not October, and a match_recognize() clause suggested on the ODC (formerly OTN) database forum to implement this requirement was as follows:

match_recognize
(
        partition by cust_id
        order by trans_dt
        measures
                padding as t1_padding
        pattern(x+ y* $) 
        define
                x as mth  = 9,
                y as mth != 10
);

In the blog post I raised the issue of an extreme case where there were 100,000 transactions for a single customer of which all but the last was a September transaction and the last was an October transaction. This would have two effects – first that we could have to sort 100,000 rows, including the cust_id that appeared in the “partition by” clause and the 1000-character padding column that was listed in the measures clause, leading to a huge dump to, and constant re-read of, the temporary tablespace; secondly that having failed to find a match starting from row 1 Oracle would go back to row 2 and try again, then to row 3, and so on.

The point that Stew Ashton made was that Oracle doesn’t just “go back to” row 2, it will be unwinding a stack, or reversing out a recursive descent, to get there. What this means is that Oracle will fail as it reaches the October at row 100,000 and say “no more X rows, is this a Y row ? no”, backtrack to row 99,998 and say “what if I stop collecting X rows here and start looking for Y rows?”, so it reads row 99,999 as a Y row (since 9 != 10), then finds row 100,000 and fails the match. So it backtracks again to row 99,997 and says “what if I stop collecting X rows here and start looking for Y rows?”, and this time it identifies 99,998 and 99,999 as Y rows, then fails on row 100,000.

Remember, this is still part of the attempt to match the pattern starting at row 1 – and there are 99,996 more steps backwards still to go, and the further we go back the further we come forward again until we fail – and there are 99,998 steps we have to back-track through before we even start to consider a pattern starting at row 2.

To demonstrate the costs I’ve got three variants of the original query. First, the query as it was but limited to just 1,000 rows for a single customer; second a changed pattern that highlights the cost of trying to use back-tracking to match the pattern just once, starting from row 1 (the pattern doesn’t actually meet the original requirement because it would only find customers whose first transaction of the year was in September); finally a changed pattern that achieves the required result much more efficiently than the original (but still very slowly) by adding some human intelligence to the implementation.

Here’s version 1 – which took 257 CPU seconds to handle just 1,000 rows:

select  *
from    (
        select
                t1.*,
                extract(year from trans_dt) yr,
                extract(month from trans_dt) mth
        from
                t1
        )
match_recognize
(
        partition by cust_id
        order by trans_dt
        measures
                padding as t1_padding,
                classifier() cl,
                match_number() as mn
        pattern(x+ y* $)
        define
                x as mth  = 9,
                y as mth != 10
);

You’ll see that I’ve included the “debug” functions of classifier() and match_number() in the SQL above – these are particularly useful with the options “all rows per match” and “with unmatched rows” when you’re trying to figure out why your match_recognize() clause is not producing the right results, so I’ve left them there purely for reference.
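For reference, if you do want to run in that debug mode the extra options slot in between the measures and the pattern – a sketch:

match_recognize
(
        partition by cust_id
        order by trans_dt
        measures
                classifier() as cl,
                match_number() as mn
        all rows per match with unmatched rows
        pattern(x+ y* $)
        define
                x as mth  = 9,
                y as mth != 10
)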

Then there’s a version where I’ve made the modification suggested by Stew Ashton to demonstrate the full cost of an attempt to match only if the pattern starts on the first row of the partition. This took just 0.83 CPU seconds to complete. This may sound fairly reasonable, but if you compare that to the time it might take simply to sort and walk once through 1,000 rows you’ll realise that it’s actually pretty expensive – and it’s not surprising that when we had to do the same thing 1,000 times (on a slowly decreasing set, of course, as we work our way down the partition) the original task took 257 CPU seconds.

select  *
from    (
        select
                t1.*,
                extract(year from trans_dt) yr,
                extract(month from trans_dt) mth
        from
                t1
        )
match_recognize
(
        partition by cust_id
        order by trans_dt
        measures
                padding as t1_padding,
                classifier() cl,
                match_number() as mn
        pattern(^ x+ y* $)
        define
                x as mth  = 9,
                y as mth != 10
);

You’ll notice the caret “^” at the start of the pattern – this means the pattern must start at the first row of the partition (just as the “$” means the pattern has to end at the end of the partition).

Finally, thinking of a better way of using match_recognize() for this requirement we realise that we know that November comes after October, and December comes after November so (in the context of our example) the predicate “!= 10” is equivalent to “> 10”. With this code change the original query took 0.82 CPU seconds.


select  *
from    (
        select
                t1.*,
                extract(year from trans_dt) yr,
                extract(month from trans_dt) mth
        from
                t1
        )
match_recognize
(
        partition by cust_id
        order by trans_dt
        measures
                padding as t1_padding,
                classifier() cl,
                match_number() as mn
        pattern(x+ y* $)
        define
                x as mth  = 9,
                y as mth  > 10
);

In this case we still have to do a lot of back tracking, but each time we backtrack one step we then check just one row forward for the match to fail (9 is not greater than 10), whereas with the original if we have backtracked 750 steps (for example) we would then have to check 750 rows before we reached the October row for the match to fail.

Bottom line: back-tracking is a massive cost if you have to take a lot of steps backwards to the previous starting row; and you need the match to fail (or succeed) as fast as possible as you start stepping forward again.

Addendum

Since Stew Ashton had highlighted the omission in the previous blog post I passed him a copy of this post before publishing it, asking him to check whether there were any errors or omissions in the way I had described the work Oracle would do back tracking in this example. He said that he couldn’t think of anything to improve the explanation (though I will still claim responsibility for any errors, omissions, or ambiguities) and then suggested another, far more efficient, way of getting the required answer by (again) re-thinking the question before writing the code. His solution looks like this:


select  *
from    (
        select
                t1.*,
                extract(year from trans_dt) yr,
                extract(month from trans_dt) mth
        from
                t1
        )
match_recognize
(
        partition by cust_id
        order by trans_dt nulls first
        measures
                padding as t1_padding,
                classifier() cl,
                match_number() as mn
        pattern(x{1})
        define
                x as mth  = 9 and (
                             next(mth) is null or next(mth) > 10
                     )
)
;

The pattern here simply says: for the partition find the first “X”-row, but an X-row is defined as “month is September and either there are no more rows or the next row is after October”. You’ll notice that I’ve modified the “order by” clause to put nulls first – there are none in the sample data, but if there were, this change to the order would ensure that for a row where “mth = 9” the “next(mth)” could only be null if the current row were the last in the partition.

If you imagine walking through the pattern-matching process now, you start looking at rows and keep going until you reach the first September, and each time you find a September you check the next row to see if it’s past the end of partition, or a November or December; if it is you report the current row and move to the end of the partition, if it isn’t you just walk to the next row and repeat the process – you never back-track. Effectively the workload here is simply to sort then walk non-stop through the whole list – and Oracle even tells us that we are using this optimum strategy in the execution plan:


---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                       | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                |      |      1 |        |      0 |00:00:00.01 |     146 |       |       |          |
|   1 |  VIEW                                           |      |      1 |   1000 |      0 |00:00:00.01 |     146 |       |       |          |
|   2 |   MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO|      |      1 |   1000 |      0 |00:00:00.01 |     146 |  1186K|   567K| 1054K (0)|
|   3 |    VIEW                                         |      |      1 |   1000 |   1000 |00:00:00.01 |     146 |       |       |          |
|   4 |     TABLE ACCESS FULL                           | T1   |      1 |   1000 |   1000 |00:00:00.01 |     146 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

Operation 2 – the Match Recognize Sort operation – is reported as “deterministic finite auto”, which basically means the duration of the process is predictable because Oracle knows it is a straight end to end walk with no back-tracking. This is the ideal thing to see when you try to design code using match_recognize().

February 22, 2018

Huge Pages

Filed under: Oracle,RAC,Troubleshooting,Tuning — Jonathan Lewis @ 9:03 am BST Feb 22,2018

A useful quick summary from Neil Chandler replying to a thread on Oracle-L:

Topic: RAC install on Linux

You should always be using Hugepages.

They give a minor performance improvement and a significant memory saving in terms of the amount of memory needed to handle the pages – fewer Translation Lookaside Buffer (TLB) entries, which also means fewer TLB misses (which are expensive).

You are handling the memory chopped up into 2MB pieces instead of 4K. But you also have a single shared memory TLB for Hugepages.

The kernel has less work to do, bookkeeping fewer pointers in the TLB.

You also have contiguous memory allocation and it can’t be swapped.

If you are having problems with Hugepages, you have probably overallocated them (I’ve seen this several times at clients so it’s not uncommon). Hugepages can *only* be used for your SGAs. All of your SGAs should fit into the Hugepages and that should generally be no more than about 60% of the total server memory (but there are exceptions), leaving plenty of “normal” memory (small pages) for PGA, O/S and other stuff like monitoring agents.

As an added bonus, AMM can’t use Hugepages, so you are forced to use ASMM. AMM doesn’t work well and has been kind-of deprecated by Oracle anyway – dbca won’t let you set up AMM if the server has more than 4GB of memory.

There are a few follow-up emails after Neil’s; particularly helpful are two from Stefan Koehler, here and here.


For reference here’s a link to the Linux v6 guide to setting up huge pages for Oracle (if I remember to keep this up to date I’ll add further links for newer versions of Linux – unless someone conveniently supplies them in the comments). And a link to Tim Hall’s pages on the subject that add some further bits and pieces that are likely to be relevant in some environments.


September 19, 2017

With Subquery()

Filed under: CBO,match_recognize,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 7:19 pm BST Sep 19,2017

Here’s a little oddity that came up recently on the OTN database forum – an example where a “with” subquery (common table expression / factored subquery) produced a different execution plan from the equivalent statement with the subquery moved to an inline view; tested in 12.1.0.2 and 12.2.0.1. Here are the two variations:


rem
rem     Script:         match_recognize_05.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2017
rem 
rem     Last tested
rem             18.1.0.0        via LiveSQL
rem             12.2.0.1
rem             12.1.0.2

with  tbl as (
          select 1 col1, 'a'  col2 from dual
union all select 2 , 'a' from dual
union all select 3 , 'b' from dual
union all select 4 , 'a' from dual
union all select 5 , 'a' from dual
union all select 6 , 'b' from dual
union all select 7 , 'b' from dual
),
lag_data as (
        select col1, col2, lag(col2) over (order by col1) col2a from tbl
)
select  col1, col2
from    lag_data
where   col2a is null or col2a <> col2
order by col1
;

with  tbl as (
          select 1 col1, 'a'  col2 from dual
union all select 2 , 'a' from dual
union all select 3 , 'b' from dual
union all select 4 , 'a' from dual
union all select 5 , 'a' from dual
union all select 6 , 'b' from dual
union all select 7 , 'b' from dual
)
select  col1, col2
from    (
        select col1, col2, lag(col2) over (order by col1) col2a from tbl
        )
where   col2a is null or col2a <> col2
order by col1
;

You’ll notice that there’s an explicit “order by” clause at the end of both queries. If you want the result set to appear in a specific order you should always specify the order and not assume that it will appear as a side effect; but in this case the ordering for the “order by” clause seems to match the ordering needed for the analytic function, so we might hope that the optimizer would take advantage of the analytic “window sort” and not bother with a “sort order by” clause. But here are the two plans – first with subquery factoring, then with the inline view:


-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |       |    16 (100)|          |
|   1 |  SORT ORDER BY   |      |     7 |    56 |    16  (13)| 00:00:01 |
|*  2 |   VIEW           |      |     7 |    56 |    15   (7)| 00:00:01 |
|   3 |    WINDOW SORT   |      |     7 |    42 |    15   (7)| 00:00:01 |
|   4 |     VIEW         |      |     7 |    42 |    14   (0)| 00:00:01 |
|   5 |      UNION-ALL   |      |       |       |            |          |
|   6 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   7 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   9 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|  10 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|  11 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|  12 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("COL2A" IS NULL OR "COL2A"<>"COL2"))



-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |       |    15 (100)|          |
|*  1 |  VIEW            |      |     7 |    56 |    15   (7)| 00:00:01 |
|   2 |   WINDOW SORT    |      |     7 |    42 |    15   (7)| 00:00:01 |
|   3 |    VIEW          |      |     7 |    42 |    14   (0)| 00:00:01 |
|   4 |     UNION-ALL    |      |       |       |            |          |
|   5 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|   6 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|   7 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|   9 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|  10 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
|  11 |      FAST DUAL   |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("COL2A" IS NULL OR "COL2A"<>"COL2"))

The two plans are different, and the difference is an extra “sort order by” operation even though the optimizer has moved the subquery with the analytic function inline, so that in principle both statements are technically the same and merely cosmetically different.

It’s been some time since I’ve noticed subquery factoring resulting in a change in plan when the expected effect is purely cosmetic. Interestingly, though, the “unparsed query” in the 10053 (CBO) trace file is the same for the two cases with the only difference being the name of a generated view:


SELECT  
        "LAG_DATA"."COL1" "COL1","LAG_DATA"."COL2" "COL2" 
FROM    (SELECT 
                "TBL"."COL1" "COL1","TBL"."COL2" "COL2",
                DECODE(
                        COUNT(*) OVER ( ORDER BY "TBL"."COL1" ROWS  BETWEEN 1 PRECEDING  AND 1 PRECEDING ),
                        1, FIRST_VALUE("TBL"."COL2") OVER ( ORDER BY "TBL"."COL1" ROWS  BETWEEN 1 PRECEDING  AND 1 PRECEDING ),
                           NULL
                ) "COL2A" 
        FROM    (
                            (SELECT 1 "COL1",'a' "COL2" FROM "SYS"."DUAL" "DUAL") 
                 UNION ALL  (SELECT 2 "2",'a' "'A'" FROM "SYS"."DUAL" "DUAL") 
                 UNION ALL  (SELECT 3 "3",'b' "'B'" FROM "SYS"."DUAL" "DUAL") 
                 UNION ALL  (SELECT 4 "4",'a' "'A'" FROM "SYS"."DUAL" "DUAL") 
                 UNION ALL  (SELECT 5 "5",'a' "'A'" FROM "SYS"."DUAL" "DUAL") 
                 UNION ALL  (SELECT 6 "6",'b' "'B'" FROM "SYS"."DUAL" "DUAL") 
                 UNION ALL  (SELECT 7 "7",'b' "'B'" FROM "SYS"."DUAL" "DUAL")
                ) "TBL"
        ) "LAG_DATA" 
WHERE 
        "LAG_DATA"."COL2A" IS NULL OR "LAG_DATA"."COL2A"<>"LAG_DATA"."COL2" 
ORDER BY "LAG_DATA"."COL1"

The above is the unparsed query for the query with two factored subqueries; the only difference in the unparsed query when I moved the analytic subquery inline was that the view name in the above text changed from “LAG_DATA” to “from$_subquery$_008”.

Footnote:

When I used a real table (with the same data) instead of a “union all” factored subquery for the driving data this anomaly disappeared. The union all is a convenient dirty trick for generating very small test data sets on OTN – it remains to be seen whether a more realistic example of multiple factored subqueries would still result in the optimizer losing an opportunity for eliminating a “sort order by” operation.
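That variation is easy to reproduce – a sketch, using tbl as my (arbitrary) table name to match the factored subquery:

create table tbl (col1 number, col2 varchar2(1));

insert into tbl values (1,'a');
insert into tbl values (2,'a');
insert into tbl values (3,'b');
insert into tbl values (4,'a');
insert into tbl values (5,'a');
insert into tbl values (6,'b');
insert into tbl values (7,'b');
commit;

execute dbms_stats.gather_table_stats(user,'tbl')

with lag_data as (
        select col1, col2, lag(col2) over (order by col1) col2a from tbl
)
select  col1, col2
from    lag_data
where   col2a is null or col2a <> col2
order by col1
;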

In passing – did you notice how the optimizer had managed to rewrite a “lag()” analytic function as a form of “first_value()” function with decode ?

If you’re wondering about the choice of script name for this test – it’s because I created the script to see if I could produce a match_recognize() version of the requirement; I discovered the sort/order by anomaly as an accidental side-effect.

Update

The same anomaly appears in 18.1 (as tested on LiveSQL), although I have to say that I’m doing an “explain plan” on LiveSQL since it won’t yet let you pull a plan from the library cache so it’s possible that the actual plan and the predicted plan are not identical.
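i.e. something along the lines of:

explain plan for
with  tbl as (
          select 1 col1, 'a'  col2 from dual
union all select 2 , 'a' from dual
union all select 3 , 'b' from dual
union all select 4 , 'a' from dual
union all select 5 , 'a' from dual
union all select 6 , 'b' from dual
union all select 7 , 'b' from dual
),
lag_data as (
        select col1, col2, lag(col2) over (order by col1) col2a from tbl
)
select  col1, col2
from    lag_data
where   col2a is null or col2a <> col2
order by col1
;

select * from table(dbms_xplan.display);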

February 9, 2017

Index bouncy scan

Filed under: Oracle,Performance,Tuning — Jonathan Lewis @ 1:05 pm BST Feb 9,2017

There’s a thread running on OTN at present about deleting huge volumes of duplicated data from a table (to reduce it from 1.1 billion to about 22 million rows). The thread isn’t what I’m going to talk about, though, other than quoting some numbers from it to explain what this post is about.

An overview of the requirement suggests that a file of about 2.2 million rows is loaded into the table every week with (historically) no attempt to delete duplicates. As a file is loaded into the table every row gets the same timestamp, which is the sysdate at load time. I thought it would be useful to know how many different timestamps there were in the whole table.  (From an averaging viewpoint, 1.1 billion rows at 2.2 million rows per week suggests about 500 dates/files/weeks – or about 9.5 years – but since the table relates to “customer accounts” it seems likely that the file was originally smaller and has grown over time, which means the history may be rather longer than that.)

Conveniently there is an index on the “input_user_date” column in the table so we might feel happy running a query that simply does:


select
        distinct input_user_date
from
        customer_account
order by
        input_user_date
;

We might then refine the query to do a count(*) aggregate, or do some analytics to find any strange gaps in the timing of the weekly loads. However, all I’m really interested in is the number of dates because I’ve suggested we could de-duplicate the data by running a PL/SQL process that does a simple job for each date in turn, and I want to get an idea of how many times that job will run so that I can estimate how long the entire process might take.
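The count(*) refinement, for example, is the obvious:

select
        input_user_date, count(*) ct
from
        customer_account
group by
        input_user_date
order by
        input_user_date
;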

The trouble with the basic query is that the table is (as you probably noticed) rather large, and so is the index. If we assume 8 bytes (which includes the length byte) for a date, 7 bytes for the rowid, 4 bytes overhead, and 100% packing we get about 420 index entries per leaf block, so with 1.1 billion entries the index is about 2.6 million leaf blocks. If the index had been built with compression (which means you’d only be recording a date once per leaf block) it would still be about 1.6 million leaf blocks. Fortunately we wouldn’t have to do much “real” sorting to report just a list of distinct values, or even the count(*) for each date, if we made Oracle use an index full scan – but it’s still a lot of work to read 1.6 million blocks (possibly using single block reads) and do even something as simple as a running count as you go. So I whipped up a quick and dirty bit of PL/SQL to do the job.

declare
        m_d1 date := to_date('01-Jan-0001','dd-Mon-yyyy');
        m_d2 date := to_date('01-Jan-0001','dd-Mon-yyyy');
        m_ct number := 0;
begin
        loop
                select
                        min(input_user_date)
                into
                        m_d2
                from
                        customer_account
                where
                        input_user_date > m_d1
                ;

                exit when m_d2 is null;

                m_ct := m_ct + 1;
                dbms_output.put_line('Count: ' || m_ct || '  Date: ' || m_d2);
                m_d1 := m_d2;

        end loop;
end;
/

The code assumes that the input_user_date hasn’t gone back to a silly date in the past to represent a “null date” (which shouldn’t exist anyway); if you want to use code like this but have a problem with a special “low-value” then you would probably be safest adding a prequel SQL that selects the min(columnX) where columnX is not null to get the starting value instead of using a constant as I have done.
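A sketch of that prequel – it would go just before the loop in the block above, subtracting a day so that the strict “greater than” in the loop still picks up the lowest real date:

        select
                min(input_user_date) - 1
        into
                m_d1
        from
                customer_account
        where
                input_user_date is not null
        ;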

The execution path for the SQL statement should be an index-only path: “index range scan (min/max)”, which typically requires only 3 or 4 logical I/Os to find the relevant item for each date (which compares well with the estimated 2,200,000 / 420 = 5,238 leaf blocks we would otherwise have to scan through for each date). Here’s the path you should see:


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE              |       |     1 |     8 |            |          |
|   2 |   FIRST ROW                  |       |     1 |     8 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| CA_I1 |     1 |     8 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("INPUT_USER_DATE">:B1)

I did build a little data set as a proof of concept – and produced a wonderful example of how the scale and the preceding events make a difference that requires you to look very closely at what has happened. I used a table t1 in my example with a column d1, but apart from the change in names the PL/SQL block was as above. Here’s the code I used to create the data and prepare for the test:


create table t1 nologging
as
select
        trunc(sysdate) + trunc((rownum - 1)/100) d1,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 50000
;

execute dbms_stats.gather_table_stats(user,'t1')
alter table t1 modify d1 not null;

create index t1_i1 on t1(d1) nologging pctfree 95
;

select index_name, leaf_blocks from user_indexes;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 8';

My data set has 500 dates with 100 rows per date, and the pctfree setting for the index gives me an average of about 8 leaf blocks per date (for a total of 4,167 leaf blocks). It’s only a small index so I’m expecting to see just 2 or 3 LIOs per date, and a total of about 500 physical reads (one per date plus a handful for reading branch blocks). Here’s the output from running tkprof against the trace file:


SELECT MIN(D1)
FROM
 T1 WHERE D1 > :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    501      0.00       0.01          0          0          0           0
Fetch      501      0.08       0.18       4093       1669          0         501
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1003      0.09       0.19       4093       1669          0         501

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=3 pr=64 pw=0 time=9131 us)
         1          1          1   FIRST ROW  (cr=3 pr=64 pw=0 time=9106 us cost=3 size=8 card=1)
         1          1          1    INDEX RANGE SCAN (MIN/MAX) T1_I1 (cr=3 pr=64 pw=0 time=9089 us cost=3 size=8 card=1)(object id 252520)

I’ve done a physical read of virtually every single block in the index; but I have done only 3 buffer gets per date – doing fewer buffer gets than physical reads.

I’ve been caught by two optimisations (which turned out to be “pessimisations” in my test): I’ve flushed the buffer cache, so the Oracle runtime engine has decided to consider “warming up” the cache by reading extra blocks from any popular-looking objects that I’m accessing, and the optimizer may have given the run-time engine enough information to allow it to recognise that this index is subject to range scans and could therefore be a suitable object to use while warming up. As you can see from the following extracts from session events and session activity stats – we’ve done a load of multiblock reads through the index.


Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                               1           0           0.03        .031           6
db file scattered read                              136           0          13.54        .100           1

Name                                                                     Value
----                                                                     -----
physical reads                                                           4,095
physical reads cache                                                     4,095
physical read IO requests                                                  137
physical reads cache prefetch                                            3,958
physical reads prefetch warmup                                           3,958

This isn’t likely to happen, of course, in the production system where we’ll be starting with a fully loaded cache and the leaf blocks we need are (logically) spaced apart by several thousand intervening blocks.

Footnote

I can’t remember who first brought this strategy to my attention – though I’m fairly sure it was one of my Russian colleagues [update: Sayan Malakshinov – see comments below], who has blogged about ways to work around what is effectively a limitation of the “index skip scan”. Apologies to the originator; and if you recognise your work here please add a comment with URL below.

November 22, 2016

Delete/Insert

Filed under: Oracle,Performance,Tuning — Jonathan Lewis @ 12:59 pm BST Nov 22,2016

Many of the questions that appear on OTN are deceptively simple until you start thinking carefully about the implications; one such showed up a little while ago:

What i want to do is to delete rows from table where it matches condition upper(CATEGORY_DESCRIPTION) like ‘%BOOK%’.

At the same time i want these rows to be inserted into other table.

The first problem is this: how carefully does the requirement need to be stated before you can decide how to address it? Trying to imagine awkward scenarios, or boundary conditions, can help to clarify the issue.

If you delete before you insert, how do you find the data to insert ?

If you insert before you delete, what happens if someone updates a row you’ve copied so that it no longer matches the condition? Would it matter if the update changes the row in a way that leaves it matching the condition (so that what you’ve inserted is not totally consistent with what you’ve deleted)?

If you insert before you delete, and someone executes some DML that makes another row match the requirement, should you delete it (and if not, how do you avoid deleting it) or leave it in place?

Once you start kicking the problem about you’ll probably come to the conclusion that the requirement is for the delete and insert to be self-consistent – in other words what you delete has to be an exact match for what you insert as at the time you inserted it. You’ll ignore rows that come into scope in mid-process due to other activity, and you’ll have to stop people changing rows that are being transferred (in case there’s an audit trail that subsequently says that there was, at some point in time, a row that matched the condition but never arrived – and a row that has arrived that didn’t match the final condition of the rows that disappeared).

Somehow your code needs to lock the set of rows to be transferred and then transfer those rows and eliminate them. There are two “obvious” and simple strategies – readers are invited to propose others (or criticise the two I – or any of the comments – suggest). I’ll start with a simple data setup for testing:

rem
rem     Script:         insert_delete.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2016
rem     Purpose:
rem
rem     Last tested
rem

create table t1
as
select  object_id, object_name, owner
from    all_objects
;

alter table t1 add constraint t1_pk primary key(object_id);

create table t2
as
select  * from t1
where   rownum = 0
;

execute dbms_stats.gather_table_stats(user,'t1')
execute dbms_stats.gather_table_stats(user,'t2')

Option 1:

The simplest approach is often the best – until, perhaps, you spot the drawbacks – do a basic delete of the data to be transferred (which handles the locking) but wrap the statement in a PL/SQL block that captures the data (using the returning clause) and then inserts it into the target table as efficiently as possible. With thanks to Andrew Sayer who prompted this blog post:

declare
        type t1_rows is table of t1%rowtype;
        t1_deleted t1_rows;

begin
        delete from t1 where owner = 'SYSTEM'
        returning object_id, object_name, owner bulk collect into t1_deleted;

        forall i in 1..t1_deleted.count
                insert into t2 values t1_deleted(i);

        commit;
end;
/

The drawback to this, of course, is that if the volume to be transferred is large (where “large” is probably a fairly subjective measure) then you might not want to risk the volume of memory (PGA) it takes to gather all the data with the bulk collect.

Option 2:

For large volumes of data we could reduce the threat to the PGA by gathering only the rowids of the rows to be transferred (locking the rows as we do so) then do the insert and delete based on the rowids:

declare
        type rid_type is table of rowid;
        r rid_type;

--      The "for update" locks the entire target set as the cursor opens.
        cursor c1 is select rowid from t1 where owner = 'SYSTEM' for update;

begin
        open c1;
        fetch c1 bulk collect into r;
        close c1;

--      Copy the locked rows by rowid ...
        forall i in 1..r.count
                insert into t2 select * from t1 where rowid = r(i);

--      ... then delete them by rowid.
        forall i in 1..r.count
                delete from t1 where rowid = r(i);

        commit;
end;
/

Note, particularly, the “for update” in the driving select.

Inevitably there is a drawback to this strategy as well (on top of the threat that the memory requirement might still be very large even when the returned set is restricted to just rowids). We visit the source data (possibly through a convenient index, avoiding the table itself [silly error deleted – see comment 8]) to collect rowids; then we visit the data again by rowid (which is usually quite efficient) to copy it; then we visit it again (by rowid) to delete it. That’s potentially a significant increase in buffer cache activity (especially latching) over the simple “delete returning” strategy; moreover the first strategy gives Oracle the option to use the index-driven optimisation for maintaining indexes while this method doesn’t. You might note, by the way, that you could include an “order by rowid” clause on the select; depending on your data distribution and indexes this might reduce the volume of random I/O Oracle has to do as it re-visits the table for the inserts and deletes.
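As a sketch of that last idea (not something I’ve benchmarked for this post), the driving cursor would simply become:

cursor c1 is
        select  rowid
        from    t1
        where   owner = 'SYSTEM'
        order by
                rowid
        for update
;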

We can address the PGA threat, of course, by fetching the rowids with a limit:


declare
        type rid_type is table of rowid;
        r rid_type;

        cursor c1 is select rowid from t1 where owner = 'SYSTEM' for update;

begin
--      Opening the cursor locks the entire target set in one step.
        open c1;

--      dbms_lock.sleep(60);

        loop
                fetch c1 bulk collect into r limit 5;

                forall i in 1..r.count
                        insert into t2 select * from t1 where rowid = r(i);

                forall i in 1..r.count
                        delete from t1 where rowid = r(i);

--              A short (or empty) final fetch means we've processed the
--              last batch; checking r.count rather than c1%notfound means
--              we don't lose that final partial batch.
                exit when r.count != 5;
        end loop;

        close c1;

        commit;
end;
/

One thing to be aware of is that even though we fetch the rowids in small batches we lock all the relevant rows when we open the cursor, so we don’t run into the problem of inserting thousands of rows into t2 and then finding that the next batch we select from t1 has been changed or deleted by another session. (The commented-out call to dbms_lock.sleep() was something I included as a way of checking that this claim was true.) This doesn’t stop us running into a locking (or deadlocking) problem, of course. If it takes us 10 seconds to lock 1M rows with our select for update, another user might manage to lock our millionth row before we get there, leaving us in a TX/6 wait; if, a few seconds later, that user then gets stuck in a TX/6 wait of their own, trying to lock one of the rows we’ve already locked, our session will time out after a further 3 seconds with an ORA-00060 deadlock error.
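If you wanted to be defensive about that risk you could trap ORA-00060 and retry – a minimal sketch (the logging and retry policy are left as assumptions):

declare
        deadlock_detected exception;
        pragma exception_init(deadlock_detected, -60);
begin
        null;           -- the "select for update" and transfer loop go here
exception
        when deadlock_detected then
                rollback;
                -- log the collision, then pause and retry (or give up)
end;
/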

The limit of 5 is just for demonstration purposes, of course – there were 9 rows in all_objects that matched the select predicate; in a production system I’d probably raise the limit as high as 255 (which seems to be the limit of Oracle’s internal array-processing).

You’ll notice, of course, that we can’t use this limited fetch approach with the delete command – the entire delete would take place as we opened the equivalent cursor and, though we can use bulk collect with the returning clause, there is no syntax that allows anything like a fetch with limit to take place.

Discarded option

My first thought was to play around with the AS OF SCN clause.  Select the current SCN from v$database and then do things like delete “as of scn”, or “select for update as of scn” – there were ways of getting close, but invariably I ended up running into Oracle error: “ORA-08187: snapshot expression not allowed here”. But maybe someone else can come up with a way of doing this that doesn’t add significant overheads and doesn’t allow for inconsistent results.
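For illustration, here’s a sketch of the sort of attempt I mean (reconstructed for this note, so treat the exact text as an assumption rather than a copy of my test script):

declare
        m_scn   number;
begin
        select current_scn into m_scn from v$database;

        delete from t1 as of scn (m_scn)        -- raises ORA-08187
        where   owner = 'SYSTEM';
end;
/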

Update:

Following a comment from SydOracle1 below I’ve revisited the “as of SCN” approach and discovered that I had fooled myself into discarding the option too quickly. The results are in a follow-up blog.

May 25, 2016

CBO++

Filed under: CBO,Oracle,Tuning — Jonathan Lewis @ 1:23 pm BST May 25,2016

While browsing the web recently for articles on the HyperLogLog algorithm that Oracle uses for some of its approximate functions, I came upon a blog post written in Jan 2014 with the title Use Subqueries to Count Distinct 50X Faster. There are various ways that subqueries can be used to rewrite queries for improved performance but when the title caught my eye I couldn’t think of a way in which they could improve “count distinct”.  It turned out that the word “subquery” was being used (quite correctly) in the sense of “inline view” while my mind had immediately turned to subqueries in the select list or where clause.

The article started by pointing out that if you have a query that does a join then aggregates the result you might be able to improve performance by finding a way of rewriting the query to aggregate before doing the join. (See this note from 2008). The article then went one step further to optimise a “count distinct” by wrapping a “select count” around a “select distinct” inline view as follows:

Original
--------
  select
    dashboard_id,
    count(distinct user_id) as ct
  from time_on_site_logs 
  group by dashboard_id

Rewrite
-------
select 
    inline.dashboard_id, 
    count(1) as ct
  from (
    select distinct dashboard_id, user_id
    from time_on_site_logs
  ) as inline
  group by inline.dashboard_id

(I’ve reproduced only the central part of the query being examined and I’ve changed the name of the inline view to eliminate the potential visual confusion due to the word “distinct” appearing in its name in the original).

The article was written using PostgreSQL, with the comment that the technique was universal – and this brings me to the point of the post: the technique can be applied to Oracle’s dialect of SQL. Both steps are good ideas whose effectiveness depends on the data patterns, data volume, and (potentially) indexing; but you may not need to rewrite the code because Oracle’s optimizer is programmed to know that both ideas are good and it can transform your query to the appropriate form internally. The “place group by” transformation appeared in 11.1.0.6 in 2007, and the “transform distinct aggregation” transformation appeared in 11.2.0.1 in 2009.

Here’s a little demo of Oracle handling a variation of the query I’ve shown above:


rem     Script: transform_distinct_agg.sql
rem     Dated:  May 2016
rem     Author: J.P.Lewis

create table t1 nologging 
as 
select  * 
from    all_objects 
where   rownum <= 60000
;
execute dbms_stats.gather_table_stats(user,'t1', method_opt=>'for all columns size 1')

alter session set statistics_level = all;

select owner, count(distinct object_type) from t1 group by owner;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

prompt  ===============
prompt  Rewritten query
prompt  ===============

select  owner, count(1)
from    (
         select distinct owner, object_type
         from   t1
        ) distinct_types
group by
        owner
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

Here are the two execution plans, pulled from memory – with the outline and some other peripheral lines deleted:


-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |      1 |        |      5 |00:00:00.23 |     865 |       |       |          |
|   1 |  HASH GROUP BY       |           |      1 |      5 |      5 |00:00:00.23 |     865 |  1452K|  1452K|  728K (0)|
|   2 |   VIEW               | VM_NWVW_1 |      1 |     78 |     30 |00:00:00.23 |     865 |       |       |          |
|   3 |    HASH GROUP BY     |           |      1 |     78 |     30 |00:00:00.23 |     865 |  4588K|  1708K| 2497K (0)|
|   4 |     TABLE ACCESS FULL| T1        |      1 |  60000 |  60000 |00:00:00.12 |     865 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------

===============
Rewritten query
===============

------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      5 |00:00:00.23 |     865 |       |       |          |
|   1 |  HASH GROUP BY       |      |      1 |      5 |      5 |00:00:00.23 |     865 |  1452K|  1452K|  735K (0)|
|   2 |   VIEW               |      |      1 |     78 |     30 |00:00:00.23 |     865 |       |       |          |
|   3 |    HASH UNIQUE       |      |      1 |     78 |     30 |00:00:00.23 |     865 |  4588K|  1708K| 1345K (0)|
|   4 |     TABLE ACCESS FULL| T1   |      1 |  60000 |  60000 |00:00:00.12 |     865 |       |       |          |
------------------------------------------------------------------------------------------------------------------

Apart from the change from HASH UNIQUE to HASH GROUP BY the two plans are the same and use the same resources – the UNIQUE being a special case of the algorithm for the GROUP BY. Here (with some cosmetic editing) is the SQL of the “unparsed query” taken from the 10053 (CBO) trace file – notice how similar it is to the text suggested by the original article – in particular the inline view that produces the distinct list of owner and object_type (using a group by with no aggregated columns rather than a distinct):

SELECT 
        VM_NWVW_1.$vm_col_2 OWNER,
        COUNT(VM_NWVW_1.$vm_col_1) COUNT(DISTINCTOBJECT_TYPE)
FROM    (
                SELECT
                        T1.OBJECT_TYPE $vm_col_1,
                        T1.OWNER $vm_col_2
                FROM    TEST_USER.T1 T1
                GROUP BY 
                        T1.OWNER,T1.OBJECT_TYPE
        ) VM_NWVW_1
GROUP BY
        VM_NWVW_1.$vm_col_2
;

The Oracle optimizer is pretty good at finding efficient transformations for the query you wrote so, rather than rewriting a query (with the associated risk of making a mistake as you do so), you may only need to add a couple of hints to generate a suitable SQL Plan Baseline that you can attach to the original query.
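For example (a sketch – I’m assuming the positive form of the hint mirrors the no_transform_distinct_agg() hint shown in the footnote below), you could encourage the transformation explicitly with something like:

select
        /*+ qb_name(main) transform_distinct_agg(main) */
        owner, count(distinct object_type)
from    t1
group by
        owner
;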

Footnote:

Sometimes the optimizer will decide not to transform when it should, or decide to transform when it shouldn’t, so it’s nice to know that there are hints to block transformations – here’s the effect of adding /*+ qb_name(main) no_transform_distinct_agg(main) */ to my query:


----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      5 |00:00:00.25 |     865 |       |       |          |
|   1 |  SORT GROUP BY     |      |      1 |      5 |      5 |00:00:00.25 |     865 |  4096 |  4096 | 4096  (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |  60000 |  60000 |00:00:00.12 |     865 |       |       |          |
----------------------------------------------------------------------------------------------------------------

The interesting thing to note here is that even though this hinted version of the query took a little longer to complete, the memory allocated to run it was only 4K, compared to the (roughly) 2M needed by the transformed query – where both workareas would have been in existence at the same time, which won’t necessarily be the case for every query using multiple workareas. The memory difference isn’t significant in this trivial case, but it demonstrates the point that sometimes there is no single best path – you can choose the path that protects the resource that’s under most pressure.

 

January 27, 2016

Add primary key.

Filed under: Indexing,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 9:07 am BST Jan 27,2016

I thought I had written this note a few years ago, on OTN or Oracle-L if not on my blog, but I can’t find any sign of it so I’ve decided it’s time to write it (again) – starting as a question about the following code:


rem
rem     Script: pk_overhead.sql
rem     Author: J.P.Lewis
rem     Dated:  Feb 2012
rem

create table t1
as
with generator as (
        select  rownum  id
        from            dual
        connect by
                        rownum <= 1000 --> comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc((rownum-1)/50)                    clustered,
        mod(rownum,20000)                       scattered,
        lpad(rownum,10)                         vc_small,
        rpad('x',100,'x')                       vc_padding
from
        generator       g1,
        generator       g2
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

alter system flush buffer_cache;

alter table t1 add constraint t1_pk primary key(id, scattered);


I’ve generated a table with 1,000,000 rows, including a column that’s guaranteed to be unique; then I’ve added a (two-column) primary key constraint to that table.

Because of the guaranteed unique column the call to add the constraint will succeed. Because Oracle will automatically create a unique index to support that constraint it will have to do a tablescan of the table. So here’s the question: HOW MANY TIMES will it tablescan that table (and how many rows will it scan)?

Space for thought …

The answer is three tablescans, 3 million rows.

Oracle will scan the table once to check the validity of adding a NOT NULL definition and constraint for the id column, repeat the scan to do the same for the scattered column, then do one final scan to accumulate the key data and rowids, sort them, and create the index.

Knowing this, you may be able to find ways to modify bulk data loading operations to minimise overheads.
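For example (a sketch of one option, using a hypothetical load sequence rather than anything from the original test): if you can declare the relevant columns NOT NULL as you create the table, and build the unique index yourself after loading, then the “add constraint” should be able to use the existing declarations and index, leaving the index build as the only tablescan:

create table t1 (
        id              number          not null,
        clustered       number,
        scattered       number          not null,
        vc_small        varchar2(10),
        vc_padding      varchar2(100)
);

-- (bulk load the data here)

create unique index t1_pk on t1(id, scattered);

alter table t1 add constraint t1_pk primary key(id, scattered) using index t1_pk;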

The most recent version I’ve tested this on is 12.1.0.2.

See also: https://jonathanlewis.wordpress.com/2012/03/02/add-constraint/

Update – May 2016

The extra tablescans occur even if you have pre-existing check constraints (as opposed to NOT NULL column declarations) ensuring that the columns are not null (i.e. things like: “alter table t1 add constraint t1_nn_id check (id is not null)”).

 

January 8, 2016

CTEs and Updates

Filed under: Execution plans,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 1:01 pm BST Jan 8,2016

An important target of trouble-shooting, particularly when addressing performance problems, is to minimise the time and effort you have to spend to get a “good enough” result. A recent question on the OTN database forum struck me as a good demonstration of following this strategy; the problem featured a correlated update that had to access a view 84 times to update a small table; but the view was a complex view (apparently non-mergeable) and the update took several hours to complete even though the view, when instantiated, held only 63 rows.

The OP told us that the query “select * from view” took seven minutes to return those 63 rows, and wanted to know if we could find a nice way to perform the update in (approximately) that seven minutes, rather than using the correlated update approach that seemed to take something in the ballpark of 7 minutes per row updated.

Of course the OP could have given us all the details of the view definition, all the table and index definitions, with stats etc. and asked us if we could make the update run faster – but that could lead to a long and frustrating period of experimentation and testing, and a solution that might increase the general maintenance costs of the system (because a subsequent modification to the view might then have to be echoed into the code that did the update). Setting a strictly limited target that clearly ought to be achievable is (if nothing else) a very good starting point for improving the current situation.

I don’t know (as at the time of writing) if the OP implemented the strategy I suggested, but from his description it looked as if it should have been simple to use subquery factoring with materialization to achieve the required result in the most elegant way possible (meaning, in this case, simple SQL and no change to any surrounding code).

The OP has responded to my suggestion with a comment that “it didn’t work”, but it appeared to me that they were looking at and mis-interpreting the output from a call to “Explain Plan” rather than testing the query and pulling the plan from memory – so I thought I’d build a simple model to demonstrate the principle and show you how you could confirm (beyond just checking the clock) that the strategy had worked.

We start with a table to update, a non-mergeable view, and two tables to make up the non-mergeable view:


create table t1
as
select
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from
        dual
connect by
        level <= 3000
;


create table t2
as
select
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from
        dual
connect by
        level <= 3000
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                user,
                't2',
                method_opt => 'for all columns size 1'
        );
end;
/

create or replace view v1
as
select distinct
        t1.n1 t1n1, t1.n2 t1n2, t2.n2 t2n2
from
        t1, t2
where
        t1.n1 = t2.n1
;

create table t3
as
select * from v1
;

begin
        dbms_stats.gather_table_stats(
                user,
                't3',
                method_opt => 'for all columns size 1'
        );
end;
/

I’ve created the table t3 by copying the content of the view v1 and I’m going to update every row in t3 from v1; I gathered stats on t1 and t2 before creating the view and table simply to avoid the need for Oracle to do dynamic sampling as it created t3. Depending on your version of Oracle, of course, the stats collections might be redundant.

Having set the scene with the data, here’s the “original” code for doing the required update, followed by its execution plan (pulled from the memory of a 12.1.0.2 instance):


set serveroutput off
set linesize 180
set trimspool on

alter session set statistics_level = all;

spool cte_update

update t3
        set t2n2 = (
                select  v1.t2n2
                from    v1
                where   v1.t1n1 = t3.t1n1
                and     v1.t1n2 = t3.t1n2
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                         |       |      1 |        |      0 |00:00:01.22 |   46745 |       |       |          |
|   1 |  UPDATE                                  | T3    |      1 |        |      0 |00:00:01.22 |   46745 |       |       |          |
|   2 |   TABLE ACCESS FULL                      | T3    |      1 |    200 |    200 |00:00:00.01 |       3 |       |       |          |
|   3 |   VIEW                                   | V1    |    200 |      1 |    200 |00:00:01.22 |   46332 |       |       |          |
|   4 |    SORT UNIQUE                           |       |    200 |      1 |    200 |00:00:01.21 |   46332 |  2048 |  2048 | 2048  (0)|
|   5 |     NESTED LOOPS                         |       |    200 |      1 |  45000 |00:00:01.11 |   46332 |       |       |          |
|   6 |      NESTED LOOPS                        |       |    200 |      1 |  45000 |00:00:00.34 |    1332 |       |       |          |
|*  7 |       TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    200 |      1 |   3000 |00:00:00.02 |     684 |       |       |          |
|*  8 |        INDEX RANGE SCAN                  | T1_I1 |    200 |     15 |   3000 |00:00:00.01 |     408 |       |       |          |
|*  9 |       INDEX RANGE SCAN                   | T2_I1 |   3000 |      1 |  45000 |00:00:00.11 |     648 |       |       |          |
|  10 |      TABLE ACCESS BY INDEX ROWID         | T2    |  45000 |      1 |  45000 |00:00:00.31 |   45000 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - filter("T1"."N2"=:B1)
   8 - access("T1"."N1"=:B1)
   9 - access("T2"."N1"=:B1)
       filter("T1"."N1"="T2"."N1")

Points to note from this execution plan: the VIEW operation at line 3 has started 200 times (there are 200 rows in table t3, the subquery runs once per row); and a simple measure of work done is the 46,745 buffer visits (of which, I can tell you, roughly 400 are current block gets) reported under Buffers in the top line of the plan.

It’s an interesting detail that although Oracle has pushed the correlation predicates inside the view (as shown by the predicate section for operations 7,8 and 9) it doesn’t report the operation at line 3 as “VIEW PUSHED PREDICATE”. It would be nice to see the explicit announcement of predicate pushing here, but that seems to be an expression reserved for pushing join predicates into views – fortunately we always check the predicate section, don’t we!

Now let’s see what the SQL and plan look like if we want Oracle to create the entire v1 result set and use that to update the t3 table.

update t3 
        set t2n2 = (
                with v0 as (
                        select
                                /*+ materialize */
                                t1n1, t1n2, t2n2
                        from v1
                )
                select
                        t2n2
                from
                        v0
                where   v0.t1n1 = t3.t1n1
                and     v0.t1n2 = t3.t1n2
        )
;

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT            |                            |      1 |        |      0 |00:00:00.19 |    1185 |      1 |      1 |       |       |          |
|   1 |  UPDATE                     | T3                         |      1 |        |      0 |00:00:00.19 |    1185 |      1 |      1 |       |       |          |
|   2 |   TABLE ACCESS FULL         | T3                         |      1 |    200 |    200 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|   3 |   TEMP TABLE TRANSFORMATION |                            |    200 |        |    200 |00:00:00.18 |     778 |      1 |      1 |       |       |          |
|   4 |    LOAD AS SELECT           |                            |      1 |        |      0 |00:00:00.01 |     171 |      0 |      1 |  1040K|  1040K|          |
|   5 |     VIEW                    | V1                         |      1 |  45000 |    200 |00:00:00.01 |     168 |      0 |      0 |       |       |          |
|   6 |      HASH UNIQUE            |                            |      1 |  45000 |    200 |00:00:00.01 |     168 |      0 |      0 |  1558K|  1558K| 3034K (0)|
|*  7 |       HASH JOIN             |                            |      1 |  45000 |  45000 |00:00:00.01 |     168 |      0 |      0 |  1969K|  1969K| 1642K (0)|
|   8 |        TABLE ACCESS FULL    | T1                         |      1 |   3000 |   3000 |00:00:00.01 |      84 |      0 |      0 |       |       |          |
|   9 |        TABLE ACCESS FULL    | T2                         |      1 |   3000 |   3000 |00:00:00.01 |      84 |      0 |      0 |       |       |          |
|* 10 |    VIEW                     |                            |    200 |  45000 |    200 |00:00:00.17 |     603 |      1 |      0 |       |       |          |
|  11 |     TABLE ACCESS FULL       | SYS_TEMP_0FD9D6618_911FB4C |    200 |  45000 |  40000 |00:00:00.08 |     603 |      1 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("T1"."N1"="T2"."N1")
  10 - filter(("V0"."T1N1"=:B1 AND "V0"."T1N2"=:B2))

The headline figure to note is the 1,185 buffer visits – clearly we’ve done something very different (and possibly cheaper and faster, even in this tiny demonstration). Looking at operation 3 we see the “TEMP TABLE TRANSFORMATION”, which tells us that we’ve materialized our factored subquery. There is scope, though, for a little ambiguity and uncertainty – the Starts column for this operation says we started it 200 times, once for each row in t3. We might worry that we’ve actually recreated the result and written it to disc 200 times, even though lines 4 – 9 tell us that we loaded the temporary table just once (Starts = 1).

You could take my word for it that we didn’t “do” the temp table transformation 200 times, we merely used the result of the temp table transformation 200 times; but I wasn’t prepared to make this assumption until I had done a little more checking, so there’s no reason why you shouldn’t still be a little suspicious. Lines 4 – 9 do seem to tell us (consistently) that we load the data only once, but there have been occasional bugs where counters were reset to zero when they shouldn’t have been, so the fact that we see (for example, at operation 8) “1 full tablescan of t1 returning 3,000 rows after visiting 84 buffers” may mean that Oracle counted the work once and “forgot” to count it the other 199 times.

It’s easy enough to do a quick cross-check. Take a snapshot of v$mystat joined to v$statname before and after running the query, and check the difference in buffer visits, tablescans, and tablescan rows gotten – if those figures are broadly consistent with the figures in the execution plan I think we can be reasonably confident that the plan is telling us the truth.
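If you don’t have a snapshot script to hand, a minimal version (a sketch – substitute whichever statistics you care about) would be to run something like the following once before and once after the update, then difference the results:

select  sn.name, ms.value
from    v$mystat        ms,
        v$statname      sn
where   sn.statistic# = ms.statistic#
and     sn.name in (
                'session logical reads',
                'db block gets',
                'consistent gets',
                'table scans (short tables)',
                'table scan rows gotten'
        )
order by
        sn.statistic#
;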

Here’s what we get for a few key figures:

Name                                       Value
----                                       -----
session logical reads                      1,472
db block gets                                412
consistent gets                            1,060
consistent gets from cache                 1,060
db block changes                             410
table scans (short tables)                   205
table scan rows gotten                    46,213
table scan blocks gotten                     366

There are a number of oddities – not to mention version and feature dependent variations – in the numbers and a couple of discrepancies introduced by the code I was using to take the snapshot, but the “table scan rows gotten” figure is particularly easy to see in the execution plan:

46,213 ≈ 3,000 (t1) + 3,000 (t2) + 200 (t3) + 200 * 200 (temp table) = 46,200 – the small difference being (presumably) one of the discrepancies introduced by the snapshot code itself.

Allowing for a similarly small error, the number of “table scans (short tables)” is also consistent with the plan Starts – and that’s perhaps the most important indicator: we scan t1 and t2 just once, and the temp table result 200 times. If we were creating the temp table 200 times we’d have to have done over 400 table scans (200 each for t1 and t2).

I won’t go into the details of how to compare the session logical I/O to the total buffer gets for the plan – but the figures are in the right ballpark as far as matching is concerned. If the plan were deceiving us about the number of times the temporary table was created (rather than used) the session stats would have to report a figure more like 33,600 (200 * (84 + 84)) consistent gets.

Conclusion

We have managed to reduce the workload from “one view instantiation per row” to “one view instantiation” with a very small change to the SQL. In the case of the OP this should result in a small, easily comprehensible, change in the SQL statement leading to a drop in run-time from several hours to seven minutes – and maybe that’s good enough for the present.

Reference Script: cte_update.sql

 
