Oracle Scratchpad

November 3, 2015

Nul points

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 8:16 am BST Nov 3,2015

(To understand the title, see this Wikipedia entry)

The title could also be: “Do as I say, don’t do as I do”, because I want to remind you of an error that I regularly commit in my demonstrations. Here’s an example:

 
SQL> create table t (n number); 

Table created 

Have you spotted the error yet ? Perhaps this will help:

SQL> insert into t select 1 - 1/3 * 3 from dual; 

1 row created. 

SQL> insert into t select 1 - 3 * 1/3 from dual; 

1 row created. 

SQL> column n format 9.99999999999999999999999999999999999999999 
SQL> select * from t; 

                                           N
--------------------------------------------
  .00000000000000000000000000000000000000010
  .00000000000000000000000000000000000000000

2 rows selected. 

Spotted the error yet ? If not then perhaps this will help:

SQL> select * from dual where 3 * 1/3 = 1/3 * 3;

no rows selected 

SQL> select * from dual where 3 * (1/3) = (1/3) * 3; 

D
-
X

1 row selected. 

Computers work in binary, people (tend to) work in decimal. 10 = 2 * 5, and 5 (more precisely, dividing by 5) is something that a computer cannot do accurately. So when you do arbitrary arithmetic you should use some method to deal with tiny rounding errors.

In Oracle this means you ought to define all numbers with a precision and scale. Look on it as another form of constraint that helps to ensure the correctness of your data as well as improving performance and reducing wasted storage space.

October 12, 2015

Read Consistency

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 7:25 pm BST Oct 12,2015

I posted a note a few days ago about read consistency, the Cross Session PL/SQL Function Result Cache, deterministic functions, and scalar subqueries. The intent of the article was to make clear the point that while you might think that declaring a PL/SQL function to be deterministic or in the PL/SQL Result Cache might make a query that calls the function perform faster, if that function contained its own SQL statement then your code might not be producing self-consistent results and (even worse) if you had used the Result Cache option your code might actually cause other session to get wrong results if you tried to “set transaction read only” or “alter session set isolation_level = serializable”

It occurred to me a couple of days after writing that article that perhaps there’s a much more basic point I ought to make as well. I’ll make it about PL/SQL but it’s something that applies across all programming languages – it merits being italicised and emphasised, and possibly even the addition of 5 exclamation marks:

Any time you execute more than one SQL statement in a PL/SQL procedure the results of executing that procedure may not be self-consistent unless you have explicitly locked your session SCN to a fixed value!!!!!

Similarly: if you run a report from SQL*Plus which goes: “select this, select that, select the total of the other” – your report may not be self-consistent; if you have a web-application which goes: “select the customer balance, select the customer orders outstanding, select the customer credit notes” – your screenful may not be self-consistent.

By default Oracle operates at statement-level read-consistency for SQL. In a multi-user system if you run two SQL statements and someone else changes the data and commits between your two statement executions then the result from the second statement may not be consistent with the result from the first. Here’s a sillly little bit of code you can use to demonstrate the concept:

set serveroutput on

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4 
) 
select 
        rownum id,    
        rownum n1,
        rownum n2,
        rpad('x',10,'x')  small_vc,
        rpad('x',100,'x') padding 
from 
        generator v1 
; 


create table t2 ( 
        id, 
        v1, 
        constraint t2_pk primary key(id) 
) 
organization index 
as 
select 
        cast(1 as number (6,0)), 
        cast('one' as varchar2(10)) 
from dual 
;
 
begin 
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

/*
   ----------------------------------------------------
   Table t3 only relevant to "serializable" experiments
   ----------------------------------------------------

create table t3 (
        id      number,
        v1      varchar2(10),
        padding varchar2(100) default rpad('x',100,'x')
)
initrans 4
;

alter table t3 add constraint t3_pk primary key(id) using index(
        create unique index t3_pk on t3(id) initrans 4
)
;

insert into t3 values (0,'0',rpad('x',100,'x'));
commit;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T3',
                method_opt       => 'for all columns size 1'
        );
end;
/

   ----------------------------------------------------
   End of optional t3 definition
   ----------------------------------------------------
*/

prompt  ===========================================================
prompt  Four second pause - only needed to avoid flashback problems
prompt  ===========================================================

execute dbms_lock.sleep(4)

-- ------------------------------------------------------------------------------
-- Options for "fixing" the SCN - plenty of scope for ORA-08177 when serializable
-- ------------------------------------------------------------------------------

-- set transaction read only;
-- execute dbms_flashback.enable_at_time(systimestamp)
-- execute dbms_flashback.enable_at_system_change_number(dbms_flashback.get_system_change_number)
-- alter session set isolation_level = serializable;

prompt  =====================
prompt  Starting PL/SQL block
prompt  =====================

declare
        m_v1    varchar2(10);
begin
        for r in (select * from t1 where rownum <= 10) loop

                select  v1
                into    m_v1
                from    t2
                where   t2.id = r.n2 - r.n1 + 1
                ;

--              insert into t3 values(r.id, m_v1, rpad('x',100));

                dbms_output.put_line(r.id || ' - ' || m_v1);
                dbms_lock.sleep(1);

        end loop;

end;
/

execute dbms_flashback.disable;
commit;

--
--      Code to be run from a second session once the 
--      first session displays the "loop running" banner
--

begin
        for r in 1..20 loop
                update t2 set v1 = lpad(r,10,0);
                commit;
                dbms_lock.sleep(0.5);
        end loop;
end;
.

The code covers several different tests – the basic test and the three “read-only” tests don’t need table t3 at all, which exists only so that we have something to insert into (and show a possible ORA-08177) for the serializable test. If you do the serializable test you might want to note what happens if you eliminate the initial insert of id = 0; you may also want to increase the number of rows selected, the number of cycles through the update loop, and the sleep times.

In principle the code emulates a classic “nested loop join done in SQL”, with one row in table t2, and a CURSOR FOR LOOP select from t1 being used to drive a select from t2 “for each row”. The peculiar predicate “t2.id = r.n2 – r.n1 + 1” inside the loop is an attempt to minimise the risk of any future release of the PL/SQL optimizer getting so clever that it puts the “constant” query for “t2.id = 1” outside the loop.

As the main loop cycles once per second, selecting the same row from t2 on each cycle, another session updates the row and commits twice every second. You might have thought that the main loop would keep returning the same value on every cycle – but each select starts at a new SCN. If you don’t take some sort of defensive action (set transaction read only, dbms_flashback) then you’ll get output similar to the following:


=================
Four second pause
=================

PL/SQL procedure successfully completed.

=====================
Starting PL/SQL block
=====================
1 - one
2 - 0000000002
3 - 0000000004
4 - 0000000006
5 - 0000000008
6 - 0000000010
7 - 0000000012
8 - 0000000014
9 - 0000000016
10 - 0000000018

PL/SQL procedure successfully completed.

I have seen many sites where this type of code is used. Typically it’s justified because the referenced table is a “very static” code table or a table that’s not supposed to change while the main loop is running – but eventually Murphy’s law will raise its ugly head.

(My favourite invocation of Murphy’s law came from my A-level physics teacher with respect to experimentation: “Constants aren’t and variables don’t”.)

September 28, 2015

Result Cache 2

Filed under: 12c,Flashback,Oracle,Performance — Jonathan Lewis @ 8:50 am BST Sep 28,2015

Following on from my earlier posting of problems with temporary table and the PL/SQL result cache (a combination which the manuals warn you against) here’s another problem – again, to a large degree, self-inflicted.

Imagine you have a complex report involving a large number of financial transactions with a need to include calculations about current exchange rates. Unfortunately the rules about calculating the appropriate exchange rate for any transaction are complex and you find you have a choice between adding 6 tables with outer joins and a couple of aggregate (max) subqueries to the base query or calling a PL/SQL function to calculate the exchange rate for each row. I’m going to create an extremely simplified model of this requirement:

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        case mod(rownum,2) when 0 then 'USD' else 'GBP' end v1,
        case mod(rownum,2) when 0 then 'GBP' else 'USD' end v2
from
        generator       v1
where
        rownum <= 1e3
;

create table t2 (
        v1      varchar2(3),
        v2      varchar2(3),
        cvt     number(10,6),
        constraint t2_pk primary key(v1,v2)
)
organization index
;

insert into t2 values('USD','GBP',0);
insert into t2 values('GBP','USD',1);

commit;

create or replace function plsql_ordinary(
        from_cur        in varchar2,
        to_cur          in varchar2
)
return number is
        m_ret number;
begin
        select /*+ ordinary trace_this */
                cvt
        into    m_ret
        from    t2
        where   v1 = from_cur
        and     v2 = to_cur
        ;

        return m_ret;

end plsql_ordinary;
/

execute dbms_stats.gather_table_stats(user,'t2')

My t1 table represents the set of transactions but only has to worry about two exchange rates, the dollar/sterling and its inverse. My t2 table is an exchange rate table and I’ve loaded it with the two exchange rates I’m interested in. My function plsql_ordinary() takes two currency codes as inputs and returns the exchange rate.

Here’s the modelling query, with a little infrastructure to examine the workload. Note that this will only run on 12c because of the inline PL/SQL function I’ve used for slowing the query down.

set pagesize 20
set arraysize 6

set serveroutput off
alter system flush shared_pool;
alter session set statistics_level = all;

with
        function wait_row_n (
                i_secs          number,
                i_return        number default -1
        ) return number
        is
        begin
                dbms_lock.sleep(i_secs);
                return i_return;
        end wait_row_n;
select
        /*+ driver trace_this */
        wait_row_n(0.3,id),
        plsql_ordinary(v1,v2),
        (select /*+ scalar trace_this */ t2.cvt from t2 where t2.v1 = t1.v1 and t2.v2 = t1.v2) scalar_sub
from
        t1
where
        rownum <= 50
;

set pagesize 40

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select  sql_id, executions, fetches, rows_processed, sql_text
from    v$sql
where   lower(sql_text) like 'select%trace_this%'
and     lower(sql_text) not like '%v$sql%'
;

The query includes a scalar subquery in the select list to get the same data as the PL/SQL function, and you’ll see the point of that in a while. Because of the arraysize and input parameters to wait_row_n() the query will produce output in batches of 6 rows roughly every two seconds for a total of about 18 seconds – which will give me plenty of time to play around in another session. Before I try to do any damage, though, let’s check the execution plan of the report and the statistics of the queries with “trace_this” in their text:


--------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |      1 |        |     50 |00:00:00.01 |      11 |
|*  1 |  INDEX UNIQUE SCAN | T2_PK |      2 |      1 |      2 |00:00:00.01 |       2 |
|*  2 |  COUNT STOPKEY     |       |      1 |        |     50 |00:00:00.01 |      11 |
|   3 |   TABLE ACCESS FULL| T1    |      1 |     50 |     50 |00:00:00.01 |      11 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."V1"=:B1 AND "T2"."V2"=:B2)
   2 - filter(ROWNUM<=50)


SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED
------------- ---------- ---------- --------------
SQL_TEXT
-----------------------------------------------------------------------------
f1bz07bk5rbth         50         50             50
SELECT /*+ ordinary trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

Looking at operation 1 in the plan you can see that the scalar subquery has started just twice (once for each distinct combination of currency codes).
Looking at the results from v$sql you can see that the query in the PL/SQL function was executed 50 times – once for each row. Functions like this can be a massive drain of resources (typically CPU, but also through latch contention on buffer cache and shared pool latches).

Conveniently we realise that in our system the derived exchange rates change very slowly – so how about telling Oracle that the exchange rate function is a deterministic function (it’s nearly true), or better still, perhaps, experiment with the PL/SQL Function Result Cache.

(Note very carefully, however, that the Database Administrators’ Manual for 11.2 makes the following comment about using the deterministic keyword with PL/SQL functions)

DETERMINISTIC

Tells the optimizer that the function returns the same value whenever it is invoked with the same parameter values (if this is not true, then specifying DETERMINISTIC causes unpredictable results).

Look carefully at that “unpredictable” – it’s true … but you might not realise it until too late. Our PL/SQL function is NOT deterministic – after all a function that queries the database to produce a result may produce a different result every time it executes if someone keeps changing the underlying data – but we might wave our hands a bit and say that the table isn’t going to change while we’re running our report so it’s okay to pretend it’s deterministic, we might even make it a read-only table for the duration. Similar thoughts should cross our minds about declaring a PL/SQL function to the result cache – even though the manual doesn’t say anything quite so explicitly threatening about the result cache. But what the heck, let’s just do it and see what happens:

create or replace function plsql_result_cache(
        from_cur        in varchar2,
        to_cur          in varchar2
)
return number
result_cache
is
        m_ret number;
begin
        select /*+ result cache trace_this */
                cvt
        into    m_ret
        from    t2
        where   v1 = from_cur
        and     v2 = to_cur
        ;

        return m_ret;

end plsql_result_cache;
/

create or replace function plsql_deterministic(
        from_cur        in varchar2,
        to_cur          in varchar2
)
return number
deterministic
is
        m_ret number;
begin
        select /*+ det trace_this */
                cvt
        into    m_ret
        from    t2
        where   v1 = from_cur
        and     v2 = to_cur
        ;

        return m_ret;

end plsql_deterministic;
/

...
select
        /*+ driver trace_this */
        wait_row_n(0.3,id),
        plsql_ordinary(v1,v2),
        plsql_deterministic(v1,v2),
        plsql_result_cache(v1,v2),
        (select /*+ scalar trace_this */ t2.cvt from t2 where t2.v1 = t1.v1 and t2.v2 = t1.v2) scalar_sub
from
...

All three functions returned the same set of results as the scalar subquery – and here’s the output from v$sql showing the impact of declaring a deteministic function and a result cache function (note that “result cache” is not a hint in the first statement, it’s just a convenient label):


SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED
------------- ---------- ---------- --------------
SQL_TEXT
-----------------------------------------------------------------------------------
49buxp3gba3cg          2          2              2
SELECT /*+ result cache trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

2sh7bm59dkwhw         18         18             18
SELECT /*+ det trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

f1bz07bk5rbth         50         50             50
SELECT /*+ ordinary trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1


The simple function ran 50 times, the deteministic function ran 18 times, and the result cache function ran twice. It required just two executions to get the two distinct results needed, after which the session used the result cache rather than calling the function again.

The deterministic function only remembers its results for the duration of a single database call – in this case the fetch – so on each fetch the session has to re-populate the session’s “deterministic cache”, which takes 2 calls for each fetch, a total of 9 fetches * 2 calls = 18 calls.

Clearly, if I can get away with it safely, the PL/SQL function result cache looks like a real winner, with the deterministic option coming a close second (given that real life ought to be using a significantly larger fetch arraysize). So what could possibly go wrong ? Two things – first, the results … and if the results can go wrong there’s not much point in talking about the second thing at the moment.

My query runs for 18 seconds, I’m going to start another session while it runs and update one of the rows in the t2 table a few seconds after my long-running query starts. Here’s the SQL I’ve run, an extract from the query output, and the results from v$sql:


update  t2 set
        cvt = 2
where   v1 = 'USD' 
;

commit;


WAIT_ROW_N(0.3,ID) PLSQL_ORDINARY(V1,V2) PLSQL_DETERMINISTIC(V1,V2) PLSQL_RESULT_CACHE(V1,V2) SCALAR_SUB
------------------ --------------------- -------------------------- ------------------------- ----------
                 1                     1                          1                         1          1
                 2                     0                          0                         0          0
                 3                     1                          1                         1          1
                 4                     0                          0                         0          0
                 5                     1                          1                         1          1
                 6                     0                          0                         0          0
                 7                     1                          1                         1          1
                 8                     0                          0                         0          0
                 9                     1                          1                         1          1
                10                     2                          0                         2          0
                11                     1                          1                         1          1
                12                     2                          0                         2          0
                13                     1                          1                         1          1
                14                     2                          2                         2          0
                15                     1                          1                         1          1
                16                     2                          2                         2          0


SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED
------------- ---------- ---------- --------------
SQL_TEXT
-----------------------------------------------------------------------------------
49buxp3gba3cg          4          4              4
SELECT /*+ result cache trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

2sh7bm59dkwhw         18         18             18
SELECT /*+ det trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

f1bz07bk5rbth         50         50             50
SELECT /*+ ordinary trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

The most important point is that we’ve got results that are not self-consistent – except for the scalar subquery results.

The SQL statements that are executed inside the PL/SQL functions do not run at the SCN of the driving query, each individual statement executes at its own starting SCN. This is an important point that is often overlooked when people write PL/SQL functions that are then called from SQL. The inline scalar subquery, on the other hand, always runs as at the start SCN of the driving query no matter how many times or how late in the lifetime of the driving query it runs.

If we examine the results we can see that the ordinary PL/SQL function and the result cache PL/SQL function changed their output the moment the commit took place (you’ll have to take my word on that, of course), but the deterministic function seemed to delay slightly. We can also see that the number of executions for the ordinary and deterministic functions didn’t change, but the result cache function doubled its executions.

Because of the way I’ve created my data and defined the function, the ordinary function executes its SQL once every row while the deterministic function executes its SQL twice every fetch (once for each pair of input values (though the first fetch from SQL*Plus is a special case) and then remembers the outputs for the rest of the fetch – this is why there is a delay before the deterministic function changes its output and doesn’t introduce any extra calls to the SQL – it was going to re-execute for both values on the fetch starting at id 13 whatever went on around it; the result cache function gets an invalidation warning the moment the other session commits, so re-executes its SQL as many times as needed to repopulate the bits of the cache that have been discarded – and that’s why we see the number of SQL calls doubling, the relevant part of the cache was identified by some sort of hash value for the statement with SQL_ID = ’49buxp3gba3cg’ so both results were discarded and reloaded even though only one of them actually had to change.

Critically every execution of the recursive statements runs at the then-current SCN – so when the underlying data changes our report sees those changes, the report is not self-consistent.

Fortunately there’s something we can do about this – if we want the whole report to operate at the same SCN all we need to do is freeze our session at a specific point in time with the command “set transaction read only;”. This has to be executed as the first statement of a transaction but if we can slip it in just before we start running our report we can ensure that all our SQL statements (including the recursive ones) run as at the same SCN until we issue a commit. When I repeated the data-changing experiment after setting the transaction read only the report ran to completion showing the correct results.

But there was a significant change in the output from v$sql:


SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED
------------- ---------- ---------- --------------
SQL_TEXT
-----------------------------------------------------------------------------------
49buxp3gba3cg         44         44             44
SELECT /*+ result cache trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

2sh7bm59dkwhw         18         18             18
SELECT /*+ det trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

f1bz07bk5rbth         50         50             50
SELECT /*+ ordinary trace_this */ CVT FROM T2 WHERE V1 = :B2 AND V2 = :B1

I did my update just after the first batch of rows had appeared on screen – notice how the result cache SQL has executed 44 times instead of 2 (or 4) times. When I set my transaction to read only it looks as if my session stopped using the result cache the moment the other session commited – and that’s a GOOD THING. If my session were able to continue using the result cache that would mean one of two things, either I would be seeing a result created by another user – which would be too new for me, or every other session would be seeing the results from my session – which would (typically) be out of date for them. The session seems to have protected itself from the risk of a result cached PL/SQL function producing inconsistent results – but the workload changed the moment another session committed a change to the data we were interested in.

At that point I stopped investigating “set transaction read only” until a couple of days later when I realised that there was one more thing I had to test – when I changed the data from another session I didn’t check to see what that session would see when it executed the cached function, so I modified the code run by the other session to do the following:


update  t2 set 
        cvt = 2 
where   v1 = 'USD' 
;

commit;

execute dbms_lock.sleep(6)
execute dbms_output.put_line(plsql_result_cache('USD','GBP'))

The other session goes to sleep for a while (to let the reporting session get through a little more work) and then calls the function. I was very pleased to see that the session returned the correct result – the value 2 that it had just written to the table. But what I got from the reporting session wasn’t so good:

WAIT_ROW_N(0.3,ID) PLSQL_ORDINARY(V1,V2) PLSQL_DETERMINISTIC(V1,V2) PLSQL_RESULT_CACHE(V1,V2) SCALAR_SUB
------------------ --------------------- -------------------------- ------------------------- ----------
                 1                     1                          1                         1          1
                 2                     0                          0                         0          0
                 3                     1                          1                         1          1
                 4                     0                          0                         0          0
                 5                     1                          1                         1          1
                 6                     0                          0                         0          0
                 7                     1                          1                         1          1
                 8                     0                          0                         0          0
...
                24                     0                          0                         0          0
                25                     1                          1                         1          1
                26                     0                          0                         0          0
                27                     1                          1                         1          1
                28                     0                          0                         0          0
                29                     1                          1                         1          1
                30                     0                          0                         2          0
                31                     1                          1                         1          1
                32                     0                          0                         2          0

SQL_ID        EXECUTIONS    FETCHES ROWS_PROCESSED SQL_TEXT
------------- ---------- ---------- -------------- --------------------------------
49buxp3gba3cg         32         32             32 SELECT /*+ result cache trace_th
                                                   is */ CVT FROM T2 WHERE V1 = :B2
                                                    AND V2 = :B1

49buxp3gba3cg          1          1              1 SELECT /*+ result cache trace_th
                                                   is */ CVT FROM T2 WHERE V1 = :B2
                                                    AND V2 = :B1

2sh7bm59dkwhw         18         18             18 SELECT /*+ det trace_this */ CVT
                                                    FROM T2 WHERE V1 = :B2 AND V2 = 
                                                    :B1    

f1bz07bk5rbth         50         50             50 SELECT /*+ ordinary trace_this * 
                                                   / CVT FROM T2 WHERE V1 = :B2 AND
                                                    V2 = :B1

I changed t2 just after the first batch of rows had appeared (just after id 6), then called the function a few seconds later – and as I called the function from the other session it queried the data (the second child to 49buxp3gba3cg, executed just once above) and reloaded the result cache. At that moment (visible at id 30) the first session found data in the result cache and stopped re-executing its queries. When my session went read only it protected other sessions from the past by NOT re-populating the result cache as it ran its queries – but if it found something in the result cache it used it (notice how it has recorded 32 executions of the query rather than 44 – it found something in the result cache on the last 12 calls of the function). The protection only goes one way.

Observations

Using PL/SQL functions in the select list to encapsulate complex calculations that query the database is not something you can do casually. You have no guarantee that you will end up with a self-consistent result unless you take action to deal with the threats introduced by concurrent activity – ideally all tables involved should be set to read-only (which is only possible in 12c [Ed: see comment below] became possible from 11.1 onwards, though you can set a whole tablespace readonly in earlier versions: neither strategy is commonly viable). If you decide that you can work around those threats you still have the performance threat implied by the need to do some complex work for each driving row of your result set. For a report the simple solution to consistency is to “freeze” the session SCN by setting your session (transaction) into read only mode.

Once you’ve dealt with the consistency problem you can start to address the performance problen by claiming that you were using deterministic functions. You might be tempted to use the PL/SQL Result Cache to give you an even larger performance boost, but if you do you really have to make the table (or tablespace) read-only to be protected from read-consistency problems. The deterministic strategy may not be as dramatic in its effects as the result cache strategy but, given a reasonably sized fetch array, the performance benefit you get may be all you really need.

Whatever else you do, there’s an important underlying threat to remember. The moment you create a PL/SQL function that uses the result cache or deterministic option you have to ensure that nobody uses that function without ensuring that their code has handled the consistency threat properly. It’s too easy to forget, with the passing of time, that certain procedures have to be adopted when particular coding strategies are used.

Left as Exercise

I was planning to write a further article going into some detail about using dbms_flashback.enable_at_time(systimestamp) instead of “set transaction read only” – a mechanism that might be used to achieve the same read-consistency requirement though, perhaps, used less frequently than the older, less flexible option. I was also planning to include notes in the same araticle about the effect of “alter session set isolation_level = serializable” which some systems probably use to get read-consistency across multiple statements while writing results back to the database.

Both strategies run into the same issue as “set transaction read only”, with the added effect that your session (the one that has tried to “fix” its SCN) will repopulate the cache, so not only could you see newer results from other sessions in the cache; other sessions could see older results because you put them into the cache.

I’m not going to write up these demonstrations (which require fairly simple modifications to the code supplied above) as all the phenomena have been recorded as bugs on MoS (along with the GTT problem from my previous post, and a documentation bug for the PL/SQL manual to clarify the various threats):

  • Bug 21905592 : PL/SQL RESULT CACHE MUST BE BYPASSSED WHEN SESSION SCN NOT CURRENT
  • Bug 21905695 : BAD X-SESSION RESULTS W/ RESULT-CACHED PL/SQL FN AND “SET TXN READ ONLY”
  • Bug 21907155 : PL/SQL RC FN SHOWS ONE SESSION THE CONTENTS OF ANOTHER SESSION’S GTT
  • Bug 21885173 : THE LNPLS SECTION “RESTRICTIONS ON RESULT-CACHED FUNCTIONS” NEEDS RE-WRITE

Footnote

I haven’t mentioned it here, but another strategy for reducing the number of PL/SQL calls is simply to wrap the function call inside a scalar subquery, along the lines of:

select
       (select plsql_ordinary(v1, v2) from dual),
       ...

Provided you don’t have many combinations of (v1,v2) to handle, and with a little luck with Oracle’s internal hashing function, you could find that scalar subquery caching reduces your execution count from once per row to once per combination. Note that the function is the “ordinary” function, not the “fake deterministic” one; internally Oracle uses the same hashing strategy for remembering the results, but the duration of the scalar subquery cache is the statement rather than the fetch.

 

September 22, 2015

Result Cache

Filed under: 12c,Bugs,Infrastructure,Oracle,Performance — Jonathan Lewis @ 10:11 am BST Sep 22,2015

Yesterday I thought I’d spend half an hour before breakfast creating a little demonstration of a feature; some time about midnight I felt it was time to stop because I’d spent enough time chasing around a couple of bugs that produced wrong results in a variety of ways. Today’s short post is just little warning: be VERY careful what you do with the PL/SQL result cache – if you use the results of database queries in the cache you may end up with inconsistent results in your application. Here’s one very simple example of what can go wrong, starting with a little script:


alter system flush shared_pool;

truncate table gtt1;
drop table gtt1;
create global temporary table gtt1 (n1 number)
on commit preserve rows
;

insert into gtt1 values(1);
commit;

execute dbms_stats.gather_table_stats(user,'gtt1');


create or replace function f_cache
return number 
result_cache 
-- relies_on (gtt1)
is
        m_ret number;
begin
        select  max(n1)
        into    f_cache.m_ret
        from    gtt1
        ;

        return f_cache.m_ret;
end;
/

execute dbms_output.put_line(f_cache)

Here’s the output from a session that’s just connected and run this script (the table already existed from an earlier run):

SQL> @temp

System altered.


Table truncated.


Table dropped.


Table created.


1 row created.


Commit complete.


PL/SQL procedure successfully completed.


Function created.

1

PL/SQL procedure successfully completed.

SQL> 

Let’s call this session A, and I’m going to connect with another session which I’ll call session B. The following extracts are cut-and-pastes as I alternate between sessions:

Session B:

SQL> execute dbms_output.put_line(f_cache);
1

PL/SQL procedure successfully completed.

SQL> insert into gtt1 values(0);

1 row created.

SQL> execute dbms_output.put_line(f_cache);
0

PL/SQL procedure successfully completed.


Session B has just seen the data inserted into a global temporary table by session A; but after inserting a row into the table it now sees what it perceives to be the correct answer.

Session A:

SQL> truncate table gtt1;

Table truncated.

SQL> execute dbms_output.put_line(f_cache);
1

PL/SQL procedure successfully completed.

Despite truncating the table, session A still sees the data which has been eliminated unrecoverably.

Session B (where I hadn’t yet committed):


SQL> commit;

Commit complete.

SQL> 

Session A (where I’ve done nothing new):

SQL> execute dbms_output.put_line(f_cache);


PL/SQL procedure successfully completed.

SQL> 

The row has finally “disappeared” because session B committed.

Session B (where I haven’t done anything since committing):


SQL> execute dbms_output.put_line(f_cache);


PL/SQL procedure successfully completed.

SQL> 

Session B no longer sees the data because it’s now seeing what session A has just seen.

Warning.

This is just one of several ways I managed to get surprising inconsistencies when using the (cross-session) PL/SQL Result Cache. Oracle (in 12c) is supposed to know what the cache relies on and react accordingly, but it doesn’t manage to do it properly (even if you include the relies_on clause) – if you query the database in your PL/SQL you may find strange things happen.

The most important point to note in this example is that a session that wasn’t necessarily doing anything wrong got the wrong results because of the actions of another session that Oracle should have protected it from.

The good thing about this is example that it’s documented (sort of) – as it says in the manual: “to be result cached … does not reference … temporary tables ..”. So that should warn people off copying my example; on the other hand the problem I’m seeing arises because Oracle seems to be trying to use the result cache when the manuals suggests it shouldn’t be.

Conclusion

Do not mix the pl/sql result cache with database queries. The cache is public but (unlike the buffer cache) it is not guaranteed to give you read-consistency.

If you think this was an unrealistic example and you don’t need to worry about it – I’ll post a couple more examples in the next couple of weeks. They’ll be slightly special cases again, but I find the Oracle world is full of special cases.

Update

This behaviour is now visible on MoS as “Bug 21907155 : PL/SQL RC FN SHOWS ONE SESSION THE CONTENTS OF ANOTHER SESSION’S GTT”

August 25, 2015

Truncate – 2

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 6:25 pm BST Aug 25,2015

Following on from my earlier comments about how a truncate works in Oracle, the second oldest question about truncate (and other DDL) appeared on the OTN database forum“Why isn’t a commit required for DDL?”

Sometimes the answer to “Why” is simply “that’s just the way it is” – and that’s what it is in this case, I think.  There may have been some historic reason why Oracle Corp. implemented DDL the way they did (commit any existing transaction the session is running, then auto-commit when complete), but once the code has been around for a few years – and accumulated lots of variations – it can be very difficult to change a historic decision, no matter how silly it may now seem.

This posting isn’t about answering the question “why”, though; it’s about a little script I wrote in 2003 in response to a complaint from someone who wanted to truncate a table in the middle of a transaction without committing the transaction. Don’t ask why – you really shouldn’t be executing DDL as part of a transactional process (though tasks like dropping and recreating indexes as part of a batch process is a reasonable strategy).

So if DDL always commits the current transaction how do you truncate a table without committing ? Easy – use an autonomous transaction. First a couple of tables with a little data, then a little procedure to do my truncate:


create table t1 (n1 number);
insert into t1 values(1);

create table t2 (n1 number);
insert into t2 values(1);

create or replace procedure truncate_t1
as
        pragma autonomous_transaction;
begin
        execute immediate 'truncate table t1';
end;
/

Then the code to demonstrate the effect:


prompt  ======================================
prompt  In this example we end up with no rows
prompt  in t1 and only the original row in t2,
prompt  the truncate didn't commit the insert.
prompt  ======================================

insert into t2 values(2);

execute truncate_t1;
rollback;

select * from t1;
select * from t2;


According to my notes, the last time I ran this code was on 9.2.0.3 but I’ve just tested it on 12.1.0.2 and it behaves in exactly the same way.

I’ve only tested the approach with “truncate” and “create table” apparently, and I haven’t made any attempt to see if it’s possible to cause major disruption with cunningly timed concurrent activity; but if you want to experiment you have a mechanism which Oracle could have used to avoid committing the current transaction – and you may be able to find out why it doesn’t, and why DDL is best “auto-committed”.

Truncate

Filed under: Infrastructure,Oracle,redo,undo — Jonathan Lewis @ 8:39 am BST Aug 25,2015

The old question about truncate and undo (“does a truncate generate undo or not”) appeared on the OTN database forum over the week-end and then devolved into “what really happens on a truncate” and then carried on.

The quick answer to the traditional question is essentially this: the actual truncate activity typically generates very little undo (and redo) compared to a full delete of all the data because all it does is tidy up any space management blocks and update the data dictionary; the undo and redo generated is only about the metadata, not about the data itself.

Of course, a reasonable response to the quick answer is: “how do you prove that?” – so I suggested that all you had to do was “switch logfile, truncate a table, dump logfile”. Unfortunately I realised that I had never bothered to do this myself and, despite having far more useful things to do, I couldn’t resist wasting some of my evening doing it. Here’s the little script I wrote to help


create table t2 (v1 varchar2(32));
insert into t2 values (rpad('A',32));
commit;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id, 
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5
;

create index t1_i1 on t1(id);
alter system flush buffer_cache;
execute dbms_lock.sleep(3)

alter system switch logfile;

insert into t2 values(rpad('X',32));

truncate table t1;

insert into t2 values(rpad('Y',32));
commit;

execute dump_log

Procedure dump_log simply dumps the current log file. The call to switch logfile keeps the dumped log file as small as possible; and I’ve flushed the buffer cache with a three second sleep to minimise the number of misleading “Block Written Record” entries that might otherwise appear in the log file after the truncate. There were all sorts of interesting little details in the resulting activity when I tested this on 12.1.0.2 – here’s one that’s easy to spot before you even look at the trace file:


SQL> select object_id, data_object_id, object_name from user_objects where object_name like 'T1%';

 OBJECT_ID DATA_OBJECT_ID OBJECT_NAME
---------- -------------- --------------------
    108705         108706 T1_I1
    108704         108707 T1

Notice how the data_object_id of the index is smaller than that of the table after the truncate ? Oracle truncates (and renumbers) the index before truncating the table.

The truncate activity was pretty much as I had assumed it would be – with one significant variation. The total number of change vectors reported was 272 in 183 redo records (your numbers may vary slightly if you try to reproduce the example), and here’s a summary of the redo OP codes that showed up in those change vectors in order of frequency:


Change operations
=================
  1 OP:10.25    Format root block
  1 OP:11.11    Insert multiple rows (table)
  1 OP:24.1     DDL
  1 OP:4.1      Block cleanout record
  2 OP:10.4     Delete leaf row
  2 OP:13.28    HWM on segment header block
  3 OP:10.2     Insert leaf row
  3 OP:17.28    standby metadata cache invalidation
  4 OP:11.19    Array update (index)
  4 OP:11.5     Update row (index)
 10 OP:13.24    Bitmap Block state change (Level 2)
 11 OP:23.1     Block written record
 12 OP:14.1     redo: clear extent control lock
 12 OP:22.5     File BitMap Block Redo
 14 OP:14.2     redo - lock extent (map)
 14 OP:14.4     redo - redo operation on extent map
 14 OP:5.4      Commit / Rollback
 15 OP:18.3     Reuse record (object or range)
 15 OP:22.16    File Property Map Block (FPM)
 22 OP:13.22    State on Level 1 bitmap block
 24 OP:22.2     File Space Header Redo
 29 OP:5.2      Get undo header
 58 OP:5.1      Update undo block

The line that surprised me was the 14 commit/rollback OP codes – a single truncate appears to have operated as 14 separate (recursive) transactions. I did start to walk through the trace file to work out the exact order of operation but it’s a really tedious and messy task so I just did a quick scan to get the picture. I may have made a couple of mistakes in the following but I think the steps were:

  • Start transaction
  • Lock the extent map for the INDEX segment — no undo needed
  • Lock each bitmap (space management) block  — no undo needed
  • Reset each bitmap block — undo needed to preserve space management information
  • Reset highwater marks where relevant on bitmap and segment header block — undo needed
  • Clear segment header block — undo needed
  • Write all the updated space management blocks to disc (local write waits)
    • Log file records “Block Written Record”.
  • For each space management block in turn
    • Update space management blocks with new data object_id — undo needed
    • Write the updated block to disc (local write wait)
    • Log file records one “Block Written Record” for each block
  • Repeat all the above for the TABLE segment.
  • Start a recursive transaction
    • Insert a row into mon_mod$ — undo needed
    • recursive commit
  • Set DDL marker in redo log (possibly holding the text of the DDL statement, but it’s not visible in the dump)
  • Set object reuse markers in the redo log
  • update tab$  — needs undo, it’s just DML
  • update ind$ — needs undo, it’s just DML
  • update seg$  — needs undo, it’s just DML (twice – once for table once for index)
  • update obj$ — needs undo, it’s just DML (twice – ditto)
  • COMMIT — at last, with a change vector for a “Standby metadata cache invalidation” marker

The remaining 12 transactions look like things that could be delayed to tidy up things like space management blocks for the files and tablespaces and releasing “block locks”.

This first, long, transaction, is the thing that has to happen as an atomic event to truncate the table – and you can imagine that if the database crashed (or you crashed the session) in the middle of a very slow truncate then there seems to be enough information being recorded in the undo to allow the database to roll forward an incomplete truncate, and then roll back to before the truncate.

It would be possible to test whether or not this would actually work – but I wouldn’t want to do it on a database that anyone else was using.

August 19, 2015

Oops

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 10:02 am BST Aug 19,2015

I made a mistake a few days ago following up a question on the OTN database forum. The question was about a problem creating a hash/list composite partitioned table, and one of the respondants suggested that perhaps the problem appeared because hash/list wasn’t a legal combination.

Spot on: so I confirmed that observation and supplied a link to the official Oracle white paper that listed the combinations that were legal in 11.2 for composite partitioning.  In fact, although I was fairly sure that hash/list wasn’t legal, I had even run up a quick test to check that the attempt would fail before I’d searched online for the document.

Fortunately other people had different ideas about the original failure, and one of them supplied a link to a thread on AskTom which actually included some SQL to create a hash/list composite partitioned table. Naturally I tested it (even though it was from Tom Kyte) and obviously it worked (after all it was from Tom Kyte) – and then I spotted the syntax error in the example I had created for myself.

Trust but verify … and then …

I had fallen into two traps – and one of them was documented in my own “Trust” posting from 2006.

The white paper was dated September 2009 (Tom’s example was dated June 2013) and as it says in my Trust note:

  • If its date is more than about 18 months old – don’t assume it’s (still) true
  • If it’s not your exact version number – don’t assume it’s (still) true

The second trap was an example of confirmation bias, I was fairly sure that my test was supposed to fail with Oracle error “ORA-00922: missing or invalid option”, so when it failed with exactly that error I didn’t check why it had failed and didn’t notice that I had swapped the order of a couple of clauses in the create table statement. It’s very easy to think you’ve done enough when testing – especially when your test results match your expectation.

Update – later that day

So I’ve had an email asking me how I got the ORA-00922.  Here’s the SQL as I wrote it – don’t give the answer in the comments, but how quickly can you spot what I did wrong ?

create table t1
(
        object_name,
        object_type,
        created
)
partition by hash(object_name) partitions 4
subpartition by list(object_type)
subpartition template (
        subpartition sp1 values ('TABLE'),
        subpartition sp2 values ('INDEX'),
        subpartition sp3 values (default)
)
as
select  object_name, object_type, created
from    all_objects
;

It’s a deceptive error (to me, at least) because even though I know it’s wrong it still looks right.

August 5, 2015

Compression

Filed under: compression,Infrastructure,Oracle — Jonathan Lewis @ 1:06 pm BST Aug 5,2015

Originally published Jan 2013

Red Gate have asked me to write a few articles for their Oracle site, so I’ve sent them a short series on “traditional” compression in Oracle – which means I won’t be mentioning Exadata hybrid columnar compression (HCC a.k.a. EHCC). There will be five articles, published at the rate of one per week starting Tuesday (15th Jan). I’ll be supplying links for them as they are published.

Updated Aug 2015

Since I’ve got a little catalogue on compression started, I decided to add some items from my blog to the list, starting with a six-part series on HCC and compression units that I wrote (mostly) in the summer of 2012.

There are a few other items on my blog that I’ll add to this list eventually.

Updated Feb 2016

There’s a section on the official Oracle blog site – Database Storage Optimizationthat’s just become very active with a number of articles on compression in its many forms.

July 8, 2015

PK Index

Filed under: Indexing,Infrastructure,Oracle — Jonathan Lewis @ 6:08 pm BST Jul 8,2015

Here’s one of those little details that I might have known once, or maybe it wasn’t true in earlier versions of oracle, or maybe I just never noticed it and it’s “always” been true; and it’s a detail I’ll probably have forgotten again a couple of years from now.  Consider the following two ways of creating a table with primary key:


Option 1:

create table orders (
        order_id        number(10,0) not null,
        customer_id     number(10,0) not null,
        date_ordered    date         not null,
        other_bits      varchar2(250),
--      constraint ord_fk_cus foreign key(customer_id) references customers,
        constraint ord_pk primary key(order_id)
)
tablespace TS_ORD
;

Option 2:

create table orders (
        order_id        number(10,0) not null,
        customer_id     number(10,0) not null,
        date_ordered    date         not null,
        other_bits      varchar2(250)
)
tablespace TS_OP_DATA
;

alter table orders add constraint ord_pk primary key(order_id);

There’s a significant difference between the two strategies (at least in 11.2.0.4, I haven’t gone back to check earlier versions): in the first form the implicit primary key index is created in the tablespace of the table, in the second form it’s created in the default tablespace of the user. To avoid the risk of putting something in the wrong place you can always add the “using index” clause, for example:


alter table order add constraint ord_pk primary key (order_id) using index tablespace TS_OP_INDX;

Having noticed / reminded myself of this detail I now have on my todo list a task to check the equivalent behaviour when creating partitioned (or composite partitioned) tables – but that’s a task with a very low priority.

May 18, 2015

Migrated rows

Filed under: Infrastructure,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 6:43 pm BST May 18,2015

I received an email recently describing a problem with a query which was running a full tablescan but: “almost all the waits are on ‘db file sequential read’ and the disk read is 10 times the table blocks”.  Some further information supplied was that the tablespace was using ASSM and 16KB block size; the table had 272 columns (ouch!) and the Oracle version was 11.2.0.4.

In his researches he had read my article on wide rows, and had picked out of one of the comments the line: “the very bad thing about chained rows and direct reads that is that finding the rest of row by ‘db file sequential read’ is never cached”, but he wasn’t sure that this was the problem he was seeing so, very sensibly, he had re-run the query with extended tracing available, and dumped (and formatted/edited) a couple of blocks from the table.

He then sent me the trace file and block dump. Generally this is a mistake – especially when the trace file is several megabytes – but he had prepared the ground well and had linked it back to one of my blog notes, and I thought there might be an opportunity for publishing a few more comments, so I took a look. Here’s a carefully edited subset of the block dump – showing all the pertinent information:


Start dump data blocks tsn: 99 file#:100 minblk 2513181 maxblk 2513181

Block header dump:  0x1926591d
 Object id on Block? Y
 seg/obj: 0x1652a7  csc: 0x53.891880b8  itc: 12  flg: E  typ: 1 - DATA
     brn: 1  bdba: 0x1965b70c ver: 0x01 opc: 0
     inc: 84  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0010.01d.0000dff9  0x2b442286.3469.09  C---    0  scn 0x0053.891880b1
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x04   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x05   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x06   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x07   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x08   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x09   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x0a   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x0b   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
0x0c   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.00000000
bdba: 0x1926591d

data_block_dump,data header at 0x11083f154
===============
tsiz: 0x1ea8
hsiz: 0x26
pbl: 0x11083f154
     76543210
flag=--------
ntab=1
nrow=10
frre=-1
fsbo=0x26
fseo=0x4c5
avsp=0x49f
tosp=0x49f
0xe:pti[0]	nrow=10	offs=0
0x12:pri[0]	offs=0x1c15
0x14:pri[1]	offs=0x197b
0x16:pri[2]	offs=0x16e1
0x18:pri[3]	offs=0x1448
0x1a:pri[4]	offs=0x11b8
0x1c:pri[5]	offs=0xf1f
0x1e:pri[6]	offs=0xc85
0x20:pri[7]	offs=0x9ec
0x22:pri[8]	offs=0x752
0x24:pri[9]	offs=0x4c5
block_row_dump:
tab 0, row 0, @0x1c15
tl: 659 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 1, @0x197b
tl: 666 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 2, @0x16e1
tl: 666 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 3, @0x1448
tl: 665 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 4, @0x11b8
tl: 656 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 5, @0xf1f
tl: 665 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 7, @0x9ec
tl: 665 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 8, @0x752
tl: 666 fb: -----L-- lb: 0x0  cc: 255
tab 0, row 9, @0x4c5
tl: 653 fb: -----L-- lb: 0x0  cc: 255

In the ITL you can see 10 entries with the flag set to “C—-” (committed) with no XID or SCN – that’s consistent with 10 rows migrating into the block in a single transaction. In the row directory you can see the block holds 10 rows, and in the body of the block you can see the header for each of those 10 rows with 255 columns (presumably the 2nd section of each row of 272 columns), and the flag bytes set to “—–L–” (the Last piece of a chained – as opposed to simply migrated – row).

So the block dump is consistent with the possiblity of a direct path read of a block somewhere (10 head pieces) having to read this block 10 times shortly afterwards. Can we find further corroboration in the trace file? The blockdump was for block 0x1926591d = 421943581 decimal


PARSE #4573135368:c=29,e=48,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4116693033,tim=79008343283418
EXEC #4573135368:c=53,e=93,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4116693033,tim=79008343283607
WAIT #4573135368: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=15477650 tim=79008343283636
WAIT #4573135368: nam='Disk file operations I/O' ela= 38 FileOperation=2 fileno=101 filetype=2 obj#=1462951 tim=79008343283973
WAIT #4573135368: nam='direct path read' ela= 8991 file number=100 first dba=947580 block cnt=13 obj#=1462951 tim=79008343293041

WAIT #4573135368: nam='db file sequential read' ela= 4934 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298032
WAIT #4573135368: nam='db file sequential read' ela= 155 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298216
WAIT #4573135368: nam='db file sequential read' ela= 127 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298378
WAIT #4573135368: nam='db file sequential read' ela= 125 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298526
WAIT #4573135368: nam='db file sequential read' ela= 128 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298677
WAIT #4573135368: nam='db file sequential read' ela= 123 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298826
WAIT #4573135368: nam='db file sequential read' ela= 134 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343298983
WAIT #4573135368: nam='db file sequential read' ela= 129 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343299135
WAIT #4573135368: nam='db file sequential read' ela= 180 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343299341
WAIT #4573135368: nam='db file sequential read' ela= 133 file#=100 block#=2513181 blocks=1 obj#=1462951 tim=79008343299497

WAIT #4573135368: nam='db file sequential read' ela= 11039 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343310565
WAIT #4573135368: nam='db file sequential read' ela= 133 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343310730
WAIT #4573135368: nam='db file sequential read' ela= 139 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343310895
WAIT #4573135368: nam='db file sequential read' ela= 124 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311045
WAIT #4573135368: nam='db file sequential read' ela= 122 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311190
WAIT #4573135368: nam='db file sequential read' ela= 127 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311339
WAIT #4573135368: nam='db file sequential read' ela= 125 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311490
WAIT #4573135368: nam='db file sequential read' ela= 134 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311647
WAIT #4573135368: nam='db file sequential read' ela= 128 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311797
WAIT #4573135368: nam='db file sequential read' ela= 124 file#=100 block#=2513245 blocks=1 obj#=1462951 tim=79008343311947

WAIT #4573135368: nam='db file sequential read' ela= 10592 file#=100 block#=2513309 blocks=1 obj#=1462951 tim=79008343322564
WAIT #4573135368: nam='db file sequential read' ela= 142 file#=100 block#=2513309 blocks=1 obj#=1462951 tim=79008343322740
WAIT #4573135368: nam='db file sequential read' ela= 126 file#=100 block#=2513309 blocks=1 obj#=1462951 tim=79008343322889

There are a couple of interesting details in this trace file.

First we note (as the OP said) there are very few direct path reads – but direct path reads can be asynchronous with several running concurrently, which means that we may report one direct path read while the data returned from others records no time. (You’ll have to take my word for the sparseness of direct path reads – there were 5 reading a total of 58 blocks from the object, compared to 50,000 db file sequential reads)

Then you can see that although each block that was subject to “db file sequential read” is reported 10 times, the first read is much slower than the subsequent ones – a fairly good indication that the later reads are coming from a cache somewhere. (The 50,00 reads consisted of roughly 5,300 blocks being read 10 times, 1,400 blocks being read 9 times, 460 blocks being read 8 times, and a few blocks being read 7 or fewer times.)

You might also notice that the “coincidental” jump of 64 blocks between the sets of 10 reads – this appears fairly frequently, and it’s the type of pattern you might expect to see when a serial process is allocating blocks for use in a clean ASSM tablespace after the extent sizes have become fairly large (possibly the 64MB size that eventually appears with system managed extent sizes). There’s a “pseudo-random” choice of block within extent dicated by the process id, that spreads the work done by a single process steadily through the extent. Having filled 2513181, 2513245, 2513309 and so on for 16 steps the trace file comes back to 2513182, 2513246, 2513309 and so on.

It’s interesting (and time-consuming) to check the patterns but what we really need next, and don’t have, to check the theory is the set of 13 blocks dictated by the first direct path read:

WAIT #4573135368: nam='direct path read' ela= 8991 file number=100 first dba=947580 block cnt=13 obj#=1462951 tim=79008343293041

It’s likely that somewhere in the 13 blocks in the range 947580 onwards we would find the 10 row head pieces pointing to block 2513181; then the 10 row head pieces pointing to block 2513245, and so on – and I’d hope that we might see a pattern of many consecutive (or near-consecutive) rows in each originating block pointing to the same “next block”. In fact, with a few blocks in the early range, we might even get some idea of how the application was loading and updating data and be able to make some suggestions for changing the strategy to avoid row chaining.

Footnote

The OP also had a follow-up question which was: “One question for the block dump is why there is no hrid in it since the row pieces are the second row pieces and the flag bit is ‘—–L–‘?”  It would be nice to see this, of course – then we wouldn’t need to see the 947580-947592 range to see what had been happening to the data – but that’s not the way Oracle works, as I’ve pointed out above; but since the answer was in another posting of mine I simply emailed the relevant URL to the OP.

May 13, 2015

Instance stats

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 7:31 pm BST May 13,2015

+

While reading a posting by Martin Bach on a new buffering option for 12c I was prompted to take a look at another of his posts on the instance activity stats, which reminded me that the class column on v$statname is a bit flag, which we can dissect using the bitand() function to pick out the statistics that belong to multiple classes. I’ve got 2 or 3 little scripts that do this one, for example, picks out all the statistics relating to RAC, another is just a cross-tab of the class values used and their breakdown by class.  Originally this latter script used the “diagonal” method of decode() then sum() – but when the 11g pivot() option appeared I used it as an experiment on pivoting.

This is the script as it now stands, with the output from 12.1.0.2




select
        *
from    (
        select
                st.class,
                pwr.class_id,
                case bitand(st.class, pwr.expn)
                        when 0 then to_number(null)
                               else 1
                end     class_flag
        from
                v$statname      st,
                (select
                        level                   class_id,
                        power(2,level - 1)      expn
                from
                        dual
                connect by level <= 8
                )       pwr
        where
                bitand(class,pwr.expn) = pwr.expn
        )
pivot   (
                sum(class_flag)
        for     class_id in (
                        1 as EndUser,
                        2 as Redo,
                        3 as Enqueue,
                        4 as Cache,
                        5 as OS,
                        6 as RAC,
                        7 as SQL,
                        8 as Debug
                )
        )
order by
        class
;


     CLASS    ENDUSER       REDO    ENQUEUE      CACHE         OS        RAC        SQL      DEBUG
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
         1        130
         2                    68
         4                                9
         8                                         151
        16                                                     16
        32                                                                35
        33          3                                                      3
        34                     1                                           1
        40                                          53                    53
        64                                                                          130
        72                                          15                               15
       128                                                                                     565
       192                                                                            2          2

13 rows selected.

The titles given to the columns come from Martin’s blog, but the definitive set is in the Oracle documentation in the reference manual for v$statname. (I’ve changed the first class from “User” to “EndUser” because of a reserved word problem, and I abbreviated the “RAC” class for tidiness.) It’s interesting to note how many of the RAC statistics are also about the Cache layer.

April 20, 2015

Manuals

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 4:24 pm BST Apr 20,2015

From time to time I read a question (or, worse, an answer) on OTN and wonder how someone could have managed to misunderstand some fundamental feature of Oracle – and then, as I keep telling people everyone should do – I re-read the manuals and realise that that sometimes the manuals make it really easy to come to the wrong conclusion.

Having nothing exciting to do on the plane to Bucharest today, I decided it was time to read the Concepts manual again – 12c version – to remind myself of how much I’ve forgotten. Since I was reading the mobi version on an iPad mini I can’t quote page numbers, but at “location 9913 of 16157” I found the following text in a sidebar:

“LGWR can write redo log entries to disk before a transaction commits. The redo entries become permanent only if the transaction later commits.”

Now I know what that’s trying to say because I already know how Oracle works – but it explains the various questions that I’ve seen on OTN (and elsewhere) struggling with the idea of how Oracle manages to “not have” redo for transactions that didn’t commit.

The redo entries become permanent the moment they are written to disc – nothing makes any of the content of the redo log files disappear 1, nothing goes back and flags some bits of the redo log as “not really there”. It’s the changes to the data blocks that have been described by the redo that become permanent only if the transaction later commits. If the transaction rolls back2 the session doesn’t “seek and destroy” the previous redo, it generates MORE redo (based on the descriptions that it originally put into the undo segment) and applies the changes described by that redo to reverse out the effects of the previous changes.

So next time you see a really bizarre question about how Oracle works remember that it could have arisen from someone reading the manual carefully; because sometimes the manual writers know exactly what they mean to say but don’t actually say it clearly and unambiguously.

1 I am aware that strange and rare events such disc crashes could make all sorts of things disappear, but I think it’s reasonable to assume here that we’re talking about standard processing mechanisms.

2 I am also aware that there are variations dependent on events like sessions being killed, or instance failure that could need some further explanation, but there’s a time, place, and pace, for everything.

April 5, 2015

Not In CTAS

Filed under: Infrastructure,NULL,Oracle — Jonathan Lewis @ 6:49 pm BST Apr 5,2015

Everyone gets caught out some of the time with NOT IN.

NOT IN is not the opposite of IN.

This came up in a (fairly typical) question on OTN recently where someone had the task of “deleting 6M rows from a table of 18M”. A common, and perfectly reasonable, suggestion for dealing with a delete on this scale is to consider creating a replacement table holding the data you do want rather than deleting the data you don’t want.  In this case, however, the query for deleting the data looked like this:


DELETE FROM EI.CASESTATUS
     WHERE CASEID NOT IN (SELECT CASEID FROM DO.STG_CASEHEADER);

The suggested code for creating the kept data was this:


CREATE TABLE newTable as
  SELECT * FROM EI.CASESTATUS
     WHERE CASEID IN (SELECT CASEID FROM DO.STG_CASEHEADER);

You might get the same result sets in the two tables after this process – but it depends on the CASEID in both tables never being NULL. (You might think that a column with ID in the name probably ought to be a primary key, or a foreign key with a NOT NULL declaration, but then again there’s that STG_ in the subquery table that might be indicative of a “staging” table, so who knows what might happen if the data’s allowed to start life as dirty data.) Here’s a quick demo to prove the point. First some simple data creation – with an optional insert so that you’ve got two tests in one script – followed by the two strategies for identifying data:


drop table t3;
drop table t2;
drop table t1;

create table t1 (n1 number);
create table t2 (n1 number);

insert into t1 values(null);
insert into t1 values(1);
insert into t1 values(2);

/* two tests available, with or without a null in t2 */

-- insert into t2 values(null);
insert into t2 values(1);

commit;

-- gather stats here

set null n/a
delete from t1
where  t1.n1 not in (select t2.n1 from t2);

prompt  Remainder after delete

select  *  from t1;

rollback;

prompt  Selected on create

create table t3 as
select  *  from t1
where   t1.n1 in (select t2.n1 from t2);

select * from t3;

Then the two sets of output from running the test, first with the the NULL insert into t2:


Remainder after delete

        N1
----------
n/a
         1
         2
Selected on create

        N1
----------
         1

We haven’t deleted ANY data from t1 when we were probably hoping that the 2 would disappear – after all, it’s not in t2; however since the equality comparison between a t1 row and every t2 row must evaluate to FALSE before a t1 row is deleted and the comparison of 2 and NULL evaluates to NULL the 2 row won’t be deleted (similarly the comparison for “t1.NULL = t2.anything” evaluates to NULL rather than FALSE, so the NULL isn’t deleted).

Still, perhaps the rewrite would have been okay for the data set where we don’t have a NULL in t2:


Remainder after delete

        N1
----------
n/a
         1
Selected on create

        N1
----------
         1

Oops – still doesn’t produce matching results . This time the row with the 2 has disappeared from t1 in both cases – which might have been closer to what the original OTN poster had hoped but we still have the difference in the survival of the NULLs from t1 – for the reason given for the previous data set

Footnote:

In passing, the execution plan subsequently supplied by the OP showed a “MERGE JOIN ANTI NA” with stg_caseheader (the subquery table) as the second table. The significance of the NA (Null-aware) is that it tells us that the join column in stg_caseheader definitely doesn’t have a NOT NULL constraint on it. (We can’t draw any conclusion about the join column in casestatus.)

March 27, 2015

ASH

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 9:41 am BST Mar 27,2015

There was a little conversation on Oracle-L about ASH (active session history) recently which I thought worth highlighting – partly because it raised a detail that I had got wrong until Tim Gorman corrected me a few years ago.

Once every second the dynamic performance view v$active_session_history copies information about active sessions from v$session. (There are a couple of exceptions to the this rule – for example if a session has called dbms_lock.sleep() it will appear in v$session as state = ‘ACTIVE’, but it will not be recorded in v$active_session_history.) Each of these snapshots is referred to as a “sample” and may hold zero, one, or many rows.

The rows collected in every tenth sample are flagged for copying into the AWR where, once they’ve been copied into the underlying table, they can be seen in the view dba_hist_active_sess_history.  This is where a common misunderstanding occurs: it is not every 10th row in v$active_session_history it’s every 10th second; and if a sample happens to be empty that’s still the sample that is selected (which means there will be a gap in the output from dba_hist_active_sess_history). In effect dba_hist_active_sess_history holds copies of the information you’d get from v$session if you sampled it once every 10 seconds instead of once per second.

It’s possible to corroborate this through a fairly simple query as the rows from v$active_session_history that are going to be dumped to the AWR are as they are created:


select
        distinct case is_awr_sample when 'Y' then 'Y' end flag,
        sample_id,
        sample_time
from
        v$active_session_history
where
        sample_time > sysdate - 1/1440
order by
        2,1
;

F  SAMPLE_ID SAMPLE_TIME
- ---------- --------------------------------
     3435324 26-MAR-15 05.52.53.562 PM
     3435325 26-MAR-15 05.52.54.562 PM
     3435326 26-MAR-15 05.52.55.562 PM
     3435327 26-MAR-15 05.52.56.562 PM
     3435328 26-MAR-15 05.52.57.562 PM
     3435329 26-MAR-15 05.52.58.562 PM
     3435330 26-MAR-15 05.52.59.562 PM
     3435331 26-MAR-15 05.53.00.562 PM
Y    3435332 26-MAR-15 05.53.01.562 PM
     3435333 26-MAR-15 05.53.02.572 PM
     3435334 26-MAR-15 05.53.03.572 PM
     3435335 26-MAR-15 05.53.04.572 PM
     3435336 26-MAR-15 05.53.05.572 PM
     3435337 26-MAR-15 05.53.06.572 PM
     3435338 26-MAR-15 05.53.07.572 PM
     3435339 26-MAR-15 05.53.08.572 PM
     3435340 26-MAR-15 05.53.09.572 PM
     3435341 26-MAR-15 05.53.10.582 PM
Y    3435342 26-MAR-15 05.53.11.582 PM
     3435343 26-MAR-15 05.53.12.582 PM
     3435344 26-MAR-15 05.53.13.582 PM
     3435345 26-MAR-15 05.53.14.582 PM
     3435346 26-MAR-15 05.53.15.582 PM
     3435347 26-MAR-15 05.53.16.582 PM
     3435348 26-MAR-15 05.53.17.582 PM
     3435349 26-MAR-15 05.53.18.592 PM
     3435350 26-MAR-15 05.53.19.592 PM
     3435351 26-MAR-15 05.53.20.592 PM
Y    3435352 26-MAR-15 05.53.21.602 PM
     3435355 26-MAR-15 05.53.24.602 PM
     3435358 26-MAR-15 05.53.27.612 PM
     3435361 26-MAR-15 05.53.30.622 PM
     3435367 26-MAR-15 05.53.36.660 PM
     3435370 26-MAR-15 05.53.39.670 PM
     3435371 26-MAR-15 05.53.40.670 PM
     3435373 26-MAR-15 05.53.42.670 PM
     3435380 26-MAR-15 05.53.49.700 PM
     3435381 26-MAR-15 05.53.50.700 PM
Y    3435382 26-MAR-15 05.53.51.700 PM
     3435383 26-MAR-15 05.53.52.700 PM

40 rows selected.

As you can see at the beginning of the output the samples have a sample_time that increases one second at a time (with a little slippage), and the flagged samples appear every 10 seconds at 5.53.01, 5.53.11 and 5.53.21; but then the instance becomes fairly idle and there are several sample taken over the next 20 seconds or so where we don’t capture any active sessions; in particular there are no rows in the samples for 5.53.31, and 5.53.41; but eventually the instance gets a little busy again and we see that we’ve had active sessions in consecutive samples for the last few seconds, and we can see that we’ve flagged the sample at 5.53.51 for dumping into the AWR.

You’ll notice that I seem to be losing about 1/100th second every few seconds; this is probably a side effect of virtualisation and having a little CPU-intensive work going on in the background. If you see periods where the one second gap in v$active_session_history or 10 second gap in dba_hist_active_sess_history has been stretched by several percent you can assume that the CPU was under pressure over that period. The worst case I’ve seen to date reported gaps of 12 to 13 seconds in dba_hist_active_sess_history.  The “one second” algorithm is “one second since the last snapshot was captured” so if the process that’s doing the capture doesn’t get to the top of the runqueue in a timely fashion the snapshots slip a little.

When the AWR snapshot is taken, the flagged rows from v$active_session_history are copied to the relevant AWR table. You can adjust the frequency of sampling for both v$active_session_history, and dba_hist_active_sess_history, of course – there are hidden parameters to control both: _ash_sampling_interval (1,000 milliseconds) and _ash_disk_filter_ratio (10). There’s also a parameter controlling how much memory should be reserved in the shared pool to hold v$active_session_history.: _ash_size (1048618 bytes per session in my case).  The basic target is to keep one hour’s worth of data in memory, but if there’s no pressure for memory you can find that the v$active_session_history holds more than the hour; conversely, if there’s heavy demand for memory and lots of continuously active sessions you may find that Oracle does “emergency flushes” of v$active_session_history between the normal AWR snapshots. I have heard of people temporarily increasing the memory and reducing the interval and ratio – but I haven’t yet felt the need to do it myself.

 

March 26, 2015

12c MView refresh

Filed under: 12c,Infrastructure,Materialized view,Oracle — Jonathan Lewis @ 1:19 pm BST Mar 26,2015

Some time ago I wrote a blog note describing a hack for refreshing a large materialized view with minimum overhead by taking advantage of a single-partition partitioned table. This note describes how Oracle 12c now gives you an official way of doing something similar – the “out of place” refresh.

I’ll start by creating a matieralized view and creating a couple of indexes on the resulting underlying table; then show you three different calls to refresh the view. The materialized view is based on all_objects so it can’t be made available for query rewrite (ORA-30354: Query rewrite not allowed on SYS relations) , and I haven’t created any materialized view logs so there’s no question of fast refreshes – but all I intend to do here is show you the relative impact of a complete refresh.


create materialized view mv_objects nologging
build immediate
refresh on demand
as
select
        *
from
        all_objects
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'mv_objects',
		method_opt 	 => 'for all columns size 1'
	);
end;
/

create index mv_obj_i1 on mv_objects(object_name) nologging compress;
create index mv_obj_i2 on mv_objects(object_type, owner, data_object_id) nologging compress 2;

This was a default install of 12c, so there were about 85,000 rows in the view. You’ll notice that I’ve created all the objects as “nologging” – this will have an effect on the work done during some of the refreshes.

Here are the three variants I used – all declared explicitly as complete refreshes:


begin
	dbms_mview.refresh(
		list			=> 'MV_OBJECTS',
		method			=> 'C',
		atomic_refresh		=> true
	);
end;
/

begin
	dbms_mview.refresh(
		list			=> 'MV_OBJECTS',
		method			=> 'C',
		atomic_refresh		=> false
	);
end;
/

begin
	dbms_mview.refresh(
		list			=> 'MV_OBJECTS',
		method			=> 'C',
		atomic_refresh		=> false,
		out_of_place		=> true
	);
end;
/

The first one (atomic_refresh=>true) is the one you have to use if you want to refresh several materialized views simultaneously and keep them self consistent, or if you want to ensure that the data doesn’t temporarily disappear if all you’re worried about is a single view. The refresh works by deleting all the rows from the materialized view then executing the definition to generate and insert the replacement rows before committing. This generates a lot of undo and redo – especially if you have indexes on the materialized view as these have to be maintained “row by row” and may leave users accessing and applying a lot of undo for read-consistency purposes. An example at a recent client site refreshed a table of 6.5M rows with two indexes, taking about 10 minutes to refresh, generating 7GB of redo as it ran, and performing 350,000 “physical reads for flashback new”. This strategy does not take advantage of the nologging nature of the objects – and as a side effect of the delete/insert cycle you’re likely to see the indexes grow to roughly twice their optimal size and you may see the statistic “recursive aborts on index block reclamation” climbing as the indexes are maintained.

The second option (atomic_refresh => false) is quick and efficient – but may result in wrong results showing up in any code that references the materialized view (whether explicitly or by rewrite). The session truncates the underlying table, sets any indexes on it unusable, then reloads the table with an insert /*+ append */. The append means you get virtually no undo generated, and if the table is declared nologging you get virtually no redo. In my case, the session then dispatched two jobs to rebuild the two indexes – and since the indexes were declared nologging the rebuilds generated virtually no redo. (I could have declared them with pctfree 0, which would also have made them as small as possible).

The final option is the 12c variant – the setting atomic_refresh => false is mandatory if we want  out_of_place => true. With these settings the session will create a new table with a name of the form RV$xxxxxx where xxxxxx is the hexadecimal version of the new object id, insert the new data into that table (though not using the /*+ append */ hint), create the indexes on that table (again with names like RV$xxxxxx – where xxxxxx is the index’s object_id). Once the new data has been indexed Oracle will do some name-switching in the data dictionary (shades of exchange partition) to make the new version of the materialized view visible. A quirky detail of the process is that the initial create of the new table and the final drop of the old table don’t show up in the trace file  [Ed: wrong, see comment #1] although the commands to drop and create indexes do appear. (The original table, though it’s dropped after the name switching, is not purged from the recyclebin.) The impact on undo and redo generation is significant – because the table is empty and has no indexes when the insert takes place the insert creates a lot less undo and redo than it would if the table had been emptied by a bulk delete – even though the insert is a normal insert and not an append; then the index creation honours my nologging definition, so produces very little redo. At the client site above, the redo generated dropped from 7GB to 200MB, and the time dropped to 200 seconds which was 99% CPU time.

Limitations, traps, and opportunities

The manuals say that the out of place refresh can only be used for materialized views that are joins or aggregates and, surprisingly, you actually can’t use the method on a view that simply extracts a subset of rows and columns from a single table.  There’s a simple workaround, though – join the table to DUAL (or some other single row table if you want to enable query rewrite).

Because the out of place refresh does an ordinary insert into a new table the resulting table will have no statistics – you’ll have to add a call to gather them. (If you’ve previously been using a non-atomic refreshes this won’t be a new problem, of course). The indexes will have up to date statistics, of course, because they will have been created after the table insert.

The big opportunity, of course, is to change a very expensive atomic refresh into a much cheaper out of place refresh – in some special cases. My client had to use the atomic_refresh=>true option in 11g because they couldn’t afford to leave the table truncated (empty) for the few minutes it took to rebuild; but they might be okay using the out_of_place => true with atomic_refresh=>false in 12c because:

  • the period when something might break is brief
  • if something does go wrong the users won’t get wrong (silently missing) results, they’ll an Oracle error (probably ORA-08103: object no longer exists)
  • the application uses this particular materialized view directly (i.e. not through query rewrite), and the query plans are all quick, light-weight indexed access paths
  • most queries will probably run correctly even if they run through the moment of exchange

I don’t think we could guarantee that last statement – and Oracle Corp. may not officially confirm it – and it doesn’t matter how many times I show queries succeeding but it’s true. Thanks to “cross-DDL read-consistency” as it was called in 8i when partition-exchange appeared and because the old objects still exist in the data files, provided your query doesn’t hit a block that has been overwritten by a new object, or request a space management block that was zero-ed out on the “drop” a running query can keep on using the old location for an object after it has been replaced by a newer version. If you want to make the mechanism as safe as possible you can help – put each relevant materialized view (along with its indexes) into its own tablespace so that the only thing that is going to overwrite an earlier version of the view is the stuff you create on the next refresh.

 

« Previous PageNext Page »

Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 6,668 other followers