Oracle Scratchpad

January 13, 2022

“Priors”

Filed under: Non-technical,Philosophy — Jonathan Lewis @ 12:34 pm GMT Jan 13,2022

I was listening to a podcast recently about “Thinking Clearly” when the presenter gave a very rapid description of a common mistake that occurs when human intuition meets statistical information.

The particular bit of statistical analysis, when you meet its effects in real life, can be hard to understand, so I thought I’d draw a picture that might help people to understand the mechanism of putting a “threat percentage” into the correct context.

Imagine there is a “nasty medical condition X” that affects one in 20 of all men over the age of 55. Now imagine that there is a diagnostic test for the condition that is 90% accurate (meaning it will return the correct result 90 times out of a hundred and the wrong result 10 times out of 100).

You are male, just past your 55th birthday, and your doctor tells you that you’ve tested positive. How terrified should you be or, to put it another way, which is more likely (and by how much): you’ve got X, or you haven’t got X?

The human, intuitive, response is simple: you’ve been told the test is 90% accurate and you’ve tested positive; so your digestive system is probably telling you that it’s almost certain you’ve got X.

The statistician’s approach (after the initial reflexive shock, perhaps) is to apply Bayesian thinking which can be turned into pictures as follows:

  • Step 1: What does “one in 20” look like? (left hand square)
  • Step 2: What does “90% accurate” look like? (right hand square)
  • Step 3: What does the picture look like when we superimpose the individual (independent) probabilities:

The big white block toward the top left tells us about the group of people who are correctly informed that they are X-free; the tiny red block in the bottom right tells us about the really unlucky ones who are told they’re X-free when they actually have X (false negatives).

Now compare the two pink blocks: the vertical rectangle to the right is the group that have X and test positive; the horizontal rectangle at the bottom is the group who don’t have X but test positive anyway (false positives).

The visual impression from this image is that if you’ve been told that you tested positive it’s nearly twice as likely that you are X-free as it is that you have X: but let’s put the numbers into the picture to get a better handle on this. I’ll use a population of 10,000 (which, conveniently, I can represent as a square measuring 100 by 100):

In a population of 10,000

  • X-free = 95 * (90 + 10) = 9,500 (95%)
  • Got X = 5 * (90 + 10) = 500 (5%)
  • Correct result given = 90 * (95 + 5) = 9,000 (90%)
  • Wrong result given =10 * (95 + 5) = 1,000 (10%)
  • X-free and received right result = 8,550 … (95 * 90, top left)
  • Got X and received wrong result = 50 … (5 * 10, bottom right)
  • Got X and received right result = 450 … (5 * 90, top right)
  • X-free and received wrong result = 950 … (95 * 10, bottom left)

Given the underlying population data (“priors”) for this example, we see that a positive result from a test that’s 90% accurate means there’s a probability of 450 / (950 + 450) = 0.32 (32%) that you’ve got X.
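If you’d like the database to do the arithmetic for you, here’s a tiny SQL sketch (not part of the original note) that reproduces the same figure from the stated priors:

select
        round(100 * true_pos / (true_pos + false_pos), 1)      pct_got_x_given_positive
from    (
        select
                10000 * 0.05 * 0.90     true_pos,       -- got X and correctly told "positive"
                10000 * 0.95 * 0.10     false_pos       -- X-free but wrongly told "positive"
        from
                dual
        )
;

It simply re-expresses the 450 / (950 + 450) calculation as a percentage (roughly 32.1).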

Footnote

The result of this very simple hypothetical case is not intuitively obvious to most people; but if you thought it was easy to get to the right answer you might want to look at the Monty Hall problem, which also leads to Bertrand’s Boxes and the Three Prisoners problem.

January 22, 2020

Philosophy 23

Filed under: Oracle,Philosophy — Jonathan Lewis @ 2:46 pm GMT Jan 22,2020

It’s a long time since I wrote a note with the “philosophy” theme, but a recent complaint about Oracle prompted me to suggest the following thought:

“If you write SQL that is technically incorrect it’s not Oracle’s fault if sometimes the SQL completes without an error.”

Consider the following fragment of code:

drop table t1 purge;

create table t1 (n1 number, n2 number);
insert into t1 values (1,1);
commit;

select n1 from t1 where n2 = to_number('x');
select n1 from t1 where n2 = to_number('x') and n1 = 2;

select 
        case 
                when 
                        0 in  (select n1 from t1 where n2 = to_number('x'))
                then
                        1
                else
                        0
        end
from
        dual
;

Clearly the first query must raise an error because ‘x’ can’t be converted to a number (until Oracle supplies a format option to read it as a Roman Numeral).

Clearly the second query must raise an error because it’s just the first query with an extra predicate on the end.

Clearly the third query must raise an error because it’s going to execute a subquery that must raise an error.

Here’s the output from running the code from SQL*Plus (on 12.2.0.1).

Table dropped.


Table created.


1 row created.


Commit complete.

select n1 from t1 where n2 = to_number('x')
                                       *
ERROR at line 1:
ORA-01722: invalid number



no rows selected


CASEWHEN0IN(SELECTN1FROMT1WHEREN2=TO_NUMBER('X'))THEN1ELSE0END
--------------------------------------------------------------
                                                             0

1 row selected.

The first query fails with the expected conversion error: ORA-01722: invalid number. The second and third queries run to completion because SQL is a declarative language, not a procedural language, and the optimizer is allowed to transform your query in any way that it thinks might make it complete more quickly.

Here’s the execution plan – with predicate information – for the second query:

-------------------------------------+-----------------------------------+
| Id  | Operation          | Name    | Rows  | Bytes | Cost  | Time      |
-------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT   |         |       |       |     2 |           |
| 1   |  TABLE ACCESS FULL | T1      |     1 |    26 |     2 |  00:00:01 |
-------------------------------------+-----------------------------------+

Predicate Information:
----------------------
1 - filter(("N1"=2 AND "N2"=TO_NUMBER('x')))

The optimizer has decided that it’s more efficient to test for the constant 2 than it is to call the to_number() function, so it has evaluated n1 = 2 first for each row and never had to check the second predicate because nothing got past the first.

The explanation for the successful completion of the third query is slightly different, but again it revolves around transforming for efficiency. Oracle will (very often) convert an IN subquery to an EXISTS subquery. In my example the resulting SQL looks like this (taken from the CBO (10053) trace file, with some cosmetic enhancement):


SELECT 
        CASE
                WHEN 
                        EXISTS (SELECT 0 FROM T1 WHERE N1=0 AND N2=TO_NUMBER('x')) 
                THEN 1 
                ELSE 0 
        END  "CASEWHEN0IN(SELECTN1FROMT1WHEREN2=TO_NUMBER('X'))THEN1ELSE0END" 
FROM
        DUAL

Note how the “guaranteed to fail” subquery has an extra predicate added as the IN subquery is transformed into an EXISTS subquery. Just like the previous example, the extra predicate is applied before the to_number() predicate, and there’s no data to match the first predicate, so the to_number() function never gets called and never gets a chance to raise an exception.

You could argue, of course, that the optimizer should spot the attempt to generate a constant and evaluate it (if possible) at parse time and raise the error before the runtime engine even sees the query – and it wouldn’t be too hard to make a case for that – but it would only take a few seconds of thought to create slightly more complex examples to demonstrate the point I’ve been making.
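For example (a hypothetical variation, not taken from the original article), simply replacing the literal with a bind variable means the “bad” value can’t be folded into a constant at parse time, yet the same predicate re-ordering can still stop the error from ever appearing:

variable v1 varchar2(10)
exec :v1 := 'x'

-- Same shape as the second query, but the offending value is now a bind variable,
-- so only the run-time engine could ever attempt the conversion.
select n1 from t1 where n2 = to_number(:v1) and n1 = 2;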

Bottom line(s):

1) Use the correct datatypes for your data.

2) Don’t assume that a statement that raises an exception in some circumstances will result in an exception if it is subsequently embedded in a more complex statement. The optimizer is non-procedural and may transform your statement in a way that bypasses your bad design.

 

Footnote:

I probably shouldn’t say this – it will only encourage someone to do the wrong thing – but for the very simplest examples you can show that the ordered_predicates hint still works:


SQL> select /*+ ordered_predicates */  n1 from t1 where n2 = to_number('x') and n1 = 2;
select /*+ ordered_predicates */  n1 from t1 where n2 = to_number('x') and n1 = 2
                                                                  *
ERROR at line 1:
ORA-01722: invalid number

August 9, 2019

Troubleshooting

Filed under: Philosophy — Jonathan Lewis @ 1:33 pm BST Aug 9,2019

An anecdote with a moral.

Many years ago – in the days of Oracle 7.2.3, when parallel query and partition views were in their infancy and when RAC was still OPS (Oracle Parallel Server) – I discovered a bug that caused parallel queries against partition views to crash (Maxim – mixing two new features is a good way to find bugs). I no longer recall the details of the exact syntax but the problem revolved around the way Oracle used to rewrite queries for parallel execution. In outline it was something like the following:


create or replace view v1
as
select * from t1
union all
select * from t2
;

select  /*+ parallel(v1 2) */ * 
from    v1 
where   pv_col between 1 and 10
and     date_col = to_date('1-Apr-1999','dd-mm-yyyy')
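For reference, the underlying tables would have carried check constraints along the following lines – this is just a sketch of the general pattern, with a made-up column list, not the original definitions:

create table t1 (
        pv_col          number          not null,
        date_col        date            not null,
        padding         varchar2(100),  -- stand-in for the real column list
        constraint t1_pv_ck check (pv_col >= 0    and pv_col < 3000)
);

create table t2 (
        pv_col          number          not null,
        date_col        date            not null,
        padding         varchar2(100),  -- stand-in for the real column list
        constraint t2_pv_ck check (pv_col >= 3000 and pv_col < 6000)
);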

If you had followed the rules about partition views then Oracle would generate some code that managed to combine the partitioning definitions with the view definition and query predicates, and come up with rewritten code for the parallel query slaves that looked something like this:

select  {list of columns}
from    t1
where   pv_col between 1 and 10 
and     pv_col >= 0 and pv_col < 3000 -- > comment to avoid wordpress format issue 
and     date_col = to_date(:SYS_B1,'dd-mm-yyyy') 
union all 
select  {list of columns} 
from    t2 
where   pv_col between 1 and 10 
and     pv_col >= 3000 and pv_col < 6000 -- > comment to avoid wordpress format issue
and     date_col = to_date(:SYS_B2,'dd-mm-yyyy')

In this case I’ve defined my partitions (using constraints on the underlying tables) so that t1 will hold rows where pv_col >= 0 and pv_col < 3000, and t2 will hold rows where pv_col >= 3000 and pv_col < 6000. As you can see, the optimizer has expanded the query with the view text and pulled the constraints into the query, and will be able to bypass all access to t2 because it can reduce the pv_col predicates on t2 to the contradiction “1 = 0”. Here’s the basic form of the execution plan we’d be looking for with partition elimination:


-----------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |    15 |  1500 |     2   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 |    15 |  1500 |     2   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    VIEW                 | V1       |    15 |  1500 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|   4 |     UNION-ALL           |          |       |       |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR  |          |    12 |  1308 |     2   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL | T1       |    12 |  1308 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|*  7 |      FILTER             |          |       |       |            |          |  Q1,00 | PCWC |            |
|   8 |       PX BLOCK ITERATOR |          |     1 |   109 |     2   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  9 |        TABLE ACCESS FULL| T2       |     1 |   109 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------


Note the FILTER at operation 7 – the predicate there is actually “NULL IS NOT NULL” for new versions of Oracle, but would have been “1 = 0” for older versions, so operations 8 and 9 would not get executed.

Notice, also, that my date predicate changes from a literal to a bind variable. This just happened to be the way things were done for parallel query in that version of Oracle. So now we can get to the bug. Our data was actually partitioned by day using a date(-only) column, and the initial target was to keep two years of data. The trouble was that certain queries kept crashing with Oracle error “ORA-12801: error signaled in parallel query server”.

Naturally I created a simplified model (smaller tables, fewer partitions in the views) to try and track down the problem – and the problem disappeared. So I took a crashing query from production, and started creating partition views with fewer and fewer tables until the query stopped crashing, and what I discovered was the following:

  • If you had 2 dates in the query it crashed if the view held 128 or more tables
  • If you had 3 dates in the query it crashed if the view held 86 or more tables

Think about the arithmetic for a moment: 2 * 128 = 256, 3 * 86 = 258. Does this give you a clue ?

What it suggested to me was that someone in Oracle Development had used a single byte to index the array of bind variables that they defined for use with the generated parallel query text, so when you had 2 dates and needed 256 bind variables the counter overflowed, and when you had 3 dates and needed 258 bind variables the counter overflowed. Having made the hypothesis I predicted that a query would crash if there were 4 dates and 64 partitions, but survive if there were only 63 partitions. (And I was right.)

When I called this in to Oracle support (who remembers those early days when you were supposed to “phone” Oracle support on a “land-line”?) and suggested the source of the problem I was told that there was “no limit on the number of bind variables a query could handle in Oracle”. Notice how this is essentially a true statement – but has nothing to do with my suggestion.

Several months (maybe even a couple of years) later – long after the client had upgraded to 7.3.2 then 7.3.4 and I was no longer on site – I got a call from Oracle support who wanted to close the TAR (as SR’s used to be known) because they’d discovered the problem and it was … see above. I got the feeling that no-one had considered my suggestion for a long time because they “knew” it had to be wrong.

The moral(s) of the story

  • Listen to the question carefully – you may not have heard what you were assuming you would hear.
  • Listen to the answer carefully – it may sound like a convincing response to your question while not being relevant to the actual question.
  • “It’s not supposed to do that” isn’t the same as “That didn’t happen” or (to mis-quote a well-known philosophical problem): “you can’t turn an ‘ought not’ into a ‘did not’”.

One thing that’s worth emphasising is that everyone (and that does include me) will occasionally hear what they’re expecting to hear and completely misunderstand the point of the question. So when someone says something which is clearly very silly pause for thought then ask them, with care and precision, if they just said what you thought they said – maybe what they said and what you heard were very different. (The same is true for twitter, list servers, forums etc. – it’s very easy to misinterpret a short piece of text, and it may be the way it’s been written but it may be the way it’s been read.)

 

July 9, 2019

Assumptions

Filed under: Infrastructure,IOT,Oracle,Philosophy — Jonathan Lewis @ 11:47 am BST Jul 9,2019

Over the last few days I’ve been tweeting little extracts from Practical Oracle 8i, and one of the tweets contained the following quote:

This led to the question:

Good question! The whole undo/redo infrastructure in Oracle is probably the most astounding technological achievement in the entire code base – so would you test it to see that it was working properly and if you could break it ? Probably not – although if you were about to recreate your undo tablespace with a 32KB block size you might test to see if the change would produce any surprise side-effects; or you might wonder if anything funny could happen to the redo generation if you created all your varchar2() columns as 4000 bytes “just in case”; or possibly you’d check for undo or redo anomalies if you were told to create a table with more than 255 columns.

I don’t know quite what I was trying to imply (20 years ago) when I wrote the quoted sentence. Possibly I was trying to avoid saying “new features”, because it’s not just the new features you need to test. I was probably trying to suggest the flavour of “exotic”, “high-tech”, “exciting” – which basically comes down to the things where you think you might be (h/t Martin Widlake) a “thought leader” or ground-breaker.  If very few people have used some feature of Oracle you might be the first person to use that feature in a specific fashion – so if there’s a surprise (or bug) waiting to be found you’ll be the first to find it and you don’t want to find it in production.

Anything in Oracle might have an odd boundary condition, and life (or the project life-cycle) is too short to test everything – but almost any time you feel you may be going beyond “common usage”, it’s worth thinking about what might go wrong.

As a closing item of entertainment – here’s a little demonstration (last run on 19.2):


rem
rem     Script:         assumptions.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2019
rem     Purpose:
rem
rem     Last tested
rem             19.2.0.0        (LiveSQL)
rem             18.3.0.0
rem
rem     Notes:
rem     Add the predicate "where rownum <= 1600" to test on LiveSQL
rem

create table t1
as
select * from all_objects
/

create table pt1(
        OWNER, OBJECT_NAME, SUBOBJECT_NAME,
        OBJECT_ID constraint pt1_pk primary key using index local,
        DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME, TIMESTAMP, STATUS,
        TEMPORARY, GENERATED, SECONDARY, NAMESPACE, EDITION_NAME, SHARING, EDITIONABLE,
        ORACLE_MAINTAINED, APPLICATION, DEFAULT_COLLATION, DUPLICATED,
        SHARDED, CREATED_APPID, CREATED_VSNID, MODIFIED_APPID, MODIFIED_VSNID
)
partition by hash (object_id) (
        partition p1,
        partition p2,
        partition p3,
        partition p4
)
as
select * from all_objects
/

create table iot1 (
        OWNER, OBJECT_NAME, SUBOBJECT_NAME,
        OBJECT_ID constraint iot1_pk primary key,
        DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME, TIMESTAMP, STATUS,
        TEMPORARY, GENERATED, SECONDARY, NAMESPACE, EDITION_NAME, SHARING, EDITIONABLE,
        ORACLE_MAINTAINED, APPLICATION, DEFAULT_COLLATION, DUPLICATED,
        SHARDED, CREATED_APPID, CREATED_VSNID, MODIFIED_APPID, MODIFIED_VSNID
)
organization index
as
select * from all_objects
/

create table ptiot1 (
        OWNER, OBJECT_NAME, SUBOBJECT_NAME,
        OBJECT_ID constraint ptiot1_pk primary key,
        DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME, TIMESTAMP, STATUS,
        TEMPORARY, GENERATED, SECONDARY, NAMESPACE, EDITION_NAME, SHARING, EDITIONABLE,
        ORACLE_MAINTAINED, APPLICATION, DEFAULT_COLLATION, DUPLICATED,
        SHARDED, CREATED_APPID, CREATED_VSNID, MODIFIED_APPID, MODIFIED_VSNID
)
organization index
partition by hash (object_id) (
        partition p1,
        partition p2,
        partition p3,
        partition p4
)
as
select * from all_objects
/

alter table t1 move online;

alter table pt1 move partition p1 online;

alter table iot1 move online;

alter table ptiot1 move partition p1 online;

It’s a simple test. Copying data from view all_objects I’ve created:

  • A simple heap table
  • A hash partitioned heap table – with locally partitioned primary key index
  • A simple index-organized table
  • A hash partitioned index organized table

Then I’ve issued an online move command for each table. I often lose track of which enhancements to features appeared in which version of Oracle, but I think the following is correct:

  • alter table t1 move online – the online option became possible in 12.2
  • alter table pt1 move partition online – the online option became possible in 12.1
  • alter table iot1 move online – the online option (for IOTs) became possible in Oracle 8i (and gets a mention in Practical Oracle 8i)
  • alter table ptiot1 move partition online – any guesses ?

In the absence of my bait-and-switch lead-up to the final question I think you could be forgiven for assuming that you would be able to move a partition of a partitioned index-organized table online – but even in 19.2 you’ll end up with the error message: ORA-14808: table does not support ONLINE MOVE PARTITION.

In a vacuum it’s okay to make the mistake – on the other hand if someone suggested changing a partitioned table in your production system into a partitioned IOT it ought to be one of the first things you’d check (on a small model). Sadly I have been in design meetings where weeks of effort have been spent on producing a detailed design that can’t possibly work because no-one checked to see if some critical detail (like online move of IOT partitions) was actually possible – and that’s the background for the statement:

“If you’re going to depend on a technological feature of Oracle, you need to make sure that you have tried to break it, in half a dozen ways, before you use it in production.”

There are many technological features of Oracle that you can assume (safely) have been tested by many other people – when you get to the edge of the known universe your watchword should be: Here be Dragons.

 

August 29, 2018

RTFM ?

Filed under: Non-technical,Philosophy — Jonathan Lewis @ 11:04 am BST Aug 29,2018

My entrance at the Polish Oracle User Group conference 2017 has just resurfaced on Twitter. There is a back-story to this which contains an allegorical lesson in using Oracle. As I said in the opening remarks in the subsequent presentation: “The lesson began before I got to the stage”.

Like all the other speakers at POUG2017 I had received an email asking me to name a tune for a playlist. Having decided that Beethoven, Brahms and Tchaikovsky were not the sort of composers the organisers had in mind I nominated (with a touch of irony) “Who wants to live forever?” by Queen (despite Richard Foote’s strenuous efforts to turn the Oracle world to David Bowie).

When the conference started I noticed two things: first, that the tunes for the “playlist” were actually being used to accompany speakers to the stage, secondly that the admin staff were all wearing monk-like gowns and hoods. So I asked for my tune to be changed to “The Imperial March” (Darth Vader’s theme) and borrowed the gown from the tallest admin person, with the results you see in the video clip.

So what’s the Oracle allegory ?

First you read the manuals, then you observe how it really works before you go live.

 

February 20, 2018

Taking Notes – 2

Filed under: Oracle,Philosophy — Jonathan Lewis @ 11:08 am GMT Feb 20,2018

[Originally written August 2015, but not previously published]

If I’m taking notes in a presentation that you’re giving there are essentially four possible reasons:

  • You’ve said something interesting that I didn’t know and I’m going to check it and think about the consequences
  • You’ve said something that I knew but you’ve said it in a way that made me think of some possible consequences that I need to check
  • You’ve said something that I think is wrong or out of date and I need to check it
  • You’ve said something that has given me a brilliant idea for solving a problem I’ve had to work around in the past and I need to work out the details

Any which way, if I’m taking notes it means I’ve probably just added a few more hours of work to my todo list.

Footnote

“Checking” can include:

  • having a chat
  • reading the manuals
  • finding a recent Oracle white-paper
  • searching MoS
  • building some models

Philosophy

Filed under: Oracle,Philosophy — Jonathan Lewis @ 11:03 am GMT Feb 20,2018

Here’s a note I’ve just re-discovered – at the time I was probably planning to extend it into a longer article but I’ve decided to publish the condensed form straight away.

In a question to the Oak Table a couple of years ago (May 2015) Cary Millsap asked the following:

If you had an opportunity to tell a wide audience of system owners, users, managers, project leaders, system architects, DBAs, and developers “The most important things you should know about Oracle” what would you tell them?

I imagine that since then Cary has probably discussed the pros and cons of some of the resulting thoughts in one of his excellent presentations on how to do the right things, but this was my quick response:

If I had to address them all at once it would be time to go more philosophical than technical.

The single most important point: Oracle is a very large, complex, and flexible product. It doesn’t matter where you are approaching it from you will not have enough information on your own to make best use of it. You have to talk to your peer group to get alternative ideas, and you have to talk to the people at least one step either side of you on the technology chain (dev to dba, dba to sysadmin, Architect to dev, dba to auditor etc.) to understand options and consequences. Create 4 or 5 scenarios of how your system should behave and then get other people – and not just your peer group – to identify their advantages and threats.

Assumptions

Filed under: CBO,Oracle,Philosophy — Jonathan Lewis @ 8:57 am GMT Feb 20,2018

As the years roll on I’ve found it harder and harder to supply quick answers to “simple” questions on the Oracle-L list server and OTN/ODC forum because things are constantly changing and an answer that may have been right the last time I checked could now be wrong. A simple example of the consequences of change showed up recently on the OTN/ODC forum where one reply to a question started:

Just why do you need distinct in a subquery??? That’s the first thing that appears really shocking to me. If it’s a simple in (select …) adding a distinct to the subquery would just impose a sort unique(as you can see in the explain plan), which may be quite costly.

Three question-marks is already tip-toeing its way to the Pratchett limit – but “really shocking” ? It’s bad enough that the comment goes operatic but going operatic in order to introduce an error pushes the thing into tragedy (or possibly farce – or maybe both). To make the self-inflicted injury worse, there were two execution plans supplied in the original post anyway of which only one showed any attempt to achieve uniqueness.

Bottom line – when you’re about to correct someone for doing something that is “obviously” wrong, be a little bit kind about it and then be kind to yourself and do a quick sanity check to see if your attempt at correction is itself correct. A good guideline would be to ask yourself: “How do I know what I know – and am I about to make myself look like an idiot (again)?”

Check It

Question: Does a “distinct” in a subquery impose a sort (or hash) unique ?

Answer: No – a uniqueness operation may appear but it’s not guaranteed to appear.

Here’s a quick example which does not result in any attempt at imposing uniqueness (running 11.2.0.4):

rem
rem     Script:         unnest_demo.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2018
rem

drop table t2 purge;
drop table t1 purge;
create table t1 as select * from all_objects where rownum  <= 100;
create table t2 as select * from all_objects where rownum <= 100;

create index t1_i1 on t1(owner);
create index t2_i2 on t2(object_type);


set autotrace traceonly explain

select  * 
from    t1 
where   owner = 'OUTLN' 
and     object_name in (
                select distinct object_name 
                from   t2 
                where  object_type = 'TABLE'
        )
;


Execution Plan
----------------------------------------------------------
Plan hash value: 3169044451

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     3 |   558 |     4   (0)| 00:00:01 |
|*  1 |  HASH JOIN SEMI              |       |     3 |   558 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     3 |   474 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     3 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2    |    12 |   336 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T2_I2 |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_NAME"="OBJECT_NAME")
   3 - access("OWNER"='OUTLN')
   5 - access("OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic sampling used for this statement (level=2)


There’s no sign of a sort unique or hash unique. The optimizer has decided that the IN subquery can be transformed into an EXISTS subquery which can then be transformed into a semi-join.

I can think of three other execution plan strategies that might have appeared depending on the data, indexing, and statistics. Initially I had just hacked the text above to produce the following plans and hadn’t saved anything as a library script but following a request in the comments below I decided to recreate the whole text and report the hints I’d used. In all the following cases the hints I quote go in the subquery, not in the main body of the query:
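So, for example, the query for case (a) below would have looked like this (reconstructed here to show the hint placement, not copied from the original script):

select  *
from    t1
where   owner = 'OUTLN'
and     object_name in (
                select  /*+ no_unnest */
                        distinct object_name
                from    t2
                where   object_type = 'TABLE'
        )
;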

a) Hinting /*+ no_unnest */ transforms the IN subquery to an EXISTS subquery which operates as a filter subquery (with no uniqueness imposed):


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |   158 |     5   (0)| 00:00:01 |
|*  1 |  FILTER                      |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     3 |   474 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     3 |       |     1   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |    28 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T2_I2 |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

b) Hinting /*+ unnest no_merge no_semijoin */ gets a simple unnest with sort/hash unique and join


--------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |     3 |   525 |     5  (20)| 00:00:01 |
|*  1 |  HASH JOIN                      |          |     3 |   525 |     5  (20)| 00:00:01 |
|   2 |   JOIN FILTER CREATE            | :BF0000  |     3 |   474 |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID  | T1       |     3 |   474 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN            | T1_I1    |     3 |       |     1   (0)| 00:00:01 |
|   5 |   VIEW                          | VW_NSO_1 |    12 |   204 |     3  (34)| 00:00:01 |
|   6 |    HASH UNIQUE                  |          |    12 |   336 |     3  (34)| 00:00:01 |
|   7 |     JOIN FILTER USE             | :BF0000  |    12 |   336 |     2   (0)| 00:00:01 |
|   8 |      TABLE ACCESS BY INDEX ROWID| T2       |    12 |   336 |     2   (0)| 00:00:01 |
|*  9 |       INDEX RANGE SCAN          | T2_I2    |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_NAME"="OBJECT_NAME")
   4 - access("OWNER"='OUTLN')
   9 - access("OBJECT_TYPE"='TABLE')


For my particular data set I had originally had to take the optimizer_features_enable back to ‘8.1.7’ to get the above plan but on recreating the tests I realised that there was a way to get the plan from basic hints (though the modern version of Oracle has slipped a Bloom filter into the hash join). As you can see, there’s a HASH UNIQUE at operation 6, but that would have been there whether or not the DISTINCT keyword had appeared in the SQL. Effectively the query has been transformed to:


select  t1.*
from    (
                select  distinct t2.object_name object_name
                from    t2
                where   t2.object_type='TABLE'
        )
        vw_nso_1,
        t1
where   t1.owner = 'OUTLN'
and     t1.object_name = vw_nso_1.object_name
/

c) Hinting /*+ unnest no_semijoin merge */ results in unnesting then a “transform distinct aggregation” so that the distinct is applied after the join. (In the original text I had said this was using “place group by”. But that’s the transformation that pushes an aggregate inside a join while what’s happening here is one of the variants of the opposite transformation.)

--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |     3 |   474 |     5  (20)| 00:00:01 |
|   1 |  VIEW                          | VM_NWVW_1 |     3 |   474 |     5  (20)| 00:00:01 |
|   2 |   HASH UNIQUE                  |           |     3 |   594 |     5  (20)| 00:00:01 |
|*  3 |    HASH JOIN                   |           |     3 |   594 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T1        |     3 |   510 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | T1_I1     |     3 |       |     1   (0)| 00:00:01 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T2        |    12 |   336 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN          | T2_I2     |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."OBJECT_NAME"="T2"."OBJECT_NAME")
   5 - access("T1"."OWNER"='OUTLN')
   7 - access("T2"."OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic sampling used for this statement (level=2)

Again the plan would be the same whether or not the original subquery had a redundant distinct.

The things you think you know may have been true 10 years ago – but maybe they’re not true any longer, or maybe they’re still true on your version of the database but not every version of the database. So I often end up looking at a question, thinking the poster’s claim can’t be right, and then working out and modelling the circumstances that might make the poster’s observations appear – and learning something new at the same time.

Remember: “I’ve never seen it” doesn’t mean “It doesn’t happen”.

Update (1st March 2018)

In a remarkably timely coincidence – showing that there’s always more to see no matter how carefully you think you’ve been looking – Nenad Noveljic shows us that sometimes it’s actually a positively good thing to have a “redundant” distinct, because it bypasses an optimizer bug.

 

May 1, 2017

RTFM

Filed under: Infrastructure,Oracle,Philosophy — Jonathan Lewis @ 12:55 pm BST May 1,2017

Imagine you’re fairly new to Oracle and don’t have a lot of background information at your fingertips; then one day someone tells you to read the manual pages for the view dba_free_space. Look carefully at this sentence [Nov 2020 – originally taken from the 12.1 reference manual, but still the same in the 19c reference manual]:

Note that if a data file (or entire tablespace) is offline in a locally managed tablespace, you will not see any extent information.

Can you spot the error? Did you spot the error when you first read the sentence – or did you fill in the gap without noticing what you were doing ?

Let’s demonstrate the accuracy of the statement with the following simple cut-n-paste from an SQL*Plus session on 12.1.0.2 running in archivelog mode, and with a locally managed tablespace consisting of 4 (oracle managed) files on a filesystem:


SQL> select * from dba_free_space where tablespace_name = 'LOB_TEST';

TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
LOB_TEST                                4        128   51380224       6272            4
LOB_TEST                                7        128   51380224       6272            7
LOB_TEST                                8        640   47185920       5760            8
LOB_TEST                                9        128   51380224       6272            9

4 rows selected.

SQL> select file#, ts#, name from v$datafile;

     FILE#        TS# NAME
---------- ---------- ----------------------------------------------------------------------
         1          0 /u02/app/oracle/oradata/OR32/datafile/o1_mf_system_cbcysq2o_.dbf
         2          9 /u02/app/oracle/oradata/OR32/datafile/o1_mf_undotbs_d84db0s2_.dbf
         3          1 /u02/app/oracle/oradata/OR32/datafile/o1_mf_sysaux_cbcyrmyd_.dbf
         4         15 /u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpchn57_.dbf
         5          6 /u02/app/oracle/oradata/OR32/datafile/o1_mf_test_8k__cbd120yc_.dbf
         6          4 /u02/app/oracle/oradata/OR32/datafile/o1_mf_users_cbcyv47y_.dbf
         7         15 /u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpchnnq_.dbf
         8         15 /u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpcho47_.dbf
         9         15 /u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpchok1_.dbf

9 rows selected.

SQL> alter database datafile '/u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpchnnq_.dbf' offline;

Database altered.

SQL> select * from dba_free_space where tablespace_name = 'LOB_TEST';

TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
LOB_TEST                                4        128   51380224       6272            4
LOB_TEST                                8        640   47185920       5760            8
LOB_TEST                                9        128   51380224       6272            9

3 rows selected.

SQL> recover datafile '/u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpchnnq_.dbf';
Media recovery complete.
SQL> alter database datafile '/u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpchnnq_.dbf' online;

Database altered.

SQL> select * from dba_free_space where tablespace_name = 'LOB_TEST';

TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
LOB_TEST                                4        128   51380224       6272            4
LOB_TEST                                7        128   51380224       6272            7
LOB_TEST                                8        640   47185920       5760            8
LOB_TEST                                9        128   51380224       6272            9

4 rows selected.

SQL> spool off

See the bit in the middle where I have “3 rows selected” for the lob_test tablespace: the manual says I “will not see any extent information” – but the only change in the output is the absence of information about the one data file that I’ve put offline.

You may want to argue that “obviously” the statement was only about the data file that was offline – but is that a couple of years’ experience allowing you to interpret the text? Some people might assume that the comment was about the whole of one single tablespace disappearing, some might assume it meant the entire view became empty – and maybe others would criticise them for making unwarranted assumptions.

But maybe you’re a novice and believed what the manual actually said.

It’s a fairly silly example, of course, but the point of this note is that when you tell someone to RTFM remember that they might actually do exactly that and not have the benefit of being able to know (unthinkingly) that the manual is wrong. If you go one step further and tell them to “stop making assumptions and RTFM” then just remember that you probably make a lot of assumptions without realising it when you read the manuals, and maybe it’s your assumptions that lead you to the correct interpretation of the manual.

Footnote:

If you’re feeling in the mood to split hairs, don’t forget that dba_free_space doesn’t usually give you any information about extents when it’s reporting locally managed tablespaces, it tells you about the space in which extents can be created. The one exception (that I know of) is when you have an object in the recyclebin and each extent of that object is listed as free space (see this article and the footnote here).  It’s only for dictionary managed tablespaces that dba_free_space reports extent information – the rows stored in the fet$ table.

 

 

April 3, 2017

Block Names

Filed under: Oracle,Philosophy — Jonathan Lewis @ 1:04 pm BST Apr 3,2017

There are a number of tiny details that I can never remember when I’m sketching out models to test ideas and one of those is the PL/SQL block name. Virtually every piece of PL/SQL I write ends up with variables that have one of two prefixes to their names “M_” or “G_” (for “memory” or “global” respectively) but I probably ought to be more formal than that, so here’s one example of using the option to label pl/sql blocks – specifically labelling anonymous blocks from SQL*Plus – using a trivial and silly bit of code:


rem
rem     Script:         plsql_block_names.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2017
rem

create table t1
as
select * from all_objects
;

--
--        Anonymous PL/SQL block with label, and a couple of 
--        uses of the label to qualify variable names.
--

<<my_block>>
declare
        data_object_id  t1.data_object_id%type;
        object_id       t1.object_id%type;
begin

        select  data_object_id
        into    my_block.data_object_id
        from    t1
        where
                data_object_id is not null
        and     rownum = 1
        ;

        select  object_id
        into    my_block.object_id
        from    t1
        where
                data_object_id = my_block.data_object_id
        and     rownum = 1
        ;

        dbms_output.put_line('Object: ' || object_id || ' - ' || my_block.object_id);
        dbms_output.put_line('Data Object: ' || data_object_id || ' - ' || my_block.data_object_id);

end;
/

Object: 16 - 16
Data Object: 6 - 6

The important point, of course, is that with a qualified variable name you eliminate the risk of a problem that appears occasionally on one of the Oracle forums or list-servers where someone has lost track of duplicated variable names or used (as I have above) a variable name that matches a column name, and doesn’t notice that Oracle has captured the “wrong” interpretation of the name at some other point in the code.

You’ll note that when you declare a block name it has to be surrounded by doubled angle brackets – and now that I’ve written about it I might actually remember to use them for a couple of weeks.

Footnote:

This isn’t the only use for labels, by the way, it’s just one that probably ought to be used more frequently in production code. You could/should also follow the same strategy with functions and procedures – when you use a variable declared inside a function or procedure you should qualify it with the function/procedure name.
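Here’s a trivial (and purely hypothetical) sketch of the same idea applied to a procedure – the procedure name, rather than a block label, qualifies the local variable and the parameter:

create or replace procedure check_object (object_id in t1.object_id%type) as
        created         t1.created%type;
begin
        --      "check_object.created" is the local variable and "check_object.object_id"
        --      is the parameter; the unqualified names inside the query resolve to the
        --      t1 columns.
        select  created
        into    check_object.created
        from    t1
        where   t1.object_id = check_object.object_id
        and     rownum = 1
        ;

        dbms_output.put_line('Created: ' || to_char(check_object.created,'dd-Mon-yyyy'));
end;
/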

September 20, 2016

Why Bother

Filed under: Oracle,Philosophy — Jonathan Lewis @ 11:16 am BST Sep 20,2016

This note comes to you prompted by “Noons” in a recent twitter exchange

In response to a complaint by Lukas Eder about having to educate people in 2016 that there is no (performance) difference between count(*) and count(1), Nuno asked me to blog about my claim that this non-difference is a good educational example on at least three different counts.

One thing I won’t count in the list of three is an actual demonstration of the non-difference. I would be surprised if anyone reading this blog needed a proof of the point, but if you’re interested here are a few examples from my past writings – including a couple where the problem becomes subtler and the “non-difference” stops happening:

You’ll notice that the earliest note I’ve published on the blog dates back 8 years or more; and one of the notes references back to a FAQ dated 2001 published on my old website. So why bother making a fuss about this non-difference when it ought to be common knowledge and really isn’t worth the effort?

  • There is a huge amount of garbage on the internet about how Oracle works and what you can do to make it work better, so when you demonstrate to someone that claims can be backed up by proof and an idea that they’ve picked up from the internet is wrong, it might make them pause for thought the next time they read some unjustified claim that really matters.
  • This is a delightfully simple example to use to demonstrate how you can get a little extra information from an execution plan that could help you understand the cause of a performance problem – there’s a quick do-it-yourself sketch after this list. (I could make the same comment about being able to get something helpful out of a 10053 (optimizer) trace file, but I won’t claim that as reason number 3)
  • Starting even with this extremely simple example that proves one point we can see what happens as we push the boundaries of what we know. Thinking about this problem teaches us how to come up with new questions and modified examples which enhance our understanding of the product.
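For the record, here’s the sort of quick check I had in mind – a minimal sketch (my own, not one of the earlier articles) that compares the two execution plans, including their projection information:

set serveroutput off

select count(*) from all_objects;
select * from table(dbms_xplan.display_cursor(format=>'basic +projection'));

select count(1) from all_objects;
select * from table(dbms_xplan.display_cursor(format=>'basic +projection'));

If the two plans – and in particular the projection of the aggregation step – come out the same, the database has just handed you the proof, whatever the internet says.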

Frankly I find it extremely tedious how often I have to explain mechanisms that I first described in Practical Oracle 8i – it can be terribly depressing to see how uninformed people can be – but there are always new people starting their careers in Oracle, the manuals they look to are more concerned with describing how to use GUIs than with explaining the technology, and the old garbage on the internet won’t go away. New arrivals have little previous experience for telling the difference between fact and fiction until someone provides the demonstration – so we keep repeating the simple examples.

My favourite quote about the internet came by way of Terry Pratchett (in “The Truth”): “A lie can run round the world before the truth has got its boots on.”

You just have to keep hammering away at the truth.

 

June 24, 2016

Never …

Filed under: Infrastructure,Oracle,Philosophy — Jonathan Lewis @ 1:15 pm BST Jun 24,2016

From time to time a question comes up on OTN that results in someone responding with the mantra: “Never do in PL/SQL that which can be done in plain SQL”. It’s a theme I’ve critiqued a couple of times before on this blog, most recently with regard to Bryn Llewellyn’s presentation on transforming one table into another and Stew Ashton’s use of Analytic functions to solve a problem that I got stuck with.

Here’s a different question that challenges that mantra. What’s the obvious reason why someone might decide to produce something like the following pattern of code rather than writing a simple “insert into t1 select * from t2;”:


declare

        cursor c1 is
        select * from t2
        ;

        type c1_array is table of c1%rowtype index by binary_integer;
        m_tab c1_array;

begin

        open c1;
        loop
                fetch c1
                bulk collect into m_tab limit 100;

                begin
                        forall i in 1..m_tab.count
                                insert into t1 values m_tab(i);
                exception
                        when others
                                then begin
                                        --  proper exception handling should go here
                                        dbms_output.put_line(m_tab(1).id);
                                        dbms_output.put_line(sqlerrm);
                                end;
                end;

                exit when c1%notfound;

        end loop;
        close c1;
end;
/

There is a very good argument for this approach.

Follow-up (Saturday 25th)

As Andras Gabor pointed out in one of the comments, there are documented scenarios where the execution plan for a simple select statement is not legal for the select part of an “insert into .. select …” statement. In particular, if you have a distributed query the most efficient execution plan may require the remote site to be the driving site but the plan for a create local_table as select (CTAS) or insert into local_table/select is required to use the local site as the driving site. There are workarounds – if you’re allowed to use them – such as creating a view at the remote site and selecting from the view, or you could create a pipelined function locally and select from the pipelined function (but that’s going to be writing PL/SQL anyway, and you’d have to create one or two object types in the database to implement it).
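As a sketch of the remote-view workaround (with hypothetical table, view, and database link names):

-- At the remote database: wrap the query in a view so that the whole
-- join can be optimised and executed at the remote site.
create or replace view v_remote_driver as
select  r1.id, r1.n1, r2.padding
from    r1, r2
where   r2.id = r1.id
;

-- At the local database: the insert/select now references a single remote object.
insert into local_t1
select  id, n1, padding
from    v_remote_driver@remote_db
;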

Another example of plan limitations, that I had not seen before (but have now found documented as “not a bug” in MoS note 20112932), showed up in a comment from Louis: a select statement may run efficiently because the plan uses a Bloom filter, but the filter disappears when the statement is used in insert/select.

These limitations, however, were not the point I had in mind. The “obvious” reason for taking the PL/SQL approach is error handling. What happens if one of the rows in your insert statement raises an Oracle exception ? The entire statement has to rollback. If you adopt the PL/SQL array processing approach then you can trap each error as it occurs and decide what to do about it – and there’s a detail behind that statement that is really important: a PL/SQL array insert can operate at virtually the same speed as the simple SQL statement once you’ve set the arraysize to a value which allows each insert to populate a couple of blocks.

Let me emphasise the critical point of the last sentence:  array inserts in PL/SQL operate at (virtually) the speed of the standard SQL insert / select.

As it stands I don’t think the exception handler in my code above could detect which row in the batch had caused the error – I’ve just printed the ID from the first row in the batch as a little debug detail that’s only useful to me because of my knowledge of the data. Realistically the PL/SQL block to handle the inserts might look more like the following:


-- In program declaration section

        dml_errors      exception;
        pragma exception_init(dml_errors, -24381);

        m_error_pos     number(6,0)     := 0;

-- ------------------------------

                begin
                        forall i in 1..m_tab.count save exceptions
                                insert into t1 values m_tab(i);
                exception
                        when dml_errors then begin

                                for i in 1..sql%bulk_exceptions.count loop

                                        dbms_output.put_line(
                                                'Array element: ' ||
                                                        sql%bulk_exceptions(i).error_index || ' ' ||
                                                        sqlerrm(-sql%bulk_exceptions(i).error_code)
                                        );

                                        m_error_pos := sql%bulk_exceptions(i).error_index;
                                        dbms_output.put_line(
                                                'Content: ' || m_tab(m_error_pos).id || ' ' || m_tab(m_error_pos).n1
                                        );

                                end loop;
                        end;

                        when others then raise;
                end;


You’ll notice that I’ve added the save exceptions clause to the forall statement. This allows Oracle to trap any errors that occur in the array processing step and record details of the guilty array element as it goes along, storing those details in an array called sql%bulk_exceptions. My exception handler then handles the array processing exception by walking through that array.

I’ve also introduced an m_error_pos variable (which I could have declared inside the specific exception handler) to remove a little of the clutter from the line that shows how to identify exactly which row in the source data caused the problem. With a minimum of wasted resources this code now inserts all the valid rows and reports the invalid rows and, if required, could be enhanced to take appropriate action on each invalid row as it appears.

If you’ve got a data loading requirement where almost all the data is expected to be correct but errors occasionally happen then this type of coding strategy is likely to be the most appropriate you could use to get your data into the database. It may be slightly slower than a simple insert/select when there are no errors in the data, but that’s a good insurance premium when compared with the crash and complete rollback that would otherwise take place when there are any errors in the data. You might argue that you could validate the data before inserting but there are bound to be cases where a pre-emptive (row by row?) check of all the data would add far more overhead than the little bit of PL/SQL processing shown here.

Results

It’s obviously a little difficult to produce any time-based rates that demonstrate the similarity in performance of the SQL and PL/SQL approaches – the major time component in a little demo I built was about the I/O rather than the CPU (which, in itself, rather validates the claim anyway). But if you want to do some testing here’s my data model with some results in the following section:


rem
rem     Script: plsql_loop_insert.sql
rem     Author: Jonathan Lewis
rem     Dated:  June 2016
rem

execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(rownum as number(8,0))                     id,
        2 * trunc(dbms_random.value(1e10,1e12))         n1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
create table t2
nologging
noparallel
as
select
        /*+ no_parallel(t1) */
        id + 1e6        id,
        n1 - 1          n1,
        rpad('x',100,'x') padding
from t1 
;

-- update t2 set n1 = n1 + 1 where id = 2e6;
-- update t2 set n1 = n1 + 1 where id = 2e6 - 10;
-- update t2 set n1 = n1 + 1 where id = 2e6 - 20;
-- update t2 set n1 = n1 + 1 where id = 1750200;
-- update t2 set n1 = n1 + 1 where id = 1500003;
-- update t2 set n1 = n1 + 1 where id = 1500001;
commit;

alter system checkpoint;
alter system switch logfile;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

create unique index t1_i1 on t1(n1) nologging;
create unique index t1_pk on t1(id) nologging;
alter table t1 add constraint t1_pk primary key(id);


I’ve generated 1 million rows with an id column and a random integer – picking the range of the random numbers to give me a very good chance (which worked) of getting a unique set of values. I’ve doubled the random values I use for t1 so that every t1.n1 is even, which means I can subtract 1 to generate the t2 values (all odd) and still guarantee uniqueness across the two tables; I’ve also added 1 million to the id values for t2 for the same uniqueness reason.

The optional updates that add 1 to a scattering of rows in t2 push those values back to their original t1 values so that they can cause “duplicate key” errors on the insert. The SQL test was a simple “insert into t1 select * from t2” (ensuring that parallel query didn’t come into play).
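For reference, this is roughly what the SQL version looked like – the no_parallel hint is just a sketch of one way to keep parallel query out of the picture, echoing the hint in the PL/SQL cursor below:

insert into t1
select  /*+ no_parallel(t2) */
        *
from    t2
;

The PL/SQL code I used was as follows: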

declare

        cursor c1 is
        select /*+ no_parallel(t2) */ * from t2
        ;

        type c1_array is table of c1%rowtype index by binary_integer;
        m_tab c1_array;

        dml_errors      exception;
        pragma exception_init(dml_errors, -24381);

        m_error_pos     number(6,0)     := 0;

begin

        open c1;
        loop
                fetch c1
                bulk collect
                into m_tab limit 100;

                begin
                        forall i in 1..m_tab.count save exceptions
                                insert into t1 values m_tab(i);

                exception
                        when dml_errors then begin

                                for i in 1..sql%bulk_exceptions.count loop

                                        dbms_output.put_line(
                                                'Array element: ' ||
                                                        sql%bulk_exceptions(i).error_index || ' ' ||
                                                        sqlerrm(-sql%bulk_exceptions(i).error_code)
                                        );

                                        m_error_pos := sql%bulk_exceptions(i).error_index;
                                        dbms_output.put_line(
                                                'Content: ' || m_tab(m_error_pos).id || ' ' || m_tab(m_error_pos).n1
                                        );

                                end loop;
                        end;

                        when others then raise;

                end;

                exit when c1%notfound;  -- when fetch < limit

        end loop;
        close c1;
end;
/

The PL/SQL output with one bad row (2e6 – 20) looked like this:

Array element: 80 ORA-00001: unique constraint (.) violated
Content: 1999980 562332925640

Here are some critical session statistics for different tests in 11g:


No bad data, insert select
--------------------------
Name                                                 Value
----                                                 -----
CPU used when call started                             944
CPU used by this session                               944
DB time                                              1,712
redo entries                                     1,160,421
redo size                                      476,759,324
undo change vector size                        135,184,996

No bad data, PL/SQL loop
------------------------
Name                                                 Value
----                                                 -----
CPU used when call started                             990
CPU used by this session                               990
DB time                                              1,660
redo entries                                     1,168,022
redo size                                      478,337,320
undo change vector size                        135,709,056


Duplicate Key (2e6-20), insert select (with huge rollback)
----------------------------------------------------------
Name                                                 Value
----                                                 -----
CPU used when call started                           1,441
CPU used by this session                             1,440
DB time                                              2,427
redo entries                                     2,227,412
redo size                                      638,505,684
undo change vector size                        134,958,012
rollback changes - undo records applied          1,049,559

Duplicate Key (2e6-20), PL/SQL loop - bad row reported
------------------------------------------------------
Name                                                 Value
----                                                 -----
CPU used when call started                             936
CPU used by this session                               936
DB time                                              1,570
redo entries                                     1,168,345
redo size                                      478,359,528
undo change vector size                        135,502,488
rollback changes - undo records applied                 74
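
If you want to capture figures like these for your own tests, one option – a sketch, and not necessarily the way I collected them – is to query v$mystat joined to v$statname immediately before and after each run and report the differences:

select
        sn.name, ms.value
from
        v$statname      sn,
        v$mystat        ms
where
        ms.statistic# = sn.statistic#
and     sn.name in (
                'CPU used when call started',
                'CPU used by this session',
                'DB time',
                'redo entries',
                'redo size',
                'undo change vector size',
                'rollback changes - undo records applied'
        )
order by
        sn.name
;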

Most of the difference between CPU time and DB time in all the tests was file I/O time – in my case largely checkpoint wait time, because I happened to have small log files, but in larger systems it’s quite common to see a lot of time spent on db file sequential reads as index blocks are read for update. You can see that there’s some “unexpected” variation in CPU time – I wasn’t expecting the PL/SQL test that hit its duplicate key after nearly 1M inserts to use less CPU than anything else – but the CPU numbers fluctuated by a few hundredths of a second across tests and this just happened to be particularly noticeable with the first one I did; so to some extent the results were probably affected by background activity relating to space management, job queue processing, and all the other virtual machines on the system.

Critically I think it’s fair to say that the differences in CPU timing are not hugely significant across a reasonably sized data set, and most importantly the redo and undo hardly vary at all between the successful SQL and both PL/SQL tests. The bulk processing PL/SQL approach doesn’t add a dramatic overhead – but it clearly does bypass the threat of a massive rollback cost.

Footnote:

You might want to argue the case for using basic SQL with the log errors clause. The coding method is simple and it gives you a table of the rows which caused exceptions as the insert executed – and that may be sufficient for your purposes; but there’s a problem until you upgrade to 12c.

Here’s how I had to modify my test case to demonstrate the method:


begin
        dbms_errlog.create_error_log('t1');
end;
/

insert into t1 select * from t2
log errors
reject limit unlimited
;

The procedure call creates a table to hold the bad rows: by default its name will be err$_t1, and it will be a clone of the t1 table with changes to column types (which might be interesting if you’ve enabled 32K columns in 12c) and a few extra columns. There are some features of Oracle that aren’t supported by the error logging table (LOB columns, for example) so you may need to include the parameter “skip_unsupported => true” in the call to create_error_log.
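As a sketch, that call would simply be the one above with the extra (documented) parameter added – I didn’t actually need it for this particular table:

begin
        dbms_errlog.create_error_log(
                dml_table_name   => 't1',
                skip_unsupported => true        -- silently skip column types the log table can't support
        );
end;
/

Here’s what the default err$_t1 table looks like, together with the content it captured for my one bad row: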


SQL> desc err$_t1
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ORA_ERR_NUMBER$                        NUMBER
 ORA_ERR_MESG$                          VARCHAR2(2000)
 ORA_ERR_ROWID$                         ROWID
 ORA_ERR_OPTYP$                         VARCHAR2(2)
 ORA_ERR_TAG$                           VARCHAR2(2000)
 ID                                     VARCHAR2(4000)
 N1                                     VARCHAR2(4000)
 PADDING                                VARCHAR2(4000)

SQL> execute print_table('select * from err$_t1')
ORA_ERR_NUMBER$               : 1
ORA_ERR_MESG$                 : ORA-00001: unique constraint (TEST_USER.T1_I1) violated

ORA_ERR_ROWID$                :
ORA_ERR_OPTYP$                : I
ORA_ERR_TAG$                  :
ID                            : 1999980
N1                            : 562332925640
PADDING                       : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

So what’s the problem with logging errors? Here are the sets of session stats corresponding to the ones that I reported above for the SQL and PL/SQL options. The first set comes from running this test on 11.2.0.4, the second from 12.1.0.2:


11g results
===========
Name                                                 Value
----                                                 -----
CPU used when call started                           1,534
CPU used by this session                             1,534
DB time                                              2,816
redo entries                                     3,113,105
redo size                                      902,311,860
undo change vector size                        269,307,108

12c results
===========
Name                                                 Value
----                                                 -----
CPU used when call started                             801
CPU used by this session                               801
DB time                                              3,061  -- very long checkpoint waits !!
redo entries                                     1,143,342
redo size                                      492,615,336
undo change vector size                        135,087,044


The 12c stats are very similar to the stats from the perfect SQL run and the two PL/SQL runs – but if you look at the 11g stats you’ll see that they’re completely different from all the other sets. The number of redo entries (if nothing else) tells you that Oracle has dropped back from array processing to single row processing in order to be able to handle the error logging: 1 million rows, with one redo entry each for the table row, its PK index entry, and the unique key index entry – roughly 3 million entries in total, which is what we see.

Until 12c, “error logging” is just row-by-row processing.

Footnote:

As far as I can tell, I first pointed out this “single row processing” aspect of the log errors option some time around December 2005.

Late Entry:

While looking for a posting about efficient updates I came across another of my postings that compares SQL with PL/SQL for updates – it’s worth a read.

 

January 1, 2016

Expert

Filed under: Oracle,Philosophy — Jonathan Lewis @ 1:02 pm GMT Jan 1,2016

Someone sent me the following email a few years ago. It’s a question that comes up fairly frequently and there’s no good answer to it but, unusually, I made an attempt to produce a response. I’ve decided that I’d start this year by presenting the question and quoting the answer I gave, so here – with no editing – is the question:

I’m disturbing you for some help about becoming an Oracle master expert. Probably you are getting this kind of emails a lot but I would be appreciate if you give a small answer to me at least.

First, shortly I want to introduce my self. I’m an *Oracle Trainer* in Turkey Oracle University for 2 years. Almost for 4 years, I worked as software engineer and meet with Oracle on these days. After a while I decided to develop myself in Oracle database technologies and become trainer as i said. I also give consultancy services about SQL / PLSQL development and especially* SQL / PLSQL tuning*. I really dedicate myself to these subjects. As a trainer I also give DBA workshop lectures but in fact I didnt actually did dba job in a production system. I have the concept and even read everything I found about it but always feel inadequate because didnt worked as a DBA on a production system. So many DBA’s has taken my class and they were really satisfied (they have got all answers for their questions) but I did not. I’m a good trainger (with more that 97 average points in oracle evaluations) but I want to be best.

Even in SQL / PLSQL tuning, I know that I am really good at it but I also aware that there are some levels and I can not pass through the next level. for ex: I can examine execution plan (index structures, access paths etc), find cpu and io consumption using hierarchical profiler and solve the problem but can’t understand yet how to understand how much IO consumed by query and understand slow segments. if you remember, for a few days ago, on OTN you answered a question that I involved about sequence caching and Log file sync event. There, I said that sequence can cause to log file sync event (and as you said that was true) but when someone else write a simple code and couldnt see this event, I couldnt answer to him, you did (you said that it was because optimizing).

that is the level what i want to be. I am really working on this and age on 29. but whatever I do I cant get higher. I need a guideness about that. I even worked free for a while (extra times after my job here). I need your guideness, as I said I can work with you if you want to test and I want to learn more advanced topics while working. In Turkey, I couldn’t find people who can answer my questions so I can not ask for guideness to them.

And my (impromptu and unedited) reply:

Thank you for your email. You are correct, I do get a lot of email like this, and most of it gets a stock response; but yours was one of the most intelligently written so I’ve decided to spend a little time giving you a personal answer.

Even if you were to spend a few years as a DBA, you would probably not become the sort of expert you want to be. Most DBAs end up dealing with databases that, for want of a better word, we could call “boring”; for a database to be interesting and show you the sorts of problems where you have to be able to answer the types of question I regularly answer you probably need to be the DBA for a large banking or telecoms system – preferably one that hasn’t been designed very well – that has to handle a very large volume of data very quickly. On these extreme systems you might find that you keep running into boundary conditions in Oracle that force you to investigate problems in great detail and learn all sorts of strange things very quickly. On most other systems you might run into a strange problem very occasionally and spend several years on the job without once being forced to solve any difficult problems very quickly.

If you want to become an expert, you need to be a consultant so you get to see a lot of problems on lots of different systems in a very short time; but you can’t really become a consultant until you’re an expert. As a substitute, then, you need to take advantage of the problems that people report on the OTN database forum – but that doesn’t mean just answering questions on OTN. Look for the problems which people have described reasonably well that make you think “why would that happen”, then try to build a model of the problem that has been described and look very closely at all the statistics and wait events that change as you modify the model. Creating models, and experimenting with models, is how you learn more.

Take, for example, the business of the sequences and pl/sql – you might run the test as supplied with SQL_trace enabled to see what that showed you, you could look very carefully at the session stats for the test and note the number of redo entries, user commits, and transactions reported; you could look at the statistics of enqueue gets and enqueue releases, ultimately you might dump the redo log file to see what’s going into it. Many of the tiny little details I casually report come from one or two days of intense effort studying an unexpected phenomenon.  (The log file sync one was the result of such a study about 15 years ago.)

 Happy New Year to all my readers.

 

Update (Feb 2021)

I’ve just rediscovered a draft note I started writing in 2011, which was in response to a similar, but much shorter email question:

Can you tell me how long  does it take to become a very good DBA like you ?

Things have changed a lot in the last 10 years, and I don’t know if I sent the following reply, but here’s what I drafted:

It might take about five years of hard work.

If you want to get to my position your best strategy these days is to get a copy of Oracle Enterprise Edition installed then get on to OTN and see what problems people are having and work out the right answers to the questions – looking closely at what you are doing, constructing models that match the questions, and checking alternative answers and examining why some solutions are not appropriate.

Document what you do and what you discover.

Note – you do not need to supply answers to the questions, but OTN is a good source of information for things that people need to know.

I think the 5 years hard work is probably still in the right ball-park.

The idea of having a private copy of Oracle is still sound. It’s a lot easier to do now when you can download virtual machines with pre-installed everything; and you can even learn a lot of what really matters from a free instance on the Cloud.

OTN has been renamed, of course, and is now the Oracle Developer Community (ODC), but the principle still remains that you need variety, so you need to see questions that other people have asked. The volume and subtlety of questions seem to have dropped significantly in the last couple of years, though, so you may need to pick a topic and then search the forum history for interesting questions (and then experiment without looking at the responses straightaway).

Finally, of course, document, catalogue, and cross reference the models you’ve built and the things you’ve learned.

December 23, 2015

Testing

Filed under: Oracle,Philosophy — Jonathan Lewis @ 12:56 pm GMT Dec 23,2015

This post is a 100% copy of a message that Tanel Poder sent to the Oracle-L mailing list in response to a thread about the performance of SSD. It’s not just a good answer to the question, it’s a wonderfully succinct insight into how to think about what you’re really testing and it displays the mind-set that should be adopted by everyone.

If you measure write performance on an idle Exadata machine without any other load going on, you are not comparing flash vs disk, you are comparing flash vs the battery-backed 512MB RAM cache in the “RAID” controllers within each storage cell!

This is how the “disk” that’s supposed to have a couple of milliseconds of average latency (it still rotates and needs to seek + calibrate to next track even in sequential writes) gives you sub-millisecond write latencies… it’s not the disk write, it’s the controller’s RAM write that gets acknowledged.

And now when you run a real workload on the machine (lots of random IOs on the disk and Smart Scans hammering them too), your disk writes won’t be always acknowledged by the controller RAM cache. When comparing *busy* flash disks to *busy* spinning disks vs. *idle* flash disks vs *idle* spinning disks (with non-dirty write cache) you will get different results.

So, I’m not arguing here that flash is somehow faster for sequential writes than a bunch of disks when talking about throughput. But if you care about latency (of your commits) you need to be aware of everything else that will be going on on these disks (and account for this in your benchmarks).

Without queueing time included, a busy flash device will “seek” where needed and perform the write in under a millisecond, a busy disk device in 6-10 milliseconds. So your commits will end up having to wait for longer (yes, your throughput will be ok due to the LGWR writing multiple transactions redo out in a single write, but this doesn’t change the fact that individual commit latency suffers).

This latency issue of course will be mitigated when you are using a decent storage array with enough (well-managed) write cache.

So I’d say there are the following things you can compare (and need to be aware of which hardware are you really benchmarking):

1) Flash storage
2) Disk storage without (write) cache
3) Disk storage with crappy (write) cache
4) Disk storage with lots of well-managed & isolated (write) cache

And the second thing to be aware of:

1) Are you the single user on an idle storage array
2) Are you just one of the many users in a heavily utilized (and randomly seeking) storage array

So, as usual, run a realistic workload and test it out yourself (if you have the hardware :)

July 3, 2014

Philosophy 22

Filed under: Philosophy — Jonathan Lewis @ 9:59 am BST Jul 3,2014

Make sure you agree on the meaning of the jargon.

If you had to vote would you say that the expressions “more selective” and “higher selectivity” are different ways of expressing the same idea, or are they exact opposites of each other ? I think I can safely say that I have seen people waste a ludicrous amount of time arguing past each other and confusing each other because they didn’t clarify their terms (and one, or both, parties actually misunderstood the terms anyway).

Selectivity is a value between 0 and 1 that represents the fraction of data that will be selected – the higher the selectivity the more data you select.

If a test is “more selective” then it is a harsher, more stringent, test and returns less data  (e.g. Oxford University is more selective than Rutland College of Further Education): more selective means lower selectivity.
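A hypothetical example may help to pin the two terms down – the table and the figures are invented purely for illustration. Imagine an orders table of 1,000,000 rows:

  • The predicate status = ‘OPEN’ matches 400,000 rows: selectivity = 400,000 / 1,000,000 = 0.4
  • The predicate status = ‘ERROR’ matches 5,000 rows: selectivity = 5,000 / 1,000,000 = 0.005
  • The ‘ERROR’ predicate is the more selective of the two, but it has the lower selectivity.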

If there’s any doubt when you’re in the middle of a discussion – drop the jargon and explain the intention.

Footnote

If I ask:  “When you say ‘more selective’ do you mean ….”

The one answer which is absolutely, definitely, unquestionably the wrong reply is: “No, I mean it’s more selective.”

 
