I’ll probably have to file this one under “Optimizer ignoring hints” – except that it should also go under “bugs”, and that’s one of the get-out clauses I use in my “hints are not hints” argument.
Sometimes an invisible index isn’t completely invisible.
Actually it’s probably not the NOT IN that’s nasty, it’s the thing you get if you don’t use NOT IN that’s more likely to be nasty. Just another odd little quirk of the optimizer, which I’ll demonstrate with a simple example (running under 188.8.131.52 in this case):
Here’s a funny little bug – which probably won’t cause any damage – that may remind you that (most of) the people who work for Oracle are just ordinary people like you and me who make ordinary little mistakes in their programming. It’s a bug I discovered by accident because I just wanted to check something about how a particular undo tablespace had been defined, and I called dbms_metadata instead of querying dba_tablespaces. Here’s the cut-n-paste from an SQL*Plus session on 184.108.40.206:
Here’s a suggestion to help you avoid wasting time. If you ever include the rowid in a query – not that that should happen very commonly – make sure you give it an alias, especially if you’re using ANSI SQL. If you don’t, you may find yourself struggling to work out why you’re getting an irrational error message. Here’s an example that appeared recently on the OTN forum, with the output cut-n-pasted from a system running 220.127.116.11:
For your entertainment – there’s nothing up my sleeves, this was a simple cut-n-paste after real-time typing with no tricks:
20:39:51 SQL> create table t1 (t1 timestamp);
20:39:55 SQL> insert into t1 values(systimestamp);
1 row created.
20:39:59 SQL> select t1 - systimestamp from t1;
1 row selected.
From time to time I’ve looked at an AWR report and pointed out to the owner the difference in work load visible in the “SQL ordered by” sections of the report when they compare the summary figure with the sum of the work done by the individual statements. Often the summary will state that the captured SQL in the interval represents some percentage of the total workload in the high 80s to mid 90s – sometimes you might see a statement that the capture represents a really low percentage, perhaps in the 30s or 40s.
You have to be a little sensible about interpreting these figures, of course – at one extreme it’s easy to double-count the cost of SQL inside PL/SQL, at the other you may notice that every single statement reported does about the same amount of work so you can’t extrapolate from a pattern to decide how significant a low percentage might be. Nevertheless I have seen examples of AWR reports where I’ve felt justified in suggesting that at some point in the interval some SQL has appeared, worked very hard, and disappeared from the library cache before the AWR managed to capture it.
Now, from Nigel Noble, comes another explanation for why the AWR report might be hiding expensive SQL – a bug, which doesn’t get fixed until 12.2 (although there are backports in hand).
Here’s an oddity that I ran into a little while ago while trying to prepare a sample trace file showing a particular locking pattern; it was something that I’d done before, but trace files can change with different versions of Oracle so I decided to use a copy of 18.104.22.168 that happened to be handy at the time to check if anything had changed since the previous (11gR1) release. I never managed to finish the test; here are the steps I got through:
Here’s a very long post (which is mainly an example) demonstrating a little bug in the “explain plan” functionality. It’s a variation of a bug which I thought had been fixed in 11g, but it still appears in some cases. Take a look at this execution plan, which comes from explaining “select * from dba_tab_cols” – the bit I want to emphasise is in lines 1 to 10:
Just a quick note to say that I found a blog over the weekend with a number of interesting posts, so I thought I’d pass it on: http://www.bobbydurrettdba.com/
There’s a really cute example (complete with test case) of an optimizer bug (possibly only in 11.1) in the December archive: http://www.bobbydurrettdba.com/2012/12/04/index-causes-poor-performance-in-query-that-doesnt-use-it/
Sometimes you find bugs on MOS (Metalink, OCIS, whatever) that make you feel positively ill. I’ve just been on a customer site where (in passing) they mentioned that one of their historic queries against v$sqlstats now tool just over one second (CPU) in 22.214.171.124 when it had previously taken about 200ms on 10.2.0.4***. After a little checking it seemed likely that the change was possibly related to the fact that they had increased the size of the SGA significantly, allowing for a much larger shared pool and library cache; however there have been numerous code changes in the shared pool area on the route from 10g to 11g, so I decided to check MOS to see if anyone else had seen a similar problem. I found this:
A recent post on Oracle-l complained about an oddity when deleting through a function-based index.
I have a function based index but the CBO is not using it. The DML that I expect to have a plan with index range scan is doing a FTS. Its a simple DML that deletes 1000 rows at a time in a loop and is based on the column on which the FBI is created.
Although execution plans are mentioned, we don’t get to see the statement or the plan – and it’s always possible that there will be some clue in the (full) plan that tells us something about the code that the OP has forgotten to mention. However, function-based indexes have a little history of not doing quite what you expect, so I thought I’d take a quick look at the problem, starting with the simplest possible step – do function-based indexes and “normal” b-tree indexes behave differently on a delete. Here’s the data set I created for my test:
Last week I published some code that demonstrated how a fast refresh of an aggregate materialized view could fail because of the internal use of the sys_op_map_nonnull() function. In some ways it was an easy problem to explain because all you had to do was trace the call to refresh and see what was going on. Today’s bug is completely different – it’s virtually impossible to see the connection between the failure and its cause. Here (as a cut-n-paste) is an example of what happens when the bug appears:
ERROR at line 1:
ORA-01760: illegal argument for function
ORA-06512: at "SYS.DBMS_STATS", line 13336
ORA-06512: at "SYS.DBMS_STATS", line 13682
ORA-06512: at "SYS.DBMS_STATS", line 13760
ORA-06512: at "SYS.DBMS_STATS", line 13719
ORA-06512: at line 1
I’ve given examples in the past of how you can be suprised by Oracle when a mechanism that has “always worked” in the past suddenly just doesn’t work because some unexpected internal coding mechanism hits a boundary condition. One example I gave of this was rebuilding indexes online – where a limitation on the key size of index organized tables made it impossible to do an online rebuild of an index on a heap table because of an ORA-01450 (maximum key length) error that was raised against the (index-organized) journal table that Oracle creates internally to support the rebuild.
Slightly more recently the topic of the sys_op_map_nonnull() function came up in some comments about the ora_hash function – and I made a passing comment about the function appearing in materialized view refreshes and then never followed it up. Fortunately I got an email from Jared Still and Don Seiler a few days ago which contained a nice short demonstration of the problem so, with their permission, I’ve published it below.
One of my recent assignments involved a company that had run into some performance problems after upgrading from 10.2.0.3 to 126.96.36.199. We had spent half an hour on the phone discussing the system before I had arrived, and I’d made a couple of suggestions that had solved most of their problems before I got on site – but they still wanted me to come in and give them some specific ideas about why the critical part of the solution had helped.
The most critical piece of advice I had given them (after listening very carefully to their description of the system) was to get rid of ALL the histograms they had on their system, and then watch very carefully for any signs that they might need to re-introduce a handful of histograms over the next few weeks.
One of their critical queries completed in less that 2 seconds when histograms were removed, but took 33 seconds to complete when histograms were in place. With their permission, the following notes record my investigation of this puzzle, the underlying Oracle bug (possibly not yet documented) that caused it, and the optimum workaround that was available to them.
Just in from Mark Bobak on the Oracle-L list server: a warning for the people who have to deal with the client environment for their eBS systems. (It’s likely that the relevant person in your organization has already received this message direct from Oracle Corp., but there’s no harm in double-checking.)