Here’s a funny little bug – which probably won’t cause any damage – that may remind you that (most of) the people who work for Oracle are just ordinary people like you and me who make ordinary little mistakes in their programming. It’s a bug I discovered by accident because I just wanted to check something about how a particular undo tablespace had been defined, and I called dbms_metadata instead of querying dba_tablespaces. Here’s the cut-n-paste from an SQL*Plus session on 184.108.40.206:
June 10, 2013
June 4, 2013
Here’s a suggestion to help you avoid wasting time. If you ever include the rowid in a query – not that that should happen very commonly – make sure you give it an alias, especially if you’re using ANSI SQL. If you don’t, you may find yourself struggling to work out why you’re getting an irrational error message. Here’s an example that appeared recently on the OTN forum, with the output cut-n-pasted from a system running 220.127.116.11:
April 16, 2013
For your entertainment – there’s nothing up my sleeves, this was a simple cut-n-paste after real-time typing with no tricks:
20:39:51 SQL> create table t1 (t1 timestamp); Table created. 20:39:55 SQL> insert into t1 values(systimestamp); 1 row created. 20:39:59 SQL> select t1 - systimestamp from t1; T1-SYSTIMESTAMP --------------------------------------------------------------------------- +000000000 04:59:50.680620 1 row selected. 20:40:08 SQL>
March 29, 2013
From time to time I’ve looked at an AWR report and pointed out to the owner the difference in work load visible in the “SQL ordered by” sections of the report when they compare the summary figure with the sum of the work done by the individual statements. Often the summary will state that the captured SQL in the interval represents some percentage of the total workload in the high 80s to mid 90s – sometimes you might see a statement that the capture represents a really low percentage, perhaps in the 30s or 40s.
You have to be a little sensible about interpreting these figures, of course – at one extreme it’s easy to double-count the cost of SQL inside PL/SQL, at the other you may notice that every single statement reported does about the same amount of work so you can’t extrapolate from a pattern to decide how significant a low percentage might be. Nevertheless I have seen examples of AWR reports where I’ve felt justified in suggesting that at some point in the interval some SQL has appeared, worked very hard, and disappeared from the library cache before the AWR managed to capture it.
Now, from Nigel Noble, comes another explanation for why the AWR report might be hiding expensive SQL – a bug, which doesn’t get fixed until 12.2 (although there are backports in hand).
March 20, 2013
Here’s an oddity that I ran into a little while ago while trying to prepare a sample trace file showing a particular locking pattern; it was something that I’d done before, but trace files can change with different versions of Oracle so I decided to use a copy of 18.104.22.168 that happened to be handy at the time to check if anything had changed since the previous (11gR1) release. I never managed to finish the test; here are the steps I got through:
January 17, 2013
Here’s a very long post (which is mainly an example) demonstrating a little bug in the “explain plan” functionality. It’s a variation of a bug which I thought had been fixed in 11g, but it still appears in some cases. Take a look at this execution plan, which comes from explaining “select * from dba_tab_cols” – the bit I want to emphasise is in lines 1 to 10:
January 6, 2013
Just a quick note to say that I found a blog over the weekend with a number of interesting posts, so I thought I’d pass it on: http://www.bobbydurrettdba.com/
There’s a really cute example (complete with test case) of an optimizer bug (possibly only in 11.1) in the December archive: http://www.bobbydurrettdba.com/2012/12/04/index-causes-poor-performance-in-query-that-doesnt-use-it/
November 30, 2012
Sometimes you find bugs on MOS (Metalink, OCIS, whatever) that make you feel positively ill. I’ve just been on a customer site where (in passing) they mentioned that one of their historic queries against v$sqlstats now tool just over one second (CPU) in 22.214.171.124 when it had previously taken about 200ms on 10.2.0.4***. After a little checking it seemed likely that the change was possibly related to the fact that they had increased the size of the SGA significantly, allowing for a much larger shared pool and library cache; however there have been numerous code changes in the shared pool area on the route from 10g to 11g, so I decided to check MOS to see if anyone else had seen a similar problem. I found this:
September 11, 2012
A recent post on Oracle-l complained about an oddity when deleting through a function-based index.
I have a function based index but the CBO is not using it. The DML that I expect to have a plan with index range scan is doing a FTS. Its a simple DML that deletes 1000 rows at a time in a loop and is based on the column on which the FBI is created.
Although execution plans are mentioned, we don’t get to see the statement or the plan – and it’s always possible that there will be some clue in the (full) plan that tells us something about the code that the OP has forgotten to mention. However, function-based indexes have a little history of not doing quite what you expect, so I thought I’d take a quick look at the problem, starting with the simplest possible step – do function-based indexes and “normal” b-tree indexes behave differently on a delete. Here’s the data set I created for my test:
September 2, 2012
Last week I published some code that demonstrated how a fast refresh of an aggregate materialized view could fail because of the internal use of the sys_op_map_nonnull() function. In some ways it was an easy problem to explain because all you had to do was trace the call to refresh and see what was going on. Today’s bug is completely different – it’s virtually impossible to see the connection between the failure and its cause. Here (as a cut-n-paste) is an example of what happens when the bug appears:
execute dbms_stats.gather_schema_stats(user) * ERROR at line 1: ORA-01760: illegal argument for function ORA-06512: at "SYS.DBMS_STATS", line 13336 ORA-06512: at "SYS.DBMS_STATS", line 13682 ORA-06512: at "SYS.DBMS_STATS", line 13760 ORA-06512: at "SYS.DBMS_STATS", line 13719 ORA-06512: at line 1
August 30, 2012
I’ve given examples in the past of how you can be suprised by Oracle when a mechanism that has “always worked” in the past suddenly just doesn’t work because some unexpected internal coding mechanism hits a boundary condition. One example I gave of this was rebuilding indexes online – where a limitation on the key size of index organized tables made it impossible to do an online rebuild of an index on a heap table because of an ORA-01450 (maximum key length) error that was raised against the (index-organized) journal table that Oracle creates internally to support the rebuild.
Slightly more recently the topic of the sys_op_map_nonnull() function came up in some comments about the ora_hash function – and I made a passing comment about the function appearing in materialized view refreshes and then never followed it up. Fortunately I got an email from Jared Still and Don Seiler a few days ago which contained a nice short demonstration of the problem so, with their permission, I’ve published it below.
August 9, 2012
One of my recent assignments involved a company that had run into some performance problems after upgrading from 10.2.0.3 to 126.96.36.199. We had spent half an hour on the phone discussing the system before I had arrived, and I’d made a couple of suggestions that had solved most of their problems before I got on site – but they still wanted me to come in and give them some specific ideas about why the critical part of the solution had helped.
The most critical piece of advice I had given them (after listening very carefully to their description of the system) was to get rid of ALL the histograms they had on their system, and then watch very carefully for any signs that they might need to re-introduce a handful of histograms over the next few weeks.
One of their critical queries completed in less that 2 seconds when histograms were removed, but took 33 seconds to complete when histograms were in place. With their permission, the following notes record my investigation of this puzzle, the underlying Oracle bug (possibly not yet documented) that caused it, and the optimum workaround that was available to them.
June 14, 2012
Just in from Mark Bobak on the Oracle-L list server: a warning for the people who have to deal with the client environment for their eBS systems. (It’s likely that the relevant person in your organization has already received this message direct from Oracle Corp., but there’s no harm in double-checking.)
June 8, 2012
As in – how come a unique (or primary key) index is predicted to return more than one row using a unique scan. Here’s and example (running on 10.2.0.3 – but the same type of thing happens on newer versions):
March 21, 2012
You’ve probably heard about adaptive cursor sharing, and possibly you’ve wondered why you haven’t seen it happening very often on your production systems (assuming you’re running a modern version of Oracle). Here’s a little bug/fix that may explain the non-appearance.
MOS Doc ID 9532657.8 Adaptive cursor sharing ignores SELECTs which are not fully fetched.
This bug is confirmed in 188.8.131.52, and fixed in 184.108.40.206. The problem is that the ACS code doesn’t process the statistical information from the cursor unless the cursor reaches “end of fetch” – i.e. if you don’t select all the data in your query, Oracle doesn’t consider the statistics of that execution when deciding whether or not to re-optimise a statement.
It’s quite possible, of course, for an OLTP system, and particularly a web-based system, to execute a number of that allow the user to fetch data one “page” at a time, and stop before fetching all the data – so this bug (or limitation, perhaps) means that some critical statements in your application may never be re-optimized. If this is the case, and you know that you have some such statements that should generate multiple plans, then you could add the hint /*+ bind_aware */ to the SQL.
Upgrade woes: as ever, when a bug is fixed, it’s possible that a few people will suffer from unfortunate side-effects. In the case of this bug, Oracle may start to re-optimize and generated multiple child cursors for SQL statements that (from your perspective) didn’t need the extra work. If you’re very unlucky this may have an undesirable impact on execution performance, and library cache activity.
Thanks to Leonid Roodnitsky for sending me a note about this bug after attending one of my tutorial days last month.