Last week I published some code that demonstrated how a fast refresh of an aggregate materialized view could fail because of the internal use of the sys_op_map_nonnull() function. In some ways it was an easy problem to explain because all you had to do was trace the call to refresh and see what was going on. Today’s bug is completely different – it’s virtually impossible to see the connection between the failure and its cause. Here (as a cut-n-paste) is an example of what happens when the bug appears:
ERROR at line 1:
ORA-01760: illegal argument for function
ORA-06512: at "SYS.DBMS_STATS", line 13336
ORA-06512: at "SYS.DBMS_STATS", line 13682
ORA-06512: at "SYS.DBMS_STATS", line 13760
ORA-06512: at "SYS.DBMS_STATS", line 13719
ORA-06512: at line 1
I’ve given examples in the past of how you can be suprised by Oracle when a mechanism that has “always worked” in the past suddenly just doesn’t work because some unexpected internal coding mechanism hits a boundary condition. One example I gave of this was rebuilding indexes online – where a limitation on the key size of index organized tables made it impossible to do an online rebuild of an index on a heap table because of an ORA-01450 (maximum key length) error that was raised against the (index-organized) journal table that Oracle creates internally to support the rebuild.
Slightly more recently the topic of the sys_op_map_nonnull() function came up in some comments about the ora_hash function – and I made a passing comment about the function appearing in materialized view refreshes and then never followed it up. Fortunately I got an email from Jared Still and Don Seiler a few days ago which contained a nice short demonstration of the problem so, with their permission, I’ve published it below.
One of my recent assignments involved a company that had run into some performance problems after upgrading from 10.2.0.3 to 126.96.36.199. We had spent half an hour on the phone discussing the system before I had arrived, and I’d made a couple of suggestions that had solved most of their problems before I got on site – but they still wanted me to come in and give them some specific ideas about why the critical part of the solution had helped.
The most critical piece of advice I had given them (after listening very carefully to their description of the system) was to get rid of ALL the histograms they had on their system, and then watch very carefully for any signs that they might need to re-introduce a handful of histograms over the next few weeks.
One of their critical queries completed in less that 2 seconds when histograms were removed, but took 33 seconds to complete when histograms were in place. With their permission, the following notes record my investigation of this puzzle, the underlying Oracle bug (possibly not yet documented) that caused it, and the optimum workaround that was available to them.
Just in from Mark Bobak on the Oracle-L list server: a warning for the people who have to deal with the client environment for their eBS systems. (It’s likely that the relevant person in your organization has already received this message direct from Oracle Corp., but there’s no harm in double-checking.)
As in – how come a unique (or primary key) index is predicted to return more than one row using a unique scan. Here’s and example (running on 10.2.0.3 – but the same type of thing happens on newer versions):
You’ve probably heard about adaptive cursor sharing, and possibly you’ve wondered why you haven’t seen it happening very often on your production systems (assuming you’re running a modern version of Oracle). Here’s a little bug/fix that may explain the non-appearance.
MOS Doc ID 9532657.8 Adaptive cursor sharing ignores SELECTs which are not fully fetched.
This bug is confirmed in 188.8.131.52, and fixed in 184.108.40.206. The problem is that the ACS code doesn’t process the statistical information from the cursor unless the cursor reaches “end of fetch” – i.e. if you don’t select all the data in your query, Oracle doesn’t consider the statistics of that execution when deciding whether or not to re-optimise a statement.
It’s quite possible, of course, for an OLTP system, and particularly a web-based system, to execute a number of that allow the user to fetch data one “page” at a time, and stop before fetching all the data – so this bug (or limitation, perhaps) means that some critical statements in your application may never be re-optimized. If this is the case, and you know that you have some such statements that should generate multiple plans, then you could add the hint /*+ bind_aware */ to the SQL.
Upgrade woes: as ever, when a bug is fixed, it’s possible that a few people will suffer from unfortunate side-effects. In the case of this bug, Oracle may start to re-optimize and generated multiple child cursors for SQL statements that (from your perspective) didn’t need the extra work. If you’re very unlucky this may have an undesirable impact on execution performance, and library cache activity.
Thanks to Leonid Roodnitsky for sending me a note about this bug after attending one of my tutorial days last month.
Just like my posting on an index hash, this posting is about a problem as well as being about a hash join. The article has its roots in a question posted on the OTN database forum, where a user has shown us the following execution plan:
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 14 | 896 | 63 (7)| 00:00:01 |
| 1 | SORT GROUP BY | | 14 | 896 | 63 (7)| 00:00:01 |
| 2 | NESTED LOOPS | | | | | |
| 3 | NESTED LOOPS | | 14 | 896 | 62 (5)| 00:00:01 |
|* 4 | HASH JOIN | | 14 | 280 | 48 (7)| 00:00:01 |
| 5 | VIEW | V_SALES_ALL | 200 | 1800 | 4 (0)| 00:00:01 |
| 6 | UNION-ALL | | | | | |
| 7 | INDEX FAST FULL SCAN | PRODUCTS_DATES_IDX | 100 | 900 | 2 (0)| 00:00:01 |
| 8 | INDEX FAST FULL SCAN | PRODUCTS_DATES_IDX_HARD | 100 | 900 | 2 (0)| 00:00:01 |
|* 9 | VIEW | index$_join$_003 | 2238 | 24618 | 44 (7)| 00:00:01 |
|* 10 | HASH JOIN | | | | | |
|* 11 | INDEX RANGE SCAN | PRODUCTS_GF_INDEX2 | 2238 | 24618 | 6 (0)| 00:00:01 |
| 12 | INDEX FAST FULL SCAN | PRODUCTS_GF_PK | 2238 | 24618 | 45 (3)| 00:00:01 |
|* 13 | INDEX UNIQUE SCAN | DATES_PK | 1 | | 0 (0)| 00:00:01 |
| 14 | TABLE ACCESS BY INDEX ROWID| DATES | 1 | 44 | 1 (0)| 00:00:01 |
Here’s a quirky little detail that may make you think carefully about how you define and load large tables.
I have a large table which I load with data and then apply the following:
alter table t_15400 modify (id not null, small_vc not null);
Would you really expect to find Oracle doing two tablescans on the table to enable these constraints ? This is what I found in a trace file (with a lot of db file scattered read waits and other stuff in between) when I ran the test recently on 220.127.116.11:
select /*+ all_rows ordered */ A.rowid, :1, :2, :3 from "SYS"."T_15400" A where( "ID" is null)
select /*+ all_rows ordered */ A.rowid, :1, :2, :3 from "SYS"."T_15400" A where( "SMALL_VC" is null)
It’s just a little difficult to come up with a good reason for this approach, rather than a single statement that validates both constaints at once.
Somewhere I think I’ve published a note that points out that when you add a primary key constraint Oracle first checks that the key column (or column set) is not null – which means that adding a primary key may also result in a tablescan for every column in the index before the index is created – but in that case you can’t see the SQL that checks each column, you have to infer the check from the number of tablescans and number of rows fetched by tablescan. The trace file is rather more helpful if all you’re doing is adding the not null constraints.
I see that Christian Antognini posted a note about an interesting little defect in Enterprise Manager a little while ago – it doesn’t always know how to interpret execution plans. The problem appears in Christians’ example when a filter subquery predicate is applied during an index range scan – it’s a topic I wrote about a few months ago with the title “filter bug” because the plan shows (or, rather, fails to show) a “missing” filter operation, which has been subsumed into the predicate section of the thing that would otherwise have been the first child of the filter operation – the rule of recursive descent through the plan breaks, and the ordering that OEM gives for the operations goes wrong.
I’ve just spent a couple of days in Switzerland presenting seminar material to an IT company based in Delemont (about 50 minutes drive from Basle), and during the course I picked up a number of new stories about interesting things that have gone wrong at client sites. Here’s one that might be of particular interest to people thinking of upgrading from 10g to 11g – even if you don’t hit the bug described in the blog, the fact that the new feature has been implemented may leave you wondering where all your machine resources are going during the overnight run.
Here’s an interesting little conundrum about subquery factoring that hasn’t changed in the recent (18.104.22.168) patch for subquery factoring. It came to me from Jared Still (a fellow member of Oak Table Network) shortly after I’d made some comments about the patch. It’s an example based on the scott/tiger schema – which I’ve extracted from the script $ORACLE_HOME/rdbms/admin/utlsampl.sql (though the relevant scripts may be demobld.sql or scott.sql, depending on version).
From MOS (Metalink) a search for “Patch Set – List of Bug Fixes by Problem” is a useful search, andother is “Availability and Known Issues”. Whenever you find some behaviour that looks like a bug, it’s worth checking the patch sets for the patches or release that are newer than the version that you’re running – you may find that your problem is a known bug with a patch that might be back-ported.
For ease of reference, here are some of the results I got (sorted in reverse order of version) from the searches; you will need a MOS account to follow the links:
Here’s a follow-up to a post I did some time ago about estimating the size of an index before you create it. The note describes dbms_stats.create_index_cost() procedure, and how it depends on the results of a call to explain plan. A recent question on the OTN database forum highlighted a bug in explain plan, however, which I can demonstrate very easily. I’ll start with a small amount of data to demonstrate the basic content that is used to calculate the index cost.
Here’s a funny little optimizer bug – though one that seems to have been fixed by at least 10.2.0.3. It showed up earlier on today in a thread on the OTN database forum. We’ll start (in 22.214.171.124) with a little table and two indexes – one normal, the other descending.
This one’s so odd I nearly posted it as a “Quiz Night” – but decided that it would be friendlier simply to demonstrate it. Here’s a simple script to create a couple of identical tables. It’s using my standard environment but, apart from fiddling with optimizer settings, I doubt if there’s any reason why you need to worry too much about getting the environment exactly right.