Oracle Scratchpad

February 17, 2020

Join Elimination bug

Filed under: Bugs,CBO,Execution plans,Oracle — Jonathan Lewis @ 3:37 pm GMT Feb 17,2020

It is possible to take subquery factoring (common table expressions / CTEs) too far. The most important purpose of factoring is to make a complex query easier to understand – especially if you can identify a messy piece of text that is used in more than one part of the query – but I have seen a couple of patterns appearing that make the SQL harder to read.

  • In one pattern each table is given its own factored subquery holding the non-join predicates (and, possibly, filter subqueries) and then the main query is just a simple join of the factored subqueries with nothing but join (ON) predicates.
  • In another pattern each factored subquery consists of the previous subquery with one more table added to it, so every subquery is no more that a “two-table” query block and the final subquery is a simple “select from last_factored_subquery”.

Neither of these patterns is helpful – but today’s blog note is not going to be about going to extremes with subquery factoring; instead it’s an example of a fairly reasonable use of subquery factoring that ran into a “wrong results” bug.

Consider a system that collects data from some type of meters. Here’s a two-table definition for meters and meter readings:

create table meters (
        meter_id        number,
        meter_type      varchar2(10),  -- references meter_types
        date_installed  date,
        padding         varchar2(100),
        constraint met_pk primary key(meter_id)

create table meter_readings (
        meter_id        number,
        date_read       date,
        reading         number(10,3),
        padding         varchar2(100),
        constraint      mrd_pk primary key(meter_id, date_read),
        constraint      mrd_fk_met foreign key (meter_id) references meters

insert into meters
        1e6 + rownum,
        case mod(rownum,3)
                when 0 then 'A'
                when 1 then 'B'
                       else 'C'
        trunc(sysdate) - mod(rownum,5),
connect by 
        level <= 10

execute dbms_stats.gather_table_stats(null,'meters')

insert into meter_readings 
        met.date_installed - + 2,
        meters met,
        (select rownum id from dual connect by level <= 4) v


execute dbms_stats.gather_table_stats(null,'meter_readings')

I’ve picked the obvious primary keys for the two tables and defined the appropriate referential integrity constraint – which means the optimzer should be able to choose the best possible strategies for any query that joins the two tables.

I’ve created a very small data set – a few meters installed in the last few days, and a few readings per meters over the last few days. So lets report the readings for the last 48 hours, and include in the output any meters that haven’t logged a reading in that interval.

Here’s the query I wrote, with its output, running on a 19.3 instance on 17th Feb 2020:

with mrd_cte as (
                meter_id, date_read, reading
                date_read in (trunc(sysdate), trunc(sysdate)+1)
        met.meter_id, met.date_installed, mrd_cte.date_read, reading
        meters met
left join 
on      mrd_cte.meter_id = met.meter_id

---------- --------- --------- ----------
   1000001 16-FEB-20 17-FEB-20       .063
   1000002 15-FEB-20
   1000003 14-FEB-20
   1000004 13-FEB-20
   1000005 17-FEB-20 18-FEB-20        .37
   1000005 17-FEB-20 17-FEB-20       .824
   1000006 16-FEB-20 17-FEB-20       .069
   1000007 15-FEB-20
   1000008 14-FEB-20
   1000009 13-FEB-20
   1000010 17-FEB-20 17-FEB-20       .161
   1000010 17-FEB-20 18-FEB-20       .818

12 rows selected.

The query returns 12 rows – which SQL*Plus can report because it counts them as it fetches them so it can give you the total at the end of the query.

Of course, sometimes people write preliminary queries to find out how big the result set would be before they run the query to acquire the result set itself. In cases like that (where they’re just going to select a “count(*)” the optimizer may a choose different execution path from the base query – perhaps finding a way to do an index-only execution, and maybe eliminating a few table joins from the query. So let’s execute a count of the above query:

rem     Script:         join_elimination_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020

with mrd_cte as (
                meter_id, date_read, reading
                date_read in (trunc(sysdate), trunc(sysdate)+1)
select count(*) from (
                met.meter_id, met.date_installed, mrd_cte.date_read, mrd_cte.reading
                meters met
        left join 
        on      mrd_cte.meter_id = met.meter_id


1 row selected.

You’ll have to take my word for it, of course, but no-one else was using this database while I was running this test, and no-one else has access to the schema I was using anyway. Unfortunately when I count the 12 rows instead of reporting them Oracle thinks there are only 10 rows. Oops!

Step 1 in investigating the problem – check the execution plans to see if there are any differences in the structure of the plan, the use of predicates, or the outline information. I won’t bother with the plan for the base query because it was very obvious from the count query where the problem lay.

| Id  | Operation        | Name   | Rows  | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT |        |       |     1 (100)|          |
|   1 |  SORT AGGREGATE  |        |     1 |            |          |
|   2 |   INDEX FULL SCAN| MET_PK |    10 |     1   (0)| 00:00:01 |

Outline Data
      MERGE(@"SEL$C43CA2CA" >"SEL$2")
      MERGE(@"SEL$D28F6BD4" >"SEL$E6E74641")
      MERGE(@"SEL$1" >"SEL$006708EA")
      INDEX(@"SEL$69B21C86" "MET"@"SEL$3" ("METERS"."METER_ID"))

This is the plan as pulled from memory by a call to dbms_xplan.display_cursor(). We note particularly the following: meter_readings doesn’t appear in the plan, there is no predicate section (and no asterisks against any of the operations that would tell us there ought to be some predicate information), and there’s a very revealing ELIMINATE_JOIN(@”SEL$00F67CF8″ “METER_READINGS”@”SEL$1”) in the outline information.

For some reason the optimizer has decided that it’s okay to remove meter_readings from the query (even though there may be many meter readings for each meter), so it was inevitable that it produced the wrong result.

Despite my opening note, this is not an issue with subquery factoring – it just looked that way when I started poking at the problem. In fact, if you rewrite the query using an inline view you get the same error, if you turn the inline view into a stored view you get the error, and if you turn the whole query into a simple (left) join with the date predicate as part of the ON clause you still get the error.

The problem lies somewhere in the join elimination transformation. If you go back to the outline information from the bad plan you’ll see the line: ELIMINATE_JOIN(@”SEL$00F67CF8″ “METER_READINGS”@”SEL$1”) – by changing this to NO_ELIMINATE_JOIN(…) and adding it to the main query block I got a suitable plan joining the two tables and producing the right result.

The problem appears in and (tested on livesql) – but does not appear in or


There is a known bug associated with this problem:

Bug: 29182901
Query with Outer Join Returned a Wrong Result due to Join Elimination (Doc ID 29182901.8)

The bug has been fixed in 20.1, with a fix that has been backported into the Jan 2020 patches for 19, 18, and 12.2.

The description of the bug suggests that you could also work around the problem by setting “_optimizer_enhanced_join_elimination” to false at the system or session level.

Although the example uses the “ANSI” syntext for the outer join the problem is not one of those that can be fixed by rewriting to traditional outer-join Oracle syntax, the join elimination still occurs using the older (+) syntax.




  1. Hello Jonathan,

    I am sorry to say it, but this looks to me as a complete shame, not just a bug …

    Regardless of the issue of using a CTE or not, join elimination should be based on a rock-strong logic of making sure that it does not change the query result, based ONLY on things like constraints, aggregations that produce unique “keys”, a.s.o.

    In this case it looks trivial that with the condition “date_read in (trunc(sysdate), trunc(sysdate)+1)”, it is incorrect to eliminate the join,
    while, if using a “single value” condition, like “date_read in (trunc(sysdate)”, join elimination is correct.

    Generally speaking, and based on how they put it in the bug title, it looks to me that at Oracle they are working more on “enlisting” the cases where some optimization should NOT be applied, instead of more solidly working on when it CAN be applied correctly, and only apply it in those cases, even if they are less numerous.

    I still believe that any performance penalty that might be paid by NOT applying some “exotic” optimization is highly preferable upon producing wrong results, so all these tricky optimizations should be applied much more conservatively.

    Let’s also not forget that optimizations like “join elimination” can always be achieved manually by the developers, by a slightly more careful coding.

    Thanks a lot & Best Regards,
    Iudith Mentzel

    Comment by Iudith Mentzel — February 18, 2020 @ 11:59 am GMT Feb 18,2020 | Reply

    • Iudith,

      Playing the devil’s advocate here, I think there are two points that should be made to balance your argument.

      First – there are legacy applications (the SAPs, Peoplesoft, EBS etc.) and in-house applications that are so badly written that generic strategies for handling join elimination and other exotic transformations are really important, as are strategies to minimise the projection from views (which many in-house applications tend to have as stored views).

      Second – a strategy for converting ANSI-style to traditional syntax is very important so that Oracle can use existing optimizer technology rather than having to write parallel code to optimizer ANSI-style.

      It seems almost inevitable that when two classes of strategy like this collide there will be cases where an oversight has occurred and the resulting combination of transformations produces what looks like an error that could “obviously” have been avoided. Interestingly, this example looks very similar to another “ANSI” bug that I’ve written about (and you’ve commented on) where the transformation loses a column as the translation takes place and ends up producing the wrong results. (And there’s another bug where an earlier version of kept far more columns than was needed and queries would crash with “too many columns” errors when the queries themselves referenced only a handful of columns).

      Personally I’d like to see an optimizer-engineering group extract a specification of all the features of the optimzer, document all the traps that have appeared as collisions between different features, then rebuild the optimizer from the ground up. Realistically I think that adding in new “damage limiting” features incrementally (single column join elimination -> multi-column join elimination -> enhanced_join_elimination) is a necessary evil which can be endured provided they give us a fine-grained method of identifying and reversing out specific stages when we find collisions that cause problems.

      Jonathan Lewis

      Comment by Jonathan Lewis — February 18, 2020 @ 3:46 pm GMT Feb 18,2020 | Reply

  2. Hello Jonathan,

    It is indeed a nice idea to have a clear and documented picture of all the “wise features” that are incrementally added to the optimizer.
    But, for allowing complete control upon the consequences, we need to fully understand the functionality of all the parameters and hints,
    both the documented and the undocumented ones, and, as far as I understand, Oracle does not intend to do this, anyway not in a near future.

    What might happen, however, is that a query that produces correct results today, regardless of its performance,
    with or without applying any automatic optimization/transformation, might suddenly start to produce incorrect results when some additional
    automatic optimization is “released” in a future version.

    So, theoretically, one would practically find himself monitoring all the time the result correctness of each and every statement,
    and not only the excessively complex ones, as the example statement in this post shows.

    Or, as another alternative, one could decide “to guard” the correctly working code-base by using SQL plan management,
    which would be another way to avoid the damages (and also the possible benefits) of newly added automatic optimizations.

    As a somehow “philosophical parallel” to rebuilding the optimizer from the ground-up, as you say,
    I would rather like the possibility to build a deeper understanding of optimization issues in the developer community,
    not to mention the teams that work on widely used commercial application packages like those you mentioned.

    This will be a double gain for both the developers themselves, as well as for the Oracle optimizer development team,
    that will be able to dedicate itself to adding coding features instead of adding inefficient coding.

    It is most welcome from Oracle side to add more and more possibilities/features/tools for optimizing SQL queries,
    but to let the developers apply them in a conscious and controlled fashion, rather than doing this completely automatically.

    For some reason, when the first hints were introduced back in Oracle 7, I was pretty sure that some day they will be turned to become proper
    SQL syntax, but this did not happen.
    Considering the infinite complexity possible to be achieved by using the SQL Language, I doubt that whenever in the future
    in will be possible to still keep optimization completely automatic and transparent in all cases, without the risk of incorrect results.

    Thanks a lot & Best Regards,
    Iudith Mentzel

    Comment by Iudith Mentzel — February 18, 2020 @ 7:22 pm GMT Feb 18,2020 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by