Just like my posting on an index hash, this posting is about a problem as well as being about a hash join. The article has its roots in a question posted on the OTN database forum, where a user has shown us the following execution plan:
--------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 14 | 896 | 63 (7)| 00:00:01 | | 1 | SORT GROUP BY | | 14 | 896 | 63 (7)| 00:00:01 | | 2 | NESTED LOOPS | | | | | | | 3 | NESTED LOOPS | | 14 | 896 | 62 (5)| 00:00:01 | |* 4 | HASH JOIN | | 14 | 280 | 48 (7)| 00:00:01 | | 5 | VIEW | V_SALES_ALL | 200 | 1800 | 4 (0)| 00:00:01 | | 6 | UNION-ALL | | | | | | | 7 | INDEX FAST FULL SCAN | PRODUCTS_DATES_IDX | 100 | 900 | 2 (0)| 00:00:01 | | 8 | INDEX FAST FULL SCAN | PRODUCTS_DATES_IDX_HARD | 100 | 900 | 2 (0)| 00:00:01 | |* 9 | VIEW | index$_join$_003 | 2238 | 24618 | 44 (7)| 00:00:01 | |* 10 | HASH JOIN | | | | | | |* 11 | INDEX RANGE SCAN | PRODUCTS_GF_INDEX2 | 2238 | 24618 | 6 (0)| 00:00:01 | | 12 | INDEX FAST FULL SCAN | PRODUCTS_GF_PK | 2238 | 24618 | 45 (3)| 00:00:01 | |* 13 | INDEX UNIQUE SCAN | DATES_PK | 1 | | 0 (0)| 00:00:01 | | 14 | TABLE ACCESS BY INDEX ROWID| DATES | 1 | 44 | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------------
Note particularly that line 6 is a union all of two index fast full scans that are predicted to produce 100 rows each, and that the resulting 200 rows are passed through a view to become the build table for a hash join. The problem, as the OP points out, is that the estimates are very bad and the indexes return 1.7 million and 78 million rows respectively, leading to a very large hash table and a very slow query.
The problem is that the system is running with the optimizer_mode set to first_rows_100 – and there’s a bug in the code that has made the optimizer “forget” that it can’t stop after 100 rows when it’s supposed to be building a hash table. Here’s a test case to reproduce the problem. I’ve run this on Oracle 11.1.0.7 using my standard setup (8KB blocks, locally managed tablespace with 1MB uniform extents, freelist management, and system statistics disabled):
create table t1 as with generator as ( select --+ materialize rownum id from dual connect by level <= 10000) select rownum id, trunc(dbms_random.value(1,1000)) n1, lpad(rownum,10,'0') small_vc, rpad('x',100) padding from generator v1, generator v2 where rownum <= 1000000; create index t1_n1 on t1(id, n1); create table t2 as with generator as ( select --+ materialize rownum id from dual connect by level <= 10000) select rownum id, trunc(dbms_random.value(10001,20001)) x1, lpad(rownum,10,'0') small_vc, rpad('x',100) padding from generator v1, generator v2 where rownum <= 1000000; create index t2_i1 on t2(x1); -- collect stats on both tables, no histograms create or replace view v1 as select id, n1, small_vc, padding from t1 where n1 between 101 and 300 union all select id, n1, small_vc, padding from t1 where n1 between 501 and 700 ; alter session set optimizer_mode = first_rows_10; explain plan for select v1.small_vc, v1.n1 from v1, t2 where t2.id = v1.id and t2.x1 between 1000 and 5000 ; select * from table(dbms_xplan.display(null,null,'outline'));
I’ve used the same table twice in my view – and the final query is actually going to use tablescans on the two accesses to the t1 table, ignoring the index which I happen to have left in place from my first attempts to model the original scenario. With this data in place, and running with first_rows_10, the execution plan is as follows:
----------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ----------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 50 | 108 | |* 1 | HASH JOIN | | 1 | 50 | 108 | | 2 | VIEW | V1 | 24 | 960 | 4 | | 3 | UNION-ALL | | | | | |* 4 | TABLE ACCESS FULL | T1 | 12 | 240 | 2 | |* 5 | TABLE ACCESS FULL | T1 | 12 | 240 | 2 | | 6 | TABLE ACCESS BY INDEX ROWID | T2 | 100 | 1000 | 103 | |* 7 | INDEX RANGE SCAN | T2_I1 | 100 | | 3 | -----------------------------------------------------------------------
As you can see, the optimizer thinks it will be very quick and cheap to do two full tablescans of t1 to collect a total of 24 rows to feed into a hash table before doing an index range scan against t2 to collect 100 rows to probe the hash table. But even though it may be quick and cheap for Oracle to collect the first 12 rows from each tablescan, at runtime we actually have to complete both tablescans and build the entire hash table before we can start the index range scan of t2. To see how nasty this plan will be I generated the outline for the query, which gave me all the hints I needed to reproduce the plan while running all_rows – so here’s the plan again, giving you a better indication of how expensive it will be to build the hash table:
------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes |TempSpc| Cost | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 200 | 11600 | | 5854 | |* 1 | HASH JOIN | | 200 | 11600 | 23M| 5854 | | 2 | VIEW | V1 | 402K| 18M| | 5382 | | 3 | UNION-ALL | | | | | | |* 4 | TABLE ACCESS FULL | T1 | 201K| 3933K| | 2691 | |* 5 | TABLE ACCESS FULL | T1 | 201K| 3933K| | 2691 | | 6 | TABLE ACCESS BY INDEX ROWID| T2 | 100 | 1000 | | 103 | |* 7 | INDEX RANGE SCAN | T2_I1 | 100 | | | 3 | ------------------------------------------------------------------------------
See how expensive the two full tablescans are, relative to the cost of getting the 100 rows held by t2. The fastest way of running this queryto get the first 10 rows is probably to use t2 as the build table of a hash join (so that we can stop the tablescans of t1 early).
Remember – if you’re running first_rows_N as the optimizer_mode you’re in the minority, so you’re running code that has been subject to less testing than the rest of the optimizer code: you’re likely to run into some very odd bugs. Despite comments you might hear on the internet from time to time along the lines of “first_rows_N is for OLTP systems” (that’s not really correct, by the way), it’s probably best to stick with all_rows unless you can come up with a very sound argument for doing otherwise.
Footnote:
This looks as if it mighe be bug 9633142: FIRST_ROWS OPTIMIZER DOES NOT PUSH PREDICATES INTO UNION INLINE VIEW; and if it is not the same bug it is closely related to it. The bug is reported as fixed in 12.1
Update (August 2015)
The problem still reproduces in 12.1.0.2
Reblogged this on lava kafle kathmandu nepal.
Comment by lkafle — March 12, 2012 @ 3:34 am GMT Mar 12,2012 |
Jonathan
you’ve missed “less than” in data generation scripts.
I was able to reproduce similar plans using 10K & 1M rows in t1 & t2 respectively. However, I don’t see interesting word BATCHED on the 6th plan line. So my plan looks like:
For those who want to copy-paste test case: http://pastebin.com/raw.php?i=XKdYKB6d
Comment by Timur Akhmadeev — March 12, 2012 @ 7:27 am GMT Mar 12,2012 |
Timur,
Thanks, now fixed.
I don’t know why my “<=" are disappearing when I cut and paste these things, I haven't changed the way I prepare the texts for months. I've had a couple real messes when I've tried to add a comment or correction to some older notes – possibly WordPress has done something behind the scenes that means I need to watch more carefully what finally arrives. The "batched" is from another experiment I was doing at the same time – I've corrected the plan, and I copied in the wrong output.
Comment by Jonathan Lewis — March 12, 2012 @ 7:47 am GMT Mar 12,2012 |
I’ve seen something similar if I switch between HTML and Visual editing mode couple of times – I think WP performs unnecessary escaping sometimes.
Comment by Timur Akhmadeev — March 12, 2012 @ 9:32 am GMT Mar 12,2012 |
10.2.0.5 has a different plan though, with the presence of ‘UNION ALL PUSHED PREDICATE’ appearing for the view V1.
Comment by Andy — March 12, 2012 @ 8:41 am GMT Mar 12,2012 |
Andy,
I’m going to guess that you forgot to set the optimizer_mode to first_rows_10, and got a plan like this:
Comment by Jonathan Lewis — March 12, 2012 @ 8:49 am GMT Mar 12,2012 |
Hi Jonathan,
Yes, that’s the plan, but only when I set it to ‘all_rows’. If it is first_rows_10, the plan is still the same, except the rows estimate at line 2 becomes 28.
I can only get the ‘faulty’ plan (ie, full table scan on T1) if T1 is built with10,000 rows. At 1 million rows for T1, I can’t get this bad plan.
Am I missing something?
Comment by Andy — March 12, 2012 @ 9:02 am GMT Mar 12,2012 |
I tried Timur’s script on 10.2.0.5 , Solaris 10, and the plan is ‘correct'( now showing T2 as the build table):
Comment by Andy — March 12, 2012 @ 9:20 am GMT Mar 12,2012
Andy,
No, you’re not missing anything. I’ve just run the test against 10.2.0.3 and 10.2.0.5 and got the correct (i.e. view with pushed predicate) plan. However 11.2.0.3 is consistent with 11.1.0.7, producing the wrong plan.
So it looks like it may be a bug introduced in 11g.
Comment by Jonathan Lewis — March 12, 2012 @ 9:34 am GMT Mar 12,2012
Andy, most likely you are running with altered optimizer_index_cost_adj/optimizer_index_caching settings (like me). I’ve added ALTER SESSION to change these settings in the script to default values of 100/0.
Comment by Timur Akhmadeev — March 12, 2012 @ 9:36 am GMT Mar 12,2012
This seems to be a ubiquitous problem with FIRST_ROWS; it appears to act like an /*+ INDEX (T) */ (no index specified), i.e. it uses an index if at all possible, no matter how bad.
Here’s a more pathological example, (Kyte’s pagination); note the completely inappropriate use of SH.CUSTOMERS_GENDER_BIX :
Comment by Jeff Jacobs, Oracle Ace — March 12, 2012 @ 3:15 pm GMT Mar 12,2012 |
There are huge differences between FIRST_ROWS(N) algorithm and old fashioned FIRST_ROWS. FIRST_ROWS tries to bypass physical sort always when possible. It could result into unexpected index use.
Regards
Pavol Babel
Comment by Pavol Babel — March 29, 2012 @ 4:23 pm BST Mar 29,2012 |
My example was from R12.01.
Comment by Jeff Jacobs, Oracle Ace — March 12, 2012 @ 3:16 pm GMT Mar 12,2012 |
Why do you always disable (you mean delete?) system statistics in your tests? We all want to have repeatable behaviour in our tests, why does this help? In your book about the cost optimizer you disabled “system statistics (CPU costing)”. CPU costing? How is CPU costing related to this?
Comment by Xenofon Grigoriadis — March 13, 2012 @ 11:59 am GMT Mar 13,2012 |
Xenofon,
“CPU Costing” and “System statistics” are synonyms for the same mechanism. Because of the version-dependency in the rules for using system statistics, I call dbms_stats.delete_system_stats and set _optimizer_cost_model to “io” to ensure that I’ve disabled system statistics.
As far as repeatable results go – the use of system statistics will change the cost of a tablescan relative to an index access path in ways that are dependent on the actual numbers in the sys.aux_stats$ table, getting rid of them will ensure that people don’t see different plans because they have different stats.
There are cases though (distributed queries come to mind) where certain features cannot appear unless system statistics are enabled, so sometimes I will explicitly set some values for the stats using dbms_stats.set_system_stats().
Comment by Jonathan Lewis — March 31, 2012 @ 6:31 am BST Mar 31,2012 |
A little more experimentation shows that the problem I describe is apparenly only for FIRST_ROWS hint, without a number qualifier, i.e. the “_N” (or the older “(N)”) when hinting a query.
Comment by Jeff Jacobs, Oracle Ace — March 13, 2012 @ 6:16 pm GMT Mar 13,2012 |
Hi Jeff
Could you please provide the full test case for the explain plan that has got you the problem with first_rows hint.?
Comment by George Joseph — March 15, 2012 @ 1:47 pm GMT Mar 15,2012 |
The test case is the query in my original comment against the standard SH demo schema. Note that this “bug” appears to be only for FIRST_ROWS, without the number of rows being specified, either via FIRST_ROWS_n, or FIRST_ROWS (n). The latter generate appropriate execution plans. Also, “FIRST_ROWS” by itself is apparently deprecated in 11GR2.
Comment by Jeff Jacobs, Oracle Ace — March 15, 2012 @ 4:34 pm GMT Mar 15,2012 |
Jeff,
Despite appearing in recent code released by Oracle Corp. the old first_rows hint was originally labelled in 9i as “available only for backwards compatibility”. As Pavol indicated, some of its paths (or avoidance of paths) were dictated by simple rules – so it’s fairly easy to find cases where first_rows and first_rows(n) – even with n = 1 – differ dramatically in their outcome.
Comment by Jonathan Lewis — March 31, 2012 @ 6:24 am BST Mar 31,2012 |
Jeff’s example was messed up by the way that WordPress reads the > and < symbols as markers for html directives. I think a number of lines of the various inline views were deleted – leaving the oddity of a statement ending with “rownum = :1”.
Comment by Jonathan Lewis — March 31, 2012 @ 6:18 am BST Mar 31,2012 |
[…] posting by Jonathan Lewis is about a problem as well as being about a hash […]
Pingback by Log Buffer #263, A Carnival of the Vanities for DBAs | The Pythian Blog — March 16, 2012 @ 6:02 am GMT Mar 16,2012 |