There’s a thread on the OTN database forum at present asking for advice on optimising a query that’s trying to find “the most recent price” for a transaction given that each transaction is for a stock item on a given date, and each item has a history of prices where each historic price has an effective start date. This means the price for a transaction is the price as at the most recent date prior to the transaction date.
General strategy
There is an absolutely standard way of expressing “the most recent occurrence” in SQL. Assume we have a table of (item_code, effective_date, price) with the obvious primary key of (item_code, effective_date); then a requirement to find “the most recent price for item XXXX as at 25th Dec 2014” would give us code like the following (note – all the examples in this note were run against Oracle 11.2.0.4):
select  *
from    prices  pri1
where   item_code      = 'XXXX'
and     effective_date = (
                select  max(effective_date)
                from    prices  pri2
                where   pri2.item_code       = 'XXXX'
                and     pri2.effective_date <= date'2014-12-25'
        )
/
The ideal execution plan that we should expect to see for this query is as follows (with a small variation if you had created the prices table as an index-organized table – which would probably be sensible in many cases):
-----------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |     1 |    52 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID   | PRICES |     1 |    52 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN            | PRI_PK |     1 |       |     1   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE              |        |     1 |    32 |            |          |
|   4 |     FIRST ROW                  |        |     1 |    32 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN (MIN/MAX)| PRI_PK |     1 |    32 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ITEM_CODE"='XXXX' AND "EFFECTIVE_DATE"= (SELECT
              MAX("EFFECTIVE_DATE") FROM "PRICES" "PRI2" WHERE
              "PRI2"."EFFECTIVE_DATE"<=TO_DATE(' 2014-12-25 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "PRI2"."ITEM_CODE"='XXXX'))
   5 - access("PRI2"."ITEM_CODE"='XXXX' AND "PRI2"."EFFECTIVE_DATE"<=
              TO_DATE(' 2014-12-25 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
As you can see, this plan is using the “driving subquery” approach – the order of operation is 5, 4, 3, 2, 1, 0: we do an index min/max range scan at operation 5 to find the maximum effective date for the item, then pass that up through the (essentially redundant) First Row and Sort Aggregate operations to use as an input to the index unique scan at operation 2, which passes the rowid up to operation 1 to find the specific row. In my case this was 2 consistent gets for the range scan, 2 more for the unique scan, and one for the table access.
You might point out that my example uses the item_code ‘XXXX’ twice, once in the main query, once in the subquery; and you might decide that this was in very poor taste since we should clearly be using a correlated subquery – the correlating predicate ought to be: pri2.item_code = pri1.item_code.
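With that change the query becomes:

select  *
from    prices  pri1
where   item_code      = 'XXXX'
and     effective_date = (
                select  max(effective_date)
                from    prices  pri2
                where   pri2.item_code       = pri1.item_code
                and     pri2.effective_date <= date'2014-12-25'
        )
/

Here’s the execution plan I got when I made that change: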
----------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 |    78 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |         |     1 |    78 |     3   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                |         |     1 |    78 |     3   (0)| 00:00:01 |
|   3 |    VIEW                       | VW_SQ_1 |     1 |    26 |     2   (0)| 00:00:01 |
|*  4 |     FILTER                    |         |       |       |            |          |
|   5 |      HASH GROUP BY            |         |     1 |    32 |     2   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN        | PRI_PK  |     1 |    32 |     2   (0)| 00:00:01 |
|*  7 |    INDEX UNIQUE SCAN          | PRI_PK  |     1 |       |     0   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID | PRICES  |     1 |    52 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("PRI2"."ITEM_CODE"='XXXX')
   6 - access("PRI2"."ITEM_CODE"='XXXX' AND "PRI2"."EFFECTIVE_DATE"<=
              TO_DATE(' 2014-12-25 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("ITEM_CODE"='XXXX' AND "EFFECTIVE_DATE"="MAX(EFFECTIVE_DATE)")
The plan changes dramatically: the optimizer has unnested the subquery. In my case this didn’t make any difference to the overall performance because my data set was small, I had only one or two prices per item code, and the query was very basic; but in most other cases the change could be catastrophic.
The Specific Problem
The requirement on OTN had a stock transactions table (xo_stock_trans) and a prices table (xo_prices), and the OP had supplied some code to create and populate these tables with 6.4 million and 4.5 million rows respectively. Unfortunately the xo_prices table didn’t have a suitable unique constraint on it, and ended up with lots of items having multiple prices for the same date. The OP had created a function to return a price for an item given a driving date and a price type, and had a query that called that function three times per row (once for each of three price types); this did not perform very well, and the OP wanted to know if there was a way of addressing the requirement efficiently in pure SQL.
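For reference, the tables probably looked something like the following – a sketch inferred from the queries and execution plans below, not the OP’s actual script (column definitions are assumptions; the index names come from the plans):

create table xo_stock_trans (
        item            varchar2(30)    not null,
        trans_date      date            not null,
        quantity        number          not null
);

create index xo_stock_trans_ix2 on xo_stock_trans(trans_date);

create table xo_prices (
        item            varchar2(30)    not null,
        price_date      date            not null,
        gross_price     number(10,2),
        net_price       number(10,2),
        special_price   number(10,2)
);

create index xo_prices_ix1 on xo_prices(item, price_date);

(S)he had already tried the following: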
rem
rem     Script:         most_recent_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2015
rem     Purpose:
rem

select
        tr.item, tr.trans_date, tr.quantity,
        pr.gross_price,
        pr.net_price,
        pr.special_price
from
        xo_stock_trans tr
join
        xo_prices pr
on      pr.item = tr.item
and     pr.price_date = (
                select  max(pr2.price_date)
                from    xo_prices pr2
                where   pr2.item       = pr.item
                and     pr2.price_date <= tr.trans_date
        )
where
        tr.trans_date between '01-AUG-2014' and '31-AUG-2014'
;
That was SO close – it’s clearly implementing the right sort of strategy – but it didn’t perform well, so let’s check the execution plan:
------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name               | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                    |     1 |    70 |       |   168M(100)|234:06:13 |
|   1 |  NESTED LOOPS                   |                    |     1 |    70 |       |   168M(100)|234:06:13 |
|   2 |   NESTED LOOPS                  |                    |     9 |    70 |       |   168M(100)|234:06:13 |
|   3 |    NESTED LOOPS                 |                    |     9 |   450 |       |   168M(100)|234:06:13 |
|   4 |     VIEW                        | VW_SQ_1            |   286 | 10010 |       |   168M(100)|234:06:11 |
|   5 |      HASH GROUP BY              |                    |   286 |  7722 |       |   168M(100)|234:06:11 |
|   6 |       MERGE JOIN                |                    |   456G|    11T|       |  9153K(100)| 12:42:50 |
|   7 |        SORT JOIN                |                    |   202K|  2960K|       |   548   (2)| 00:00:03 |
|*  8 |         INDEX RANGE SCAN        | XO_STOCK_TRANS_IX2 |   202K|  2960K|       |   548   (2)| 00:00:03 |
|*  9 |        SORT JOIN                |                    |  4045K|    46M|   154M| 19043   (6)| 00:01:36 |
|* 10 |         INDEX FAST FULL SCAN    | XO_PRICES_IX1      |  4045K|    46M|       |  1936  (10)| 00:00:10 |
|* 11 |     TABLE ACCESS BY USER ROWID  | XO_STOCK_TRANS     |     1 |    15 |       |     1   (0)| 00:00:01 |
|* 12 |    INDEX RANGE SCAN             | XO_PRICES_IX1      |     1 |       |       |     2   (0)| 00:00:01 |
|  13 |   TABLE ACCESS BY INDEX ROWID   | XO_PRICES          |     1 |    20 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   9 - access(INTERNAL_FUNCTION("PR2"."PRICE_DATE")<=INTERNAL_FUNCTION("TR"."TRANS_DATE"))
       filter(INTERNAL_FUNCTION("PR2"."PRICE_DATE")<=INTERNAL_FUNCTION("TR"."TRANS_DATE"))
  10 - filter("PR2"."PRICE_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  11 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  12 - access("ITEM_1"="PR"."ITEM" AND "PR"."PRICE_DATE"="MAX(PR2.PRICE_DATE)")
       filter("PR"."ITEM"="TR"."ITEM")
The query was limited to August 2014, which was about 198,000 rows in my table, so we might expect some signs of a brute-force approach (tablescans and hash joins rather than indexes and nested loops) – but what we get is a high-precision approach with a very bad cardinality estimate after a brute-force unnesting of the “max(price_date)” subquery. The unnesting has done a range scan over 200,000 stock_trans rows and an index fast full scan on 4.5 million prices to do a merge join and hash aggregation to find the maximum price_date for each target row in the xo_stock_trans table. (See my earlier posting on table duplication for a variation and explanation of what Oracle has done here.) This step is a lot of work, but the optimizer thinks it’s going to produce only 286 rows in the aggregated result, so the next steps in the plan are indexed nested loops – which actually operate 198,000 times.
With the clue from my initial description, we need to aim for a strategy where Oracle doesn’t unnest that subquery – so let’s experiment with a basic /*+ no_unnest */ hint in the subquery and see what happens.
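The query is unchanged apart from the hint (my reconstruction):

select
        tr.item, tr.trans_date, tr.quantity,
        pr.gross_price,
        pr.net_price,
        pr.special_price
from
        xo_stock_trans tr
join
        xo_prices pr
on      pr.item = tr.item
and     pr.price_date = (
                select  /*+ no_unnest */        -- hint added
                        max(pr2.price_date)
                from    xo_prices pr2
                where   pr2.item       = pr.item
                and     pr2.price_date <= tr.trans_date
        )
where
        tr.trans_date between '01-AUG-2014' and '31-AUG-2014'
;

Here’s the resulting execution plan: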
--------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |   527 | 18445 |       |  6602M  (1)|999:59:59 |
|*  1 |  FILTER                       |                |       |       |       |            |          |
|*  2 |   HASH JOIN                   |                |  3423M|   111G|  5336K| 76973  (90)| 00:06:25 |
|*  3 |    TABLE ACCESS FULL          | XO_STOCK_TRANS |   202K|  2960K|       |  2531  (13)| 00:00:13 |
|   4 |    TABLE ACCESS FULL          | XO_PRICES      |  4571K|    87M|       |  2275  (11)| 00:00:12 |
|   5 |   SORT AGGREGATE              |                |     1 |    12 |       |            |          |
|   6 |    FIRST ROW                  |                |     1 |    12 |       |     3   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN (MIN/MAX)| XO_PRICES_IX1  |     1 |    12 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("PR"."PRICE_DATE"= (SELECT /*+ NO_UNNEST */
              MAX("PR2"."PRICE_DATE") FROM "XO_PRICES" "PR2" WHERE
              "PR2"."PRICE_DATE"<=:B1 AND "PR2"."ITEM"=:B2))
   2 - access("PR"."ITEM"="TR"."ITEM")
   3 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("PR2"."ITEM"=:B1 AND "PR2"."PRICE_DATE"<=:B2)
The subquery now survives, and we can see a min/max range scan in the plan – but the subquery is a filter() subquery, applied to the result of joining the 200,000 transactions to every price that applies for the item in each transaction. The optimizer thinks that this join will produce roughly 3.4 million rows, but in fact with the sample data set (which had many prices per item) the join resulted in 4.4 billion rows. The min/max subquery is as efficient as it can be, but it’s running far too often; ideally we would like it to run at most once per transaction, so why is it running late? We could try adding the /*+ push_subq */ hint to the subquery, but if we do the plan doesn’t change.
Our rapid “most recent occurrence” query revolved around accessing the prices table by index while “pre-querying” for the date using a min/max subquery that already knew the relevant item code. In this case, though, we’re doing a full tablescan of the xo_prices table, so the method doesn’t apply. So let’s manipulate the query to force an indexed access path for the join to the xo_prices table by adding the hints /*+ leading(tr pr) use_nl(pr) index(pr) */ to the main body of the query.
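At this stage the query looks like this – note that the subquery is still correlated to pr.item:

select  /*+ leading(tr pr) use_nl(pr) index(pr) */     -- hints added
        tr.item, tr.trans_date, tr.quantity,
        pr.gross_price,
        pr.net_price,
        pr.special_price
from
        xo_stock_trans tr
join
        xo_prices pr
on      pr.item = tr.item
and     pr.price_date = (
                select  /*+ no_unnest */                -- hint added
                        max(pr2.price_date)
                from    xo_prices pr2
                where   pr2.item       = pr.item        -- still correlated to pr
                and     pr2.price_date <= tr.trans_date
        )
where
        tr.trans_date between '01-AUG-2014' and '31-AUG-2014'
;

This is the resulting plan: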
---------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                |   527 | 18445 |  6614M  (1)|999:59:59 |
|   1 |  NESTED LOOPS                   |                |  3413K|   113M|    11M  (2)| 16:29:13 |
|   2 |   NESTED LOOPS                  |                |  3413K|   113M|    11M  (2)| 16:29:13 |
|*  3 |    TABLE ACCESS FULL            | XO_STOCK_TRANS |   202K|  2960K|  2531  (13)| 00:00:13 |
|*  4 |    INDEX RANGE SCAN             | XO_PRICES_IX1  |    16 |       |    52   (2)| 00:00:01 |
|   5 |     SORT AGGREGATE              |                |     1 |    12 |            |          |
|   6 |      FIRST ROW                  |                |     1 |    12 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN (MIN/MAX)| XO_PRICES_IX1  |     1 |    12 |     3   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID   | XO_PRICES      |    17 |   340 |    59   (2)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   4 - access("PR"."ITEM"="TR"."ITEM")
       filter("PR"."PRICE_DATE"= (SELECT /*+ NO_UNNEST */
              MAX("PR2"."PRICE_DATE") FROM "XO_PRICES" "PR2" WHERE
              "PR2"."PRICE_DATE"<=:B1 AND "PR2"."ITEM"=:B2))
   7 - access("PR2"."ITEM"=:B1 AND "PR2"."PRICE_DATE"<=:B2)
We’re nearly there: the shape of the execution plan – operations 4 to 7, at any rate – matches the shape of the very simple example at the start of this article, and we seem to be driving from the min/max subquery at operation 7. Unfortunately, when we look at the predicate section for operation 4 we can see that the subquery is still a filter() subquery, not an access() subquery – it’s (nominally) being performed for every index entry in the range scan of the xo_prices index that we do for each xo_stock_trans row. What we want to see is an access() subquery – and checking the SQL we can see how to get there: the subquery currently correlates the item back to the xo_prices table, not to the xo_stock_trans table, so let’s correct that correlation. Here’s our final query (though not formatted to my preference – see end-note) with its execution plan:
select  /*+ leading(tr pr) use_nl(pr) index(pr) */     -- hint added
        tr.item, tr.trans_date, tr.quantity,
        pr.gross_price,
        pr.net_price,
        pr.special_price
from
        xo_stock_trans tr
join
        xo_prices pr
on      pr.item = tr.item
and     pr.price_date = (
                select  /*+ no_unnest */                -- hint added
                        max(pr2.price_date)
                from    xo_prices pr2
                where   pr2.item       = tr.item        -- correlate to tr, not pr
                and     pr2.price_date <= tr.trans_date
        )
where
        tr.trans_date between '01-AUG-2014' and '31-AUG-2014'
;

---------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                |  3423M|   111G|  1824K  (1)| 02:32:02 |
|   1 |  NESTED LOOPS                   |                |  3413K|   113M|  1824K  (1)| 02:32:02 |
|   2 |   NESTED LOOPS                  |                |  3413K|   113M|  1824K  (1)| 02:32:02 |
|*  3 |    TABLE ACCESS FULL            | XO_STOCK_TRANS |   202K|  2960K|  2531  (13)| 00:00:13 |
|*  4 |    INDEX RANGE SCAN             | XO_PRICES_IX1  |    16 |       |     2   (0)| 00:00:01 |
|   5 |     SORT AGGREGATE              |                |     1 |    12 |            |          |
|   6 |      FIRST ROW                  |                |     1 |    12 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN (MIN/MAX)| XO_PRICES_IX1  |     1 |    12 |     3   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID   | XO_PRICES      |    17 |   340 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   4 - access("PR"."ITEM"="TR"."ITEM" AND "PR"."PRICE_DATE"= (SELECT
              /*+ NO_UNNEST */ MAX("PR2"."PRICE_DATE") FROM "XO_PRICES" "PR2"
              WHERE "PR2"."PRICE_DATE"<=:B1 AND "PR2"."ITEM"=:B2))
   7 - access("PR2"."ITEM"=:B1 AND "PR2"."PRICE_DATE"<=:B2)
Finally we can see (from the predicate for operation 4) that we run the subquery at most once for each row from xo_stock_trans, and we use the result of each subquery execution to drive the index range scan that picks up the matching rows from xo_prices with no further filtering. The order of operation is: 3, 7, 6, 5, 4, 2, 8, 1, 0.
The only thing we can do now is decide whether the strategy of indexing into the xo_prices table 200,000 times (for our 30-day requirement) is better than a brute-force approach that does a massive join and sort, or a data-duplication approach that puts a “price end date” on each xo_prices row (sketched below) to avoid the need to check all prices for an item to find the appropriate one. Ultimately the choice may depend on trading off the human development resources against the machine run-time resources, with an eye on the number of times the query runs and the size of the date range typically involved.
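For illustration, the data-duplication approach might look something like the following – a sketch only, which assumes that (item, price_date) is unique (the OP’s data set was not) and that you can afford a one-off update of the 4.5 million price rows:

alter table xo_prices add (end_date date);

update  xo_prices pr
set     end_date = (
                select  min(pr2.price_date)
                from    xo_prices pr2
                where   pr2.item       = pr.item
                and     pr2.price_date > pr.price_date
        )
;

-- Each price now covers the half-open interval [price_date, end_date),
-- so a transaction matches exactly one price row:

select
        tr.item, tr.trans_date, tr.quantity,
        pr.gross_price, pr.net_price, pr.special_price
from
        xo_stock_trans tr
join
        xo_prices pr
on      pr.item        =  tr.item
and     pr.price_date  <= tr.trans_date
and     (pr.end_date   >  tr.trans_date or pr.end_date is null)
where
        tr.trans_date between to_date('01-AUG-2014','dd-mon-yyyy')
                          and to_date('31-AUG-2014','dd-mon-yyyy')
;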
Footnote:
There’s plenty more I could say about this query and how to handle it – but there are too many questions about the correctness of the data definition and content to make it worth pursuing in detail. You will note, however, that the various execution plans which logically should be returning the same data report dramatically different cardinalities for the final row source; if nothing else this should warn you that maybe the optimizer is going to have trouble producing a good plan because its model produced a bad cardinality estimate at some point in a series of transformations.
In fact, when I first saw this query I converted it to traditional Oracle syntax (anticipating, incorrectly, a need to do something messy with hints), corrected the subquery correlation to the “obvious” choice, and put in a cardinality hint – /*+ cardinality(tr 100) */ – for the xo_stock_trans table, and got the execution plan that I’ve shown as the final plan above.
Tactically the correlation column is the really important bit – if that can be set up suitably we just have to work around the optimizer’s arithmetic assumptions.
End-note
For completeness, and for my own benefit, I’ve reformatted the adjusted query to match my preferences:
select  /*+ leading(tr pr) use_nl(pr) index(pr) */     -- hint added
        tr.item, tr.trans_date, tr.quantity,
        pr.gross_price, pr.net_price, pr.special_price
from
        xo_stock_trans tr
join
        xo_prices pr
on      pr.item = tr.item
and     pr.price_date = (
                select  /*+ no_unnest */                -- hint added
                        max(pr2.price_date)
                from    xo_prices pr2
                where   pr2.item       = tr.item        -- correlate to tr, not pr
                and     pr2.price_date <= tr.trans_date
        )
where
        tr.trans_date between to_date('01-AUG-2014','dd-mon-yyyy')
                          and to_date('31-AUG-2014','dd-mon-yyyy')
;
I have no database available at the moment to test it with your data, but my “standard way” for queries like this is KEEP LAST/FIRST as described at http://rwijk.blogspot.de/2012/09/keep-clause.html
For time critical queries I test different approaches and usually the KEEP clause wins.
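Untested, since I have no database to hand, but applied to the tables in the article it might look something like this (note the assumption that (item, trans_date, quantity) identifies a single transaction row – with a real transaction key you would group by that key instead):

select
        tr.item, tr.trans_date, tr.quantity,
        max(pr.gross_price  ) keep (dense_rank last order by pr.price_date) gross_price,
        max(pr.net_price    ) keep (dense_rank last order by pr.price_date) net_price,
        max(pr.special_price) keep (dense_rank last order by pr.price_date) special_price
from
        xo_stock_trans tr
join
        xo_prices pr
on      pr.item       =  tr.item
and     pr.price_date <= tr.trans_date
where
        tr.trans_date between to_date('01-AUG-2014','dd-mon-yyyy')
                          and to_date('31-AUG-2014','dd-mon-yyyy')
group by
        tr.item, tr.trans_date, tr.quantity
;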
Regards
Marcus
Comment by Marcus — January 8, 2015 @ 8:07 am GMT Jan 8,2015 |
Marcus,
Thanks for the link – there are a few ways of addressing the problem, and the optimum choice typically depends on the volume of data, the ratio of the two data sets, and the most important resource bottleneck. The threat point from the analytic approach is the volume of data that might have to be sorted, of course.
Comment by Jonathan Lewis — January 14, 2015 @ 1:49 pm GMT Jan 14,2015 |
Yes, and therefore it is important to know more than one way to get your results and to test them with adequate data. It also helps (me and any successor) to write a comment in your code explaining WHY you chose the approach. The comment might even have more lines than the SQL :-)
Most developers, even novices, will be able to understand the subquery while the KEEP clause is disappointingly unfamiliar. So whenever I use it I include a link to the documentation and to Rob Wijk’s blog post.
Comment by Marcus — January 15, 2015 @ 7:38 am GMT Jan 15,2015 |
Marcus,
“a comment in your code explaining WHY you chose the approach”
Absolutely – including at least a brief note of the options you discarded and why, and the pattern of the data that made the choice a good one. Generally that’s what I advise as minimum documentation for any SQL statement that required some special design or optimisation effort. (Not forgetting the expected execution plan, of course).
Comment by Jonathan Lewis — January 15, 2015 @ 7:46 am GMT Jan 15,2015
Another trick is a materialized view which calculates the most recent price in the background (a sketch below). It’s not good for stock-trading systems, where it’s essential that the price really is the latest price, but it’s good enough for most web stores.
Clustering stock and price tables also helps and decreases the number of blocks read.
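A minimal sketch of the materialized-view idea – the name and the refresh interval are assumptions:

create materialized view xo_current_prices
build immediate
refresh complete
start with sysdate next sysdate + 1/24          -- e.g. refresh hourly, in the background
as
select
        item,
        max(price_date) price_date,
        max(gross_price  ) keep (dense_rank last order by price_date) gross_price,
        max(net_price    ) keep (dense_rank last order by price_date) net_price,
        max(special_price) keep (dense_rank last order by price_date) special_price
from
        xo_prices
group by
        item
;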
Comment by mgogala — January 8, 2015 @ 1:05 pm GMT Jan 8,2015 |
Mladen,
Agreed, this type of approach can be very effective – though it is restricted to “current” rather than “most recent prior to a given point in time”.
Rate of change – and rate of refresh – becomes very important, of course. In some cases I’ve suggested a base table with a trigger rather than materialized views, so that the base table is “current price” but any relevant update inserts a clone of the outgoing row into a history table (see the sketch below).
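A hypothetical sketch of that trigger approach – all table and column names assumed:

create or replace trigger xo_current_price_bu
before update on xo_current_price               -- one row per item: the current price
for each row
begin
        -- clone the outgoing price into the history table
        insert into xo_price_history (
                item, price_date, gross_price, net_price, special_price
        ) values (
                :old.item, :old.price_date, :old.gross_price,
                :old.net_price, :old.special_price
        );
end;
/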
I’ll second the clustering strategy – co-location of related data items can make an enormous difference to performance; I mentioned the index-organized table above for the same reason.
Comment by Jonathan Lewis — January 14, 2015 @ 1:54 pm GMT Jan 14,2015 |
[…] https://jonathanlewis.wordpress.com/2015/01/07/most-recent/ […]
Pingback by get the most recent history | idlesummerbreeze — January 8, 2015 @ 3:51 pm GMT Jan 8,2015 |
Hi Jonathan,
I might be missing something obvious, but could you please explain why modifying the subquery’s correlation column ITEM from pr.item to tr.item (the last modification you applied) allowed the subquery to change its behaviour from a filter() type to an access() type subquery?
Regards,
Jure Bratina
Comment by Jure Bratina — January 13, 2015 @ 10:37 pm GMT Jan 13,2015 |
Jure,
It’s not obvious why it makes a difference because in a perfect world it wouldn’t – and that’s why I showed you the simpler example first. The correlated version of the simple query could (it seems) use transitive closure to copy down the ‘XXXX’ and then find the driving subquery approach.
In the same way, transitive closure could push the tr.item into the subquery – and given that the join is a nested loop, the optimizer could “know” that all the rows we would currently want to examine from the xo_prices table have the same value of tr.item, and therefore could allow a path using a driving subquery.
Comment by Jonathan Lewis — January 14, 2015 @ 2:20 pm GMT Jan 14,2015 |
Thank you Jonathan.
Comment by Jure Bratina — January 16, 2015 @ 10:48 am GMT Jan 16,2015 |
Hi,
What are you views on using an approach such as:
SELECT  *
FROM    (
        SELECT  col1, col2,
                ROW_NUMBER() OVER (ORDER BY col3 DESC) r
        FROM    some_table
        )
WHERE   r = 1
as opposed to a subquery? I know that will force sorting; however, it avoids the need to run the subquery for each row, doesn’t it? Can you elaborate?
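Applied to the tables in the article, I’d presumably want something like this (an untested sketch – note the rowid partition key, which my outline above omits):

select  *
from    (
        select
                tr.item, tr.trans_date, tr.quantity,
                pr.gross_price, pr.net_price, pr.special_price,
                row_number() over (
                        partition by tr.rowid           -- one "latest price" per transaction row
                        order by     pr.price_date desc
                ) rn
        from
                xo_stock_trans tr
        join
                xo_prices pr
        on      pr.item       =  tr.item
        and     pr.price_date <= tr.trans_date
        where
                tr.trans_date between to_date('01-AUG-2014','dd-mon-yyyy')
                                  and to_date('31-AUG-2014','dd-mon-yyyy')
        )
where   rn = 1
;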
Thanks
Comment by John — March 25, 2015 @ 12:22 pm GMT Mar 25,2015 |
See the discussion (and link) started by comment #1 above.
There are several different ways of achieving the same result – best choice is dependent on data volume, pattern, and the general complexity of the query.
Comment by Jonathan Lewis — March 26, 2015 @ 9:46 am GMT Mar 26,2015 |
[…] very fast way to find the lowest or highest values when you have the appropriate index – the index range scan (min/max) – but you have to access the table twice, once for the low, once for the high. Here’s […]
Pingback by Counting | Oracle Scratchpad — December 12, 2015 @ 3:54 pm GMT Dec 12,2015 |
[…] from, or make local) some indexes that they’ve previously created to handle queries of the “most recent occurence” […]
Pingback by Min/Max upgrade | Oracle Scratchpad — May 31, 2018 @ 2:13 pm BST May 31,2018 |
[…] is “Most Recent” because the commonest requirement for a query of this shape is “find the most recent row matching the following predicates”, even though in this case the interpretation is […]
Pingback by Most Recent – 2 | Oracle Scratchpad — June 29, 2020 @ 1:03 pm BST Jun 29,2020 |