Oracle Scratchpad

July 7, 2007

Functions

Filed under: Hints,Oracle,Performance,Statspack,Tuning — Jonathan Lewis @ 8:25 pm BST Jul 7,2007

The following question appeared on the Oracle Forums recently:

The use of functions – a function with other selects (eg. calculate availability of a part) – is slowing down our system when we do a select over our product file.
Is there some kind of rule when functions should be used or when we should try to create a more complex – combined – SQL that does not use the function?
Can functions be used in the where clause without losing a lot of speed?

It’s a really good question, because it prompts such a lot of ideas that need to be tied together, so I thought I’d jot down a few thoughts.

First off – I’m not too surprised to hear that the function slows things down during a “select over the product table”. It’s relatively easy to get into situations where a function is being called once for every single row being returned – or worse, once for every single row examined. Even if the function is extremely efficient it’s still possible for the total resource usage to be huge.

Thoughts, then, about pl/sql functions that call SQL – make sure that the SQL is as efficient as possible – which means ensuring that it doesn’t even run unless necessary!

A standard optimisation method is to ‘remember’ the last set of input values and the corresponding outputs so that you don’t have to run the SQL if the latest inputs are the same as the previous inputs. (Packaged global variables help). Extensions of this idea include building arrays of input and output values if the number of possible inputs isn’t too big.
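
As a very rough sketch of the idea (the package, table, and column names here are pure invention for the sake of illustration):

create or replace package part_pkg as
	function availability(p_part in number) return number;
end part_pkg;
/

create or replace package body part_pkg as

	-- packaged globals remembering the previous call
	g_last_part	number;
	g_last_avail	number;

	function availability(p_part in number) return number is
	begin
		-- same input as last time: return the remembered answer and skip the SQL
		if p_part = g_last_part then
			return g_last_avail;
		end if;

		select	sum(qty_on_hand)
		into	g_last_avail
		from	stock
		where	part_id = p_part;

		g_last_part := p_part;
		return g_last_avail;

	end availability;

end part_pkg;
/

The obvious price is that a value remembered in a package global can go stale if the underlying data changes while the same inputs keep arriving, so this is only appropriate when a slightly out-of-date answer is acceptable.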

If you have to call the SQL, you may be able to write it with stacks of hints so that Oracle always takes the most efficient path. (After all, it’s your function, you know how it’s supposed to be used – if you’re lucky there will be only one efficient option).
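
For example (again just a sketch, with an invented index name), the lookup inside the function might be hard-wired to the one path you know makes sense:

select	/*+ index(s stk_part_i1) */
	sum(s.qty_on_hand)
into	v_avail
from	stock s
where	s.part_id = p_part
;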

Of course, 10g release 2 has made deterministic functions work – nearly – so you may not have to worry too much about getting clever inside the PL/SQL. Another “near” solution to minimising the resource threats from functions without getting clever in the PL/SQL is to wrap them inside “scalar subqueries” – which will invoke ‘scalar subquery caching’ (which I have mentioned on my website in the past) in earlier versions of Oracle.

For example, change:

select
	func1(t1.owner)
from
	t1
;

to the slightly counter-intuitive:

select
	(select func1(t1.owner) from dual)
from
	t1
;

This is not guaranteed to reduce the number of times the function is called – there is a limit on the number of input and output values cached (for more details see Chapter 9 of Cost Based Oracle) but it may help if there are between a few dozen and a couple of hundred input values. (The cache is a fairly small hash-table – 256 or 1024 values for numbers, depending on Oracle versions – and hash collisions will occur).
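
For reference, the deterministic option mentioned above is just one keyword in the function specification. Here is a sketch with invented names; remember that the keyword is a promise Oracle does not verify, so the function really must return the same output for the same input:

create or replace function func1(p_owner in varchar2)
return varchar2
deterministic
as
	v_result	varchar2(4000);
begin
	select	max(description)
	into	v_result
	from	owner_descriptions
	where	owner = p_owner;

	return v_result;
end func1;
/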

This brings us to the last part of the question – what happens if the function is in the where clause? The results can be disastrous, although you may be rescued by the scalar subquery method. At one site I visited recently, a one-hour statspack snapshot showed that the most frequently executed statement had been called 35 million times; the next statement in the list had been executed just 35,000 times. That’s a good clue that the critical statement is in a function that appears in a where clause – and that the table in question had been hit with a full tablescan.

If scalar subquery caching can’t help then you probably do need to take the statement out of the function and embed it in the main statement. There’s no telling what the best strategy may be, but a good starting step is often to modify the statement from the function so that it accesses the entire data set and produces a result table (either as a no_merge view or as a materialized factored subquery), and then join this result back into the original statement.
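
As a rough sketch of that approach, using an invented part-availability example, the idea is to calculate the figures for the whole data set once and then join the result back to the driving table (the materialize hint is undocumented, so treat it as illustrative only):

with part_avail as (
	select	/*+ materialize */
		part_id,
		sum(qty_on_hand)	avail
	from	stock
	group by
		part_id
)
select
	prd.part_id,
	prd.description,
	pa.avail
from
	products	prd,
	part_avail	pa
where
	pa.part_id = prd.part_id
;

An ordinary inline view with a no_merge hint would do the same job; either way the point is that the data is produced once as a set rather than once per row of the driving table.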

I’m going to leave it at that and see if anyone else wants to throw in other “generic” solutions for taking SQL out of functions.

12 Comments »

  1. Nice article.

    This is something that has been occupying my mind for a while. At a current client, I’m seeing SQL statements with a significant number of nested function lookups that are killing performance (made worse when they are both arguments in an NVL). Given the number of rows, I tend towards rewriting the SQL using materialised WITHs, as suggested, and in general this works well.

    One trade-off is how this fits with test-driven development and unit-testing approaches – i.e. a little function lookup is easily tested and reused. But good database code relies on thinking in sets, and performance is usually much more important.

    Therefore, as ever, it depends.

    Comment by dombrooks — July 8, 2007 @ 7:33 am BST Jul 8,2007 | Reply

  2. I seem to recall someone pointing out a while ago that in the code …

    Nvl(a.column,some_function(a.column))

    … the function is executed even when a.column is not null, thus it was preferable to use Coalesce() or Decode(), which don’t suffer from that problem.

    Comment by David Aldridge — July 8, 2007 @ 1:20 pm BST Jul 8,2007 | Reply

  3. David, there are certainly some cases with nulls and functions where the normal ‘short-circuit evaluation’ does not apply. I don’t recall all the details, but I think the same would be true for decode and coalesce() – it might be CASE that behaves nicely.
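
    A quick way to check would be a throwaway test along these lines (the counter package is invented just for the test, and the table and column are whatever you have to hand):

    create or replace package counter as
    	g_calls	number := 0;
    end counter;
    /

    create or replace function f(i in number) return number as
    begin
    	counter.g_calls := counter.g_calls + 1;
    	return i;
    end f;
    /

    -- run each of these against a table that has mostly non-null values in the
    -- column, then check counter.g_calls to see how many times f() was really called
    select nvl(n1, f(n1)) from t1;
    select coalesce(n1, f(n1)) from t1;

    If the counter climbs to the full row count with nvl() but only counts the null rows with coalesce(), the short-circuit is working for coalesce() and not for nvl().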

    Comment by Jonathan Lewis — July 8, 2007 @ 1:34 pm BST Jul 8,2007 | Reply

  4. Hi Jonathan,

    I’m poring over a huge SELECT query at the moment which gives me these statistics in a trace file:

    	call     count       cpu    elapsed       disk      query
    	------- ------  -------- ---------- ---------- ----------
    	Parse        1      0.04       0.04          0          0
    	Execute      1     27.93      27.45          0          0
    	Fetch       22      1.81       3.22        194     180379
    	------- ------  -------- ---------- ---------- ----------
    	total       24     29.78      30.71        194     180379
    

    The query returns ~2000 rows and the execution plan (as well as the row source from the trace) looks OK. But the “Execute” time is really hurting us.

    There is a PL/SQL function that’s being called from several places in the query. It looks optimized enough. But still, it forces a context switch every time it’s executed. Your article gave me the idea of trying to replace it with a subquery or merging it directly in the main SELECT.

    (1) Could this huge execution time originate in something other than the above-mentioned function?

    (1a) What, exactly, makes up the “Execute” time of a SELECT query? I’ve searched the web and the Oracle docu but there is no definition for this (that I have found). Do you have a list (or a pointer to some article?)

    (2) How much, in general, do context switches from SQL to PL/SQL “cost”? Say, in terms of LIOs or some other meaningful “currency”?

    Thanks for any hints (and thanks for this great site!)

    Kaspar

    Comment by Kaspar Pflugshaupt — July 9, 2007 @ 7:03 am BST Jul 9,2007 | Reply

  5. @Jonathan
    I remember an article on “AskTom” saying that (in general) deterministic functions are not cached in any way (I’m talking about Oracle 9).

    As deterministic functions are used mainly for the creation of function-based indexes, can we create a function-based index to “store” the result of the function, thus optimizing the query?

    Thank you.

    Comment by Antonio — July 9, 2007 @ 9:26 am BST Jul 9,2007 | Reply

  6. Jonathan:

    If the function in the WHERE clause is not doable in sql directly, is there any percentage in wrapping everything except the function as an in-line view and then applying the function to that? Something like:

    SELECT *
    FROM (SELECT stuff FROM table
          WHERE predicates and
                rownum > 0 -- to prevent merging
         )
    WHERE f(param) = value
    

    Awesome site by the way.

    John

    Comment by John Spencer — July 10, 2007 @ 5:02 pm BST Jul 10,2007 | Reply

  7. Kaspar,

    > (1) Could this huge execution time originate in something
    > other than the above-mentioned function?

    Well sure, but it should be pretty easy to comment out the function calls and see how your performance changes, no? I’ve done that very successfully and occasionally rewrite the function call as an inline query using WITH.

    > (1a) What, exactly, makes up the “Execute” time of a SELECT query? I’ve
    > searched the web and the Oracle docu but there is no definition for
    > this (that I have found).

    It’s simply the elapsed time or “wall time” (how much time on the wall clock).

    > (2) How much, in general, do context switches from SQL to PL/SQL “cost”?
    > Say, in terms of LIOs or some other meaningful “currency”?

    The answer is a big “it depends”.

    Comment by Stew Stryker — July 10, 2007 @ 7:28 pm BST Jul 10,2007 | Reply

  8. @Stew,

    Thanks for the hints. The huge “Execute” time has turned out to be, actually, parsing time. I did not know this before, but whenever parsing includes Bind Variable Peeking, the time used for it will appear in the “Execute” row in the trace file. When I replaced all bind vars by literal values, the time suddenly appeared in the “Parse” line.

    Off to bring down that parse time now…

    Comment by Kaspar Pflugshaupt — July 13, 2007 @ 9:21 am BST Jul 13,2007 | Reply

  9. […] Lewis writes in his article Functions about a frequently asked question: The use of functions – a function with other selects (eg. […]

    Pingback by Sven’s Technik-Blog » Blog Archive » Functions — October 19, 2007 @ 3:36 pm BST Oct 19,2007 | Reply

  10. It will be interesting to see the effect of the 11g result cache on the crippling usage of function lookups that I often see.

    Comment by dombrooks — October 20, 2007 @ 6:46 am BST Oct 20,2007 | Reply

  11. Dom,

    It is entertaining to wonder whether the “result cache” was the original target behind the code for scalar subquery caching, or whether things went the other way round and someone said “hey, we could make this subquery caching thing much more general”. (Ditto deterministic functions).

    I can see a time when it will be cheaper (and risk-free) to use a function in the select list to translate an id into a description than it will be to use a join. (As it often would be already if deterministic functions or scalar subquery caching worked 100% of the time).

    Comment by Jonathan Lewis — October 20, 2007 @ 10:44 am BST Oct 20,2007 | Reply

  12. […] optimization of plsql function calls in sql, this note from Jonathan provides more details. Leave a […]

    Pingback by PARALLEL_ENABLE & Deterministic « Oracle — October 27, 2010 @ 6:15 am BST Oct 27,2010 | Reply

