A **question came up on Oracle-l** list-server a few days ago about how Oracle calculates costs for a scalar subquery in the select list. The question included an example to explain the point of the question. I’ve reproduced the test below, with the output from an 18.3 test system. The numbers don’t match the numbers produced in the original posting but they are consistent with the general appearance.

rem
rem Script: ssq_costing.sql
rem Author: Jonathan Lewis
rem Dated: May 2019
rem Purpose:
rem
rem Last tested
rem 18.3.0.0
rem 12.2.0.1
rem
create table t_1k ( n1 integer ) ;
create table t_100k ( n1 integer ) ;
insert into t_1k
select
level
from dual
connect by level <= 1e3;
insert into t_100k
select level
from dual
connect by level <= 1e5;
commit ;
begin
dbms_stats.gather_table_stats ( null, 'T_1K') ;
dbms_stats.gather_table_stats ( null, 'T_100K') ;
end ;
/
explain plan for
select
/*+ qb_name(QB_MAIN) */
(
select /*+ qb_name(QB_SUBQ) */ count(*)
from t_1k
where t_1k.n1 = t_100k.n1
)
from t_100k
;
select * from table(dbms_xplan.display);
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 488K| 1533K (2)| 00:01:00 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T_1K | 1 | 4 | 17 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T_100K | 100K| 488K| 36 (9)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T_1K"."N1"=:B1)

The key point to note is this – the scalar subquery has to execute 100,000 times because that’s the number of rows in the driving table. The cost for executing the scalar subquery once is 17 – so the total cost of the query should be 1,700,036 – not 1,533K (and for execution plans the K means x1000, not x1024). There’s always room for rounding errors, of course, but a check of the 10053 (CBO trace) file shows the numbers to be 17.216612 for the **t_1k** tablescan, 36.356072 for the **t_100K** tablescan, and 1533646.216412 for the whole query. So how is Oracle managing to get a cost that looks lower than it ought to be?

There’s plenty of scope for experimenting to see how the numbers change – and my first thought was simply to see what happens as you change the number of distinct values in the **t_100K.n1** column. It would be rather tedious to go through the process of modifying the data a few hundred times to see what happens, so I took advantage of the **get_column_stats()** and **set_column_stats()** procedures in the **dbms_stats** package to create a PL/SQL loop that faked a number of different scenarios that lied about the actual table data.

delete from plan_table;
commit;
declare
srec dbms_stats.statrec;
n_array dbms_stats.numarray;
m_distcnt number;
m_density number;
m_nullcnt number;
m_avgclen number;
begin
dbms_stats.get_column_stats(
ownname => user,
tabname => 't_100k',
colname => 'n1',
distcnt => m_distcnt,
density => m_density,
nullcnt => m_nullcnt,
srec => srec,
avgclen => m_avgclen
);
for i in 1 .. 20 loop
m_distcnt := 1000 * i;
m_density := 1/m_distcnt;
dbms_stats.set_column_stats(
ownname => user,
tabname => 't_100k',
colname => 'n1',
distcnt => m_distcnt,
density => m_density,
nullcnt => m_nullcnt,
srec => srec,
avgclen => m_avgclen
);
execute immediate
'
explain plan set statement_id = ''' || m_distcnt ||
'''
for
select
/*+ qb_name(QB_MAIN) */
(
select /*+ qb_name(QB_SUBQ) */ count(*)
from t_1k
where t_1k.n1 = t_100k.n1
)
from t_100k
';
end loop;
end;
/

The code is straightforward. I’ve declared a few variables to hold the column stats from the **t_100k.****n1** column, called **get_column stats(****)**, then looped 20 times through a process that changes the number of distinct values (and corresponding density) recorded in the column stats, then used execute immediate to call *“explain plan”* for the original query.

You’ll notice I’ve given each plan a separate **statement_id** that corresponds to the **num_distinct** that generated the plan. In the code above I’ve changed the **num_distinct** from 1,000 to 20,000 in steps of 1,000.

Once the PL/SQL block ends I’ll have a plan table with 20 execution plans stored in it and, rather than reporting those plans with calls to **dbms_xplan.display()**, I’m going to be selective about which rows and columns I report.

select
statement_id,
io_cost,
io_cost - lag(io_cost,1) over (order by to_number(statement_id)) io_diff,
cpu_cost,
cpu_cost - lag(cpu_cost,1) over (order by to_number(statement_id)) cpu_diff,
cost
from
plan_table
where
id = 0
order by
to_number(statement_id)
;

I’ve picked **id = 0** (the top line of the plan) for each **statement_id** and I’ve reported the **cost** column, which is made up of the **io_cost** column plus a scaled down value of the **cpu_cost** column. I’ve also used the analytic **lag()** function to calculate how much the **io_****cost** and **cpu_cost** changed from the previous **statement_id**. Here are my results from 18c:

STATEMENT_ID IO_COST IO_DIFF CPU_COST CPU_DIFF COST
------------------------------ ---------- ---------- ---------- ---------- ----------
1000 17033 1099838920 17253
2000 34033 17000 2182897480 1083058560 34470
3000 51033 17000 3265956040 1083058560 51686
4000 68033 17000 4349014600 1083058560 68903
5000 85033 17000 5432073160 1083058560 86119
6000 102033 17000 6515131720 1083058560 103336
7000 119033 17000 7598190280 1083058560 120553
8000 136033 17000 8681248840 1083058560 137769
9000 153033 17000 9764307400 1083058560 154986
10000 170033 17000 1.0847E+10 1083058560 172202
11000 197670 27637 1.2608E+10 1760725019 200191
12000 338341 140671 2.1570E+10 8962036084 342655
13000 457370 119029 2.9153E+10 7583261303 463200
14000 559395 102025 3.5653E+10 6499938259 566525
15000 647816 88421 4.1287E+10 5633279824 656073
16000 725185 77369 4.6216E+10 4929119846 734428
17000 793452 68267 5.0565E+10 4349223394 803565
18000 854133 60681 5.4431E+10 3865976350 865019
19000 908427 54294 5.7890E+10 3459031472 920005
20000 957292 48865 6.1003E+10 3113128324 969492

The first pattern that hits the eye is the constant change of 17,000 in the **io_cost** in the first few lines of the output. For “small” numbers of distinct values the (IO) cost of the query is (33 + 17 * **num_distinct**) – in other words, the arithmetic seems to assume that it will execute the query once for each value and then cache the results so that repeated executions for any given value will not be needed. This looks as if the optimizer is trying to match its arithmetic to the *“scalar subquery caching”* mechanism.

But things change somewhere between 10,000 and 11,000 distinct values. The point comes where adding one more distinct value causes a much bigger jump in cost than 17, and that’s because Oracle assumes it’s reached a point where there’s a value that it won’t have room for in the cache and will have to re-run the subquery multiple times for that value as it scans the rest of the table. Let’s find the exact break point where that happens.

Changing my PL/SQL loop so that we calculate **m_distcnt** as *“19010 + i”* this is the output from the final query:

-- m_distcnt := 10910 + i;
STATEMENT_ID IO_COST IO_DIFF CPU_COST CPU_DIFF COST
------------------------------ ---------- ---------- ---------- ---------- ----------
10911 185520 1.1834E+10 187887
10912 185537 17 1.1835E+10 1083059 187904
10913 185554 17 1.1836E+10 1083058 187921
10914 185571 17 1.1837E+10 1083059 187938
10915 185588 17 1.1838E+10 1083058 187956
10916 185605 17 1.1839E+10 1083059 187973
10917 185622 17 1.1841E+10 1083059 187990
10918 185639 17 1.1842E+10 1083058 188007
10919 185656 17 1.1843E+10 1083059 188025
10920 185673 17 1.1844E+10 1083058 188042
10921 185690 17 1.1845E+10 1083059 188059
10922 185707 17 1.1846E+10 1083058 188076
10923 185770 63 1.1850E+10 4027171 188140
10924 185926 156 1.1860E+10 9914184 188298
10925 186081 155 1.1870E+10 9912370 188455
10926 186237 156 1.1880E+10 9910555 188613
10927 186393 156 1.1890E+10 9908741 188770
10928 186548 155 1.1900E+10 9906928 188928
10929 186703 155 1.1909E+10 9905114 189085
10930 186859 156 1.1919E+10 9903302 189243

If we have 10,922 distinct values in the column the optimizer calculates as if it will be able to cache them all; but if we have 10,923 distinct values the optimizer thinks that there’s going to be one value where it can’t cache the result and will have to run the subquery more than once.

Before looking at this in more detail let’s go to the other interesting point – when does the cost **stop** changing: we can see the cost increasing as the number of distinct values grows, we saw at the start that the cost didn’t seem to get as large as we expected, so there must be a point where it stops increasing before it “ought” to.

I’ll jump straight to the answer: here’s the output from the test when I start **num_distinct** off at slightly less than half the number of rows in the table:

-- m_distcnt := (50000 - 10) + i;
STATEMENT_ID IO_COST IO_DIFF CPU_COST CPU_DIFF COST
------------------------------ ---------- ---------- ---------- ---------- ----------
49991 1514281 9.6488E+10 1533579
49992 1514288 7 9.6489E+10 473357 1533586
49993 1514296 8 9.6489E+10 473337 1533594
49994 1514303 7 9.6490E+10 473319 1533601
49995 1514311 8 9.6490E+10 473299 1533609
49996 1514318 7 9.6491E+10 473281 1533616
49997 1514325 7 9.6491E+10 473262 1533624
49998 1514333 8 9.6492E+10 473243 1533631
49999 1514340 7 9.6492E+10 473224 1533639
50000 1514348 8 9.6493E+10 473205 1533646
50001 1514348 0 9.6493E+10 0 1533646
50002 1514348 0 9.6493E+10 0 1533646
50003 1514348 0 9.6493E+10 0 1533646
50004 1514348 0 9.6493E+10 0 1533646
50005 1514348 0 9.6493E+10 0 1533646
50006 1514348 0 9.6493E+10 0 1533646
50007 1514348 0 9.6493E+10 0 1533646
50008 1514348 0 9.6493E+10 0 1533646
50009 1514348 0 9.6493E+10 0 1533646
50010 1514348 0 9.6493E+10 0 1533646

The cost just stops changing when **num_distinct** = half the rows in the table.

### Formulae

During the course of these experiments I had been exchanging email messages with **Nenad Noveljic** via the Oracle-L list-server (**full monthly archive here**) and he came up with the suggesion of a three-part formula that assumed a cache size and gave a cost of

*“tablescan cost + num_distinct * subquery unit cost”* for values of **num_distinct** up to the cache size;
- then, for values of
**num_distinct** greater than the cache_size and up to half the size of the table added a marginal cost representing the probability that some values would not be cached;
- then for values of
**num_distinct** greater than half the number of rows in the table reported the cost associated with **num_distinct** = half the number of rows in the table.

Hence:

- for 1 <= num_distinct <= 10922, cost = (33 + num_distinct + 17)
- for 10,923 <= num_distinct <= 50,000, cost = (33 + 10,922 * 17) + (1 – 10,922/num_distinct) * 100,000 * 17
- for 50,000 <= num_distinct <= 100,000, cost = cost(50,000).

The middle line needs a little explanation: ( 1-10,922 / num_distinct ) is the probability that a value will not be in the cache; this has to be 100,000 to give the expected number of rows that will not be cached, and then multiplied by 17 as the cost of running the subquery for those rows.

The middle line can be re-arranged as 33 + 17 * (10,922 + (1 – 10,922/num_distinct) * 100,000)

### Tweaking

At this point I could modify my code loop to report the calculated value for the cost and compare it with the actual cost to show you that the two values didn’t quite match. Instead I’ll jump forward a little bit to a correction that needs to be made to the formula above. It revolves around how Oracle determines the cache size. There’s a hidden parameter (which I mentioned in CBO Fundamentals) that controls scalar subquery caching. In the book I think I only referenced it in the context of subqueries in the *“where”* clause. The parameter is *“_query_execution_cache_max_size”* and has a default value of 131072 (*power(2,7)*) – so when I found that the initial formula didn’t quite work I made the following observation:

- 131072 / 10922 = 12.00073
- 131072 / 12 = 10922.666…

So I put 1092.66667 into the formula to see if that would improve things.

For the code change I added a variable **m_cost** to the PL/SQL block, and set it inside the loop as follows:

m_cost := round(33 + 17 * (10922.66667 + 100000 * (1 - (10922.66667 / m_distcnt))));

Then in the *“execute immediate”* I changed the *“explain plan”* line to read:

explain plan set statement_id = ''' || lpad(m_distcnt,7) || ' - ' || lpad(m_cost,8) ||

This allowed me to show the formula’s prediction of (IO)cost in final output, and here’s what I got for values of **num_distinct** in the region of 10,922:

STATEMENT_ID IO_COST IO_DIFF CPU_COST CPU_DIFF COST
------------------------------ ---------- ---------- ---------- ---------- ----------
10911 - 183901 185520 1.1834E+10 187887
10912 - 184057 185537 17 1.1835E+10 1083059 187904
10913 - 184212 185554 17 1.1836E+10 1083058 187921
10914 - 184368 185571 17 1.1837E+10 1083059 187938
10915 - 184524 185588 17 1.1838E+10 1083058 187956
10916 - 184680 185605 17 1.1839E+10 1083059 187973
10917 - 184836 185622 17 1.1841E+10 1083059 187990
10918 - 184992 185639 17 1.1842E+10 1083058 188007
10919 - 185147 185656 17 1.1843E+10 1083059 188025
10920 - 185303 185673 17 1.1844E+10 1083058 188042
10921 - 185459 185690 17 1.1845E+10 1083059 188059
10922 - 185615 185707 17 1.1846E+10 1083058 188076
10923 - 185770 185770 63 1.1850E+10 4027171 188140
10924 - 185926 185926 156 1.1860E+10 9914184 188298
10925 - 186081 186081 155 1.1870E+10 9912370 188455
10926 - 186237 186237 156 1.1880E+10 9910555 188613
10927 - 186393 186393 156 1.1890E+10 9908741 188770
10928 - 186548 186548 155 1.1900E+10 9906928 188928
10929 - 186703 186703 155 1.1909E+10 9905114 189085
10930 - 186859 186859 156 1.1919E+10 9903302 189243

The formula is only supposed to work in the range 10923 – 50,000, so the first few results don’t match; but in the range 10,923 to 10,930 the match is exact. Then, in the region of 50,000 we get:

STATEMENT_ID IO_COST IO_DIFF CPU_COST CPU_DIFF COST
------------------------------ ---------- ---------- ---------- ---------- ----------
49991 - 1514281 1514281 9.6488E+10 1533579
49992 - 1514288 1514288 7 9.6489E+10 473357 1533586
49993 - 1514296 1514296 8 9.6489E+10 473337 1533594
49994 - 1514303 1514303 7 9.6490E+10 473319 1533601
49995 - 1514311 1514311 8 9.6490E+10 473299 1533609
49996 - 1514318 1514318 7 9.6491E+10 473281 1533616
49997 - 1514325 1514325 7 9.6491E+10 473262 1533624
49998 - 1514333 1514333 8 9.6492E+10 473243 1533631
49999 - 1514340 1514340 7 9.6492E+10 473224 1533639
50000 - 1514348 1514348 8 9.6493E+10 473205 1533646
50001 - 1514355 1514348 0 9.6493E+10 0 1533646
50002 - 1514363 1514348 0 9.6493E+10 0 1533646
50003 - 1514370 1514348 0 9.6493E+10 0 1533646
50004 - 1514377 1514348 0 9.6493E+10 0 1533646
50005 - 1514385 1514348 0 9.6493E+10 0 1533646
50006 - 1514392 1514348 0 9.6493E+10 0 1533646
50007 - 1514400 1514348 0 9.6493E+10 0 1533646
50008 - 1514407 1514348 0 9.6493E+10 0 1533646
50009 - 1514415 1514348 0 9.6493E+10 0 1533646
50010 - 1514422 1514348 0 9.6493E+10 0 1533646

Again, the formula applies only in the range up to 50,000 (half the rows in the table) – and the match is perfect in that range.

### Next steps

The work so far gives us some idea of the algorithm that the optimizer is using to derive a cost, but this is just one scenario and there are plenty of extra questions we might ask. What, as the most pressing one, is the significance of the number 12 in the calculation 131,072/12. From previous experience I guess that is was related to the length of the input and output values of the scalar subquery – as in “value **X** for n1 returns value **Y** for count(*)”.

To pursue this idea I recreated the data sets using **varchar2(10)** as the definition of **n1** and **lpad(rownum,10) **as the value – the “breakpoint” dropped from 10,922 down to 5,461. Checking the arithmetic 131,072 / 5461 = 24.001456, then 131,072/24 = 5461.333… And that’s the number that made fhe formular work perfectly for the modified data set.

Then I set used **set_column_stats()** to hack the **avg_col_,len** of **t_100K.n1** to 15 and the break point dropped to 4,096. Again we do the two arithmetic steps: 131072/4096 = 32 (but then we don’t need to do the reverse step since the first result is integral).

Checking the original data set when **n1** was a numeric the **avg_col_len** was 5, so we have three reference points:

- Avg_col_len = 5. “Cache unit size” = 12
- Avg_col_len = 11. Cache unit size = 24 (don’t forget the
**avg_col_len** includes the length byte, so our padded **varchar2(10)** has a length of 11).
- Avg_col_len = 15, Cache unit size = 32

There’s an obvious pattern here: *“Cache unit size”* = (2 x **avg_col_len** + 2). Since I hadn’t been changing the **t_1k.n1** column at the same time, that really does look like a deliberate factor of 2 (I’d thought intially that maybe the 12 was affected by the lengths of both columns in the predicate – but that doesn’t seem to be the case.)

The scientific method says I should now make a prediction based on my hypothesis – so I set the **avg_col_len** for t_100K.n1 to 23 and guessed that the break point would be at 2730 – and it was. (131072 / (2 * 23 + 2) = 2730.6666…) .

The next question, of course, is *“where does the “spare 2″ come from?”* Trying to minimize the change in the code I modified my subquery to select **sum(to_number(n1))** rather than **count(*)**, then to **avg(to_number(n1)****)** – remember I had changed **n1** to a **varchar2(10)** that looked like a number left-padded with spaces. In every variant of the tests I’d done so far all I had to do to get an exact match between the basic formula and the optimizer’s cost calculation was to use “2 * avg_col_len + **22**” as the cache unit size – and 22 is the nominal maximum length of an internally stored numeric column.

Bottom line: the cache unit size seems to be related to the input and output values, but I don’t know why there’s a factor of 2 applied to the input column length, and I don’t know why the length of **count(*)** is deemed to be 2 when other derived numeric outputs use have the more intuitive 22 for their length.

### tl;dr

The total cost calculation for a scalar subquery in the select list is largely affected by:

- a fixed cache size (131,072 bytes) possibly set by hidden parameter _query_execution_cache_max_size
- the avg_col_len of the input (correlating) column(s) from the driving table
- the nominal length of the output (select list) of the subquery

There is an unexplained factor of 2 used with the avg_col_len of the input, and a slightly surprising value of 2 if the output is simply count(*).

If the number N of distinct values for the driving column(s) is less than the number of possible cache entries the effect of the scalar subquery is to add N * estimated cost of executing the subquery once. As the number of distinct values for the driving column(s) goes above the limit then the incremental effect of the subquery is based on the expected number of times an input value will not be cached. When the number of distinct values in the driving column(s) exceeds half the number of rows in the driving table the cost stops increasing – there is no obvious reason when the algorithm does this.

There are many more cases that I could investigate at this point – but I think this model is enough as an indication of general method. If you come across a variation where you actually need to work out how the optimizer derived a cost then this framework will probably be enough to get you started in the right direction.