Oracle Scratchpad

March 3, 2023

Empty indexes

Filed under: Indexing,Oracle — Jonathan Lewis @ 3:01 pm GMT Mar 3,2023

Here’s a little quiz that came to light on a database running 11g. It’s one of those anomalies that is likely to get noticed only in fairly extreme cases and it’s about DDL rather than DML so it shouldn’t be happening very often on anyone’s system.

Read through the following script, which simply creates a table, populates it with 4 million rows and creates an index, then answer the question that follows:

rem
rem     Script:         pt_iot_oddity.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2023
rem     Purpose:        
rem
rem     Last tested 
rem             19.11.0.0
rem             11.2.0.4
rem

create table t1 (
        key1            varchar2(20),
        secondary       varchar2(10),
        other           varchar2(10),
        constraint t1_pk primary key (key1)
)
/

insert into t1( key1, secondary, other)
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        lpad(rownum,20),
        null,
        lpad(rownum,10)
from
        generator v1,
        generator v2
where
        rownum <= 4e6
/

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

create index t1_i1 on t1(secondary) parallel 4;

Take note that the index is a single column B-tree index on a column which is NULL for every single row; and remind yourselves that completely null entries not recorded in B-tree indexes – if you run the script you will find that the index has no entries: user_indexes.num_rows = 0. So the question is: how many rows does Oracle sort while creating the index?

To help you out, perhaps, here’s the execution plan:

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------
|   0 | CREATE INDEX STATEMENT   |          |  4000K|    26M|  1183   (7)| 00:00:06 |        |      |            |
|   1 |  PX COORDINATOR          |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |  4000K|    26M|            |          |  Q1,01 | P->S | QC (ORDER) |
|   3 |    INDEX BUILD NON UNIQUE| T1_I1    |       |       |            |          |  Q1,01 | PCWP |            |
|   4 |     SORT CREATE INDEX    |          |  4000K|    26M|            |          |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE          |          |  4000K|    26M|   816  (10)| 00:00:05 |  Q1,01 | PCWP |            |
|   6 |       PX SEND RANGE      | :TQ10000 |  4000K|    26M|   816  (10)| 00:00:05 |  Q1,00 | P->P | RANGE      |
|   7 |        PX BLOCK ITERATOR |          |  4000K|    26M|   816  (10)| 00:00:05 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS FULL| T1       |  4000K|    26M|   816  (10)| 00:00:05 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------

Note
-----
   - estimated index size: 100M bytes


And here’s another little bit of information (cut-n-paste after creating the index) just in case you didn’t believe me when I said completely null keys don’t appear in Oracle’s B-tree indexes:

SQL> select index_name, num_rows from user_indexes where table_name = 'T1';

INDEX_NAME             NUM_ROWS
-------------------- ----------
T1_PK                   4059833
T1_I1                         0


Special note for Richard Foote: you’re allowed to think about the question, but you’re not allowed to answer it in the comments.

Answer

The starting thought to this one is this: we know that completely null entries don’t get into B-tree indexes so when, in the plan above, does Oracle discard the nulls. There are three possible places where it might do so:

  • At the table access full
  • At the sort
  • At the index build

I’m not counting the “PX send” as that’s more or less tied into the table “access full”, similarly the “PX receive” is tied into the sort operation.

Ideally, of course, we’d hope that Oracle would discard the nulls as soon as possible – i.e. during the table access full. There are potentiall 4 ways (at least) that we can check to see what actually happens:

  • Looking at the SQL Monitor output
  • Looking at the dbms_xplan.display_cursor() output with rowsource execution stats enabled
  • Looking at the v$pq_tqstat information immediately after creating the index.
  • Looking at the output from the 10046 trace
  • Looking at the results of a 10032 (sort) trace

Here’s the execution plan body from the SQL Monitor report in 11.2.0.4:

SQL Plan Monitoring Details (Plan Hash Value=2666861883)
===============================================================================================================================================================================================
| Id |         Operation          |   Name   |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  | Write | Write |  Mem  | Temp  | Activity |       Activity Detail       |
|    |                            |          | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Reqs  | Bytes | (Max) | (Max) |   (%)    |         (# samples)         |
===============================================================================================================================================================================================
|  0 | CREATE INDEX STATEMENT     |          |         |      |         1 |     +4 |     9 |        4 |      |       |       |       |       |       |          |                             |
|  1 |   PX COORDINATOR           |          |         |      |         4 |     +1 |     9 |        4 |      |       |       |       |       |       |    18.18 | Cpu (2)                     |
|  2 |    PX SEND QC (ORDER)      | :TQ10001 |      4M |      |         1 |     +4 |     4 |        4 |      |       |       |       |       |       |          |                             |
|  3 |     INDEX BUILD NON UNIQUE | T1_I1    |         |      |         1 |     +4 |     4 |        4 |      |       |       |       |       |       |          |                             |
|  4 |      SORT CREATE INDEX     |          |      4M |      |         2 |     +3 |     4 |       4M |  922 |  48MB |   243 |  48MB |   95M |   52M |    27.27 | Cpu (3)                     |
|  5 |       PX RECEIVE           |          |      4M |  816 |         3 |     +2 |     4 |       4M |      |       |       |       |       |       |     9.09 | Cpu (1)                     |
|  6 |        PX SEND RANGE       | :TQ10000 |      4M |  816 |         2 |     +2 |     4 |       4M |      |       |       |       |       |       |    27.27 | Cpu (3)                     |
|  7 |         PX BLOCK ITERATOR  |          |      4M |  816 |         1 |     +3 |     8 |       4M |      |       |       |       |       |       |          |                             |
|  8 |          TABLE ACCESS FULL | T1       |      4M |  816 |         3 |     +1 |    56 |       4M |  247 | 165MB |       |       |       |       |    18.18 | Cpu (1)                     |
|    |                            |          |         |      |           |        |       |          |      |       |       |       |       |       |          | db file sequential read (1) |
===============================================================================================================================================================================================


Under “Rows (actual)” we can see operation 4 (Sort create index) supplying 4M rows to operation 3 (Index Build). If that’s not entirely convincing, here’s the output from v$pq_tqstat:

DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES ROW_SHARE DATA_SHARE      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- --------- ---------- ---------- ---------- -----------
         1          0 Ranger                 1 QC                      12        528    100.00     100.00          5          1           0
                      Producer               1 P004               1007528   14126554     25.19      25.19         10          1           0
                                             1 P005                930555   13047324     23.26      23.26          8          1           0
                                             1 P006               1008101   14134600     25.20      25.20          7          0           0
                                             1 P007               1053828   14775738     26.35      26.35          5          0           0
                      Consumer               1 P000               1362213   19099470     34.06      34.06          3          0           0
                                             1 P001               1155741   16204542     28.89      28.89          3          0           0
                                             1 P002                695813    9755950     17.40      17.40          3          0           0
                                             1 P003                786233   11023726     19.66      19.66          3          1           0

                    1 Producer               1 P000                     1        298     25.00      25.00          0          0           0
                                             1 P001                     1        298     25.00      25.00          0          0           0
                                             1 P002                     1        298     25.00      25.00          0          0           0
                                             1 P003                     1        298     25.00      25.00          0          0           0
                      Consumer               1 QC                       4       1192    100.00     100.00          2          0           0


Here we see the first producer (tablescan) PX processes passing 4 million rows to the first consumer processes. which then turn around to be producers and pass a single row each to the query co-ordinator. That doesn’t really tell us anything about when the consumers discarded the 4M rows, of course, but it does confirm that 4M rows passed at least some way up the tree; so here’s an extrace from each of the trace files for the consumers turned producer showing how many rows each sorted:

 grep -A+4 "\- Sort Statistics " test_p00[0-9]*165*.trc

test_p000_16583.trc:---- Sort Statistics ------------------------------
test_p000_16583.trc-Initial runs                              2
test_p000_16583.trc-Number of merges                          1
test_p000_16583.trc-Input records                             1362213
test_p000_16583.trc-Output records                            1362213
--
test_p001_16587.trc:---- Sort Statistics ------------------------------
test_p001_16587.trc-Initial runs                              2
test_p001_16587.trc-Number of merges                          1
test_p001_16587.trc-Input records                             1155741
test_p001_16587.trc-Output records                            1155741
--
test_p002_16591.trc:---- Sort Statistics ------------------------------
test_p002_16591.trc-Input records                             695813
test_p002_16591.trc-Output records                            695813
test_p002_16591.trc-Total number of comparisons performed     4230527
test_p002_16591.trc-  Comparisons performed by in-memory sort 4230527
--
test_p003_16595.trc:---- Sort Statistics ------------------------------
test_p003_16595.trc-Input records                             786233
test_p003_16595.trc-Output records                            786233
test_p003_16595.trc-Total number of comparisons performed     4778408
test_p003_16595.trc-  Comparisons performed by in-memory sort 4778408

Compare the Pnnn identifiers with the trace file names, and compare the count of consumed rows with the sort input/output record counts: the PX processes did sort the rows that received before going into the “index build” routine and discarding the null entries.

This is just one scenario of several – it’s a simple heap table, with a parallel create index of a non-unique index in an old version of Oracle. Would it be the same for a unique index, for an index on a partitioned heap table (especially if the number of partitions matched the degree of parallelism), would the type of partitioning matter, and how would things change (if at all) for an index organized table – and what about doing all the combinations of tests on a version of Oracle that is still in support.

You may have wondered, in passing, why the script for this example was called pt_iot_oddity.sql. It’s because someone asked me if I could think of a reason why it took 15 minutes to create an empty index on a partitioned index-organized table when running parallel 8. Admittedly it was a table with 650 million rows, but the time was almost all CPU, not I/O.

Here’s one variation on the test: upgrading to 19c (19.11.0.0) and creating a unique index – I’m just going to show the v$pq_tqstat information and one of the 10032 trace file extracts:

DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES ROW_SHARE DATA_SHARE      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- --------- ---------- ---------- ---------- -----------
         1          0 Ranger                 1 QC                      12       9820    100.00     100.00          2          0           0
                      Producer               1 P004               1027485   14399730     25.67      25.67         19          4           0
                                             1 P005                954287   13374652     23.84      23.84         19          4           0
                                             1 P006                993201   13919984     24.82      24.82         19          4           0
                                             1 P007               1027428   14399502     25.67      25.67         15          0           0
                      Consumer               1 P000                     0         96      0.00       0.00        807        804           0
                                             1 P001               4000000   56083760    100.00     100.00        808        803           0
                                             1 P002                     0         96      0.00       0.00      19740      19737           0
                                             1 P003                     0         96      0.00       0.00      19740      19737           0

                    1 Producer               1 P000                     1        370     25.00      25.00         26         21           0
                                             1 P001                     1        370     25.00      25.00        210         94           0
                                             1 P002                     1        370     25.00      25.00         26         21           0
                                             1 P003                     1        370     25.00      25.00         28         23           0
                      Consumer               1 QC                       4       1480    100.00     100.00          5        842           0


---- Sort Statistics ------------------------------
Initial runs                              1
Input records                             4000000
Output records                            4000000

One thing that didn’t change – the consumer PX processes sorted the data they received; one thing that did change – only one of the consumers received data. The difference is in the change from non-unique to unique, not from 11g to 19c.

A non-unique index is “made unique” by the addition of the table’s rowid to the key value so that – amongst other things – if you have multiple rows with the same key value in a table block they are adjacent in the index. The rowid for a row pointed to by a unique index is “carried” as extra data and is not treated as if it were part of the key.

For parallel index creation this means for a unique index where every key is actually NULL Oracle has to “distribute” all the key values to the same PX process. For a non-unique index Oracle is trying to distribute the combination (key, rowid) evenly between the PX processes – so each PX process should get a different set of null keys corresponding to non-overlapping rowid ranges from the table.

Update July 2023

When I first tweeted the publication of this note someone asked why you would create a completely empty index. In a case of “life imitating art”, a question has just come up on one of the Oracle Developer Forums asking why it takes so long to create an index on a 12 billion row table when you have just added a column which will initially be completely null and you want to index it.

10 Comments »

  1. […] Empty indexes (Mar 2023): completely null entries do not get stored in B-tree indexes, but does Oraclel include them in the big index sort? […]

    Pingback by Indexing Catalogue | Oracle Scratchpad — March 3, 2023 @ 3:06 pm GMT Mar 3,2023 | Reply

  2. Taking a guess: “4 million,” because if the answer were zero, you wouldn’t be blogging about this phenomenon. =P

    I didn’t notice any asterisks in the execution plan, which I’m assuming means there’s no filter to remove those rows from the table scan or sort.

    Comment by Kaley Crum — March 3, 2023 @ 6:54 pm GMT Mar 3,2023 | Reply

    • Kaley,

      Thanks for the comment, your guess is backed by a very sound argument, but I’m only saying that because it’s my blog and it’s an Oracle question.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — March 4, 2023 @ 1:05 am GMT Mar 4,2023 | Reply

  3. Sort by (null, rowid), so all rows?

    Comment by Sayan Malakshinov — March 4, 2023 @ 12:50 am GMT Mar 4,2023 | Reply

    • Sayan,

      Thanks for the comment, that’s a very good basis for your suggestion and an important reminder about non-unique indexes for everyone else.

      But what would the answer be if I’d declared the index to be unique ?

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — March 4, 2023 @ 1:07 am GMT Mar 4,2023 | Reply

  4. Jonathan,
    Hm.. I suspect in this case due to the unique indexes’ structure (and logic), “sort unique” would sort only by “secondary” column and eliminate all null rows, ie it can’t put NULL values into the array of sorted values.

    PS. To be honest, I don’t even know which px slave would “receive” such null values in case of “range” distribution…

    Comment by Sayan Malakshinov — March 4, 2023 @ 1:28 am GMT Mar 4,2023 | Reply

    • Sayan,

      There’s “reasonable assumption” and then there’s “what actually happens” (and then there’s retro-fitting an explanation for why it happened that way).

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — March 4, 2023 @ 1:39 am GMT Mar 4,2023 | Reply

      • Ah, I’m sorry, I’ve attempted to guess the answer without running the test case because, intuitively, I feel it’s unfair to answer quizzes after thoroughly debugging them. So I’ve just run and traced it, and now I see what we get in both cases, and I see how it was distributed between slaves: https://gist.github.com/xtender/d4404bc4f794e17b28dab5b4cd2acc69

        Comment by Sayan Malakshinov — March 4, 2023 @ 3:46 am GMT Mar 4,2023 | Reply

        • Sayan,

          No need to apologise – you did what I wanted people to do, think about what might happen, what ought to happen, and come up with a test to show what does happen.

          In fact you’ve also used Oracle 19 for your test (I assume from the trace file names) – and it looks like there’s an enhancement between 11g and 19c that changes the distibribution when the index is unique.

          Key (no pun intended) feature, of course, is that Oracle doesn’t discard “null rows” early, and this does make a bit of a difference when there are 500M rows in the table that won’t be in the index.

          Regards
          Jonathan Lewis

          Comment by Jonathan Lewis — March 5, 2023 @ 10:17 am GMT Mar 5,2023

  5. *null values == null keys of the sorted array

    Comment by Sayan Malakshinov — March 4, 2023 @ 1:30 am GMT Mar 4,2023 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.