Oracle Scratchpad

August 22, 2022

Encryption oddity

Filed under: Bugs,LOBs,Oracle,Troubleshooting — Jonathan Lewis @ 12:34 am BST Aug 22,2022

Here’s a strange problem (with a possible workaround) that appeared in a thread on the Oracle developer forum a couple of days ago. It looks like the sort of problem that might be caused by a memory overflow in a rarely used code path, given that it seems to need a combination of:

  • move LOB column
  • varchar2() declared with character semantics
  • transparent data encryption (TDE)

Here’s a simple script to generate a tiny data set that demonstrates the problem in my 19.11.0.0 system – but you’ll need to enable transparent data encryption if you want it to work (see footnote).

rem
rem     Script:         encryption_problem.sql
rem     Author:         Solomon Yakobson / Jonathan Lewis
rem     Dated:          August 2022
rem
rem     Last tested
rem             21.3.0.0 
rem             19.11.0.0
rem

drop table test_tbl_move purge;

create table test_tbl_move(
        encrypted_column varchar2(9 char) 
                encrypt using 'aes192' 'sha-1' no salt 
                constraint ttm_nn_ec not null,
        clob_column      clob
)
lob (clob_column) store as securefile
;

insert into test_tbl_move
values( '123456789', 'x')
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'test_tbl_move',
                method_opt  => 'for all columns size 1'
        );
end;
/

alter table test_tbl_move move lob(clob_column) store as securefile;

The script creates a table with a clob column (enable storage in row by default), and a single encrypted varchar2() column declared using character semantics viz: “9 CHAR”.

We insert a row, gather stats – just in case – and then try to move the LOB storage (which also has to move the table). Would you expect a result like the following?

alter table test_tbl_move move lob(clob_column) store as securefile
*
ERROR at line 1:
ORA-12899: value too large for column ??? (actual: 27, maximum: 9)

That “maximum: 9” suggests that Oracle is complaining about encrypted_column – but why? Before worrying about that question, though, I wanted to pursue a different point: if you check the original post and compare the error message with the one above you’ll see that the “actual:” value was different. So I ran the entire test again, and again, and again; and by the time I had executed the entire script half a dozen times this is the list of error messages I had collected:

ORA-12899: value too large for column ??? (actual: 27, maximum: 9)
ORA-12899: value too large for column ??? (actual: 31, maximum: 9)
ORA-12899: value too large for column ??? (actual: 29, maximum: 9)
ORA-12899: value too large for column ??? (actual: 29, maximum: 9)
ORA-12899: value too large for column ??? (actual: 26, maximum: 9)
ORA-12899: value too large for column ??? (actual: 21, maximum: 9)

Your first thought might have been that Oracle is trying to copy an encrypted value while forgetting that it had been encrypted, but the variation in the actual lengths makes it look as if Oracle is injecting random data somehow (maybe through a pointer error) and generating garbage as a side-effect. Moreover, if you know your encryption you’ll be suspicious of almost all the actual lengths reported, because Oracle’s working is as follows:

  • Round the source length up to the next multiple of 16 bytes
  • aes192 encryption (the default encryption) will add 16 bytes (‘nomac’ and no salt)
  • adding a salt (default behaviour) will add a further 16 bytes
  • adding the ‘SHA-1’ integrity (the default) will add another 20 bytes

You can’t get an odd number of bytes as an encrypted value (unless, perhaps, the code thought it was reading a null-terminated character-string and there was a zero in the middle of it).
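The rules above can be captured in a few lines of Python – this is just my own sketch of the arithmetic as described, not Oracle’s actual code, and the function name is mine:

```python
def encrypted_length(source_bytes, salt=True, sha1=True):
    """Predicted stored length of a TDE-encrypted column value,
    following the rules described above (a sketch, not Oracle's code)."""
    # Round the source length up to the next multiple of 16 bytes
    length = ((source_bytes + 15) // 16) * 16
    length += 16            # aes192 encryption adds 16 bytes ('nomac', no salt)
    if salt:
        length += 16        # adding a salt adds a further 16 bytes
    if sha1:
        length += 20        # 'SHA-1' integrity adds another 20 bytes
    return length

# The 9-byte test value, encrypted using 'sha-1' no salt (as in the script):
print(encrypted_length(9, salt=False, sha1=True))   # 52
```

By these rules a 9-byte value stored with ‘sha-1’ and no salt should occupy 52 bytes (or 68 with the default salt) – and the result is always an even number, which is why the odd “actual:” values above look so suspicious.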

Workaround

You’ll see in the thread that Solomon Yakobson did a number of experiments to see what effects they had on the test case; but there was one experiment that he didn’t do that happened to be the first one I thought of. (There was a certain amount of luck in the choice, plus a bit of background suspicion from a couple of prior bugs I’d seen, plus it seemed to be virtually the only thing that SY hadn’t tried.)

Declaring a varchar2(x CHAR) is fairly rare – and with all the messing around with padding, encoding etc. going on, the code to handle multi-byte character sets might be a fruitful source of bugs. So I re-ran the test, but changed the declaration from varchar2(9 CHAR) to varchar2(9 [byte]), and Oracle happily ran my test to completion.

On its own this isn’t a complete workaround. If you’re running with a multi-byte character set a declaration using character semantics means Oracle allows you to store many more bytes than characters. Conversely, if you use byte semantics you will have to declare your column with a large enough byte count to store the number of (multi-byte) characters you really want – but then that could allow your users to insert more characters than you wanted (unless the character set was a fixed-width character set – but then you could waste a lot of space storing character strings – see this note about “in row” CLOB columns).
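A quick sketch of the difference between the two semantics, using UTF-8 as a stand-in for Oracle’s AL32UTF8 character set (the sample strings are just illustrative):

```python
# CHAR semantics limits the number of characters; BYTE semantics limits
# the number of encoded bytes. In a multi-byte character set they differ.
for s in ["motorway", "šošovička", "高速道路"]:
    chars = len(s)                       # what varchar2(n CHAR) limits
    byte_count = len(s.encode("utf-8"))  # what varchar2(n) / (n BYTE) limits
    print(f"{s}: {chars} characters, {byte_count} bytes")
```

The 9-character “šošovička” needs 12 bytes in UTF-8, so it fits a varchar2(9 CHAR) but not a varchar2(9 BYTE) – which is exactly the sizing problem the workaround below has to deal with.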

So, to use byte semantics with a character limit, you have to adopt a strategy that I once saw at a company running Peoplesoft (I assume it’s been changed since – it was a long time ago). They declared their varchar2() columns with far too many bytes (4 times the required character count) then added a check constraint on the length to restrict the number of characters. (In their case that resulted in tens of thousands of check constraints in the database with an undesirable overhead on dictionary cache latching and parse times).

Here’s an alternative declaration of the table that allows the alter table move command to work and still ensures the correct maximum number of characters in the varchar2() column:

create table test_tbl_move(
        encrypted_column varchar2(18)
                encrypt using 'aes192' 'sha-1' no salt
                constraint ttm_nn_ec not null 
                constraint ttm_ck_ec_len9 check (length(encrypted_column) <= 9),
        clob_column      clob
)
lob (clob_column) store as securefile
/

Table created.

SQL> insert into test_tbl_move values('0123456789','xxx');
insert into test_tbl_move values('0123456789','xxx')
*
ERROR at line 1:
ORA-02290: check constraint (TEST_USER.TTM_CK_EC_LEN9) violated

Footnotes

  1. If, like me, the last time you’d played around with encryption was in 11g you’ll find that a lot has changed in setting it up – not only in the added requirements for pluggable databases but also with a new command framework. (As usual Tim Hall’s blog on the topic is a good starting point if you want to do a quick experiment in a sand box.)
  2. The code sample includes SHA-1 as the integrity algorithm – ‘NOMAC’ is the only alternative, and in any single table the same algorithm has to be used for all encrypted columns. (If you try to use SHA-1 on one column and NOMAC on another as you create the table Oracle will raise “ORA-28379: a different integrity algorithm has been chosen for the table”.) More importantly, a note in the Oracle 21c reference manual states that SHA-1 is deprecated from that version onwards and advises moving from TDE column encryption to TDE tablespace encryption.

July 18, 2022

drop t/s bug

Filed under: Bugs,LOBs,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 9:17 am BST Jul 18,2022

A recent thread on the MOS database admin forum (needs an account) demonstrated a number of little issues with bugs, debugging, complex syntax, and the never-ending list of “not quite complete” code that shows up when features (in this case LOBs and partitioning) collide.

It’s a silly little thing, but one to remind you that you always have to ask “What have I forgotten?”, “Does my test suite include every case I need to test?”

In this case we had been given a model where, after creating a composite partitioned table with a LOB column using a statement that referenced 3 different tablespaces, the OP had

  1. moved one component of this complex structure to a fourth tablespace
  2. dropped the tablespace that had been defined as the holder of the moved component
  3. renamed the fourth tablespace to match the name of the dropped tablespace
  4. called dbms_metadata.get_ddl() to generate a new table definition.

The result of this was that the generated statement included a reference to a tablespace with the unexpected name of “_$deleted$51$0”.

Sidenote: The name is an example of the value stored in ts$.name when you drop a tablespace. The row in ts$ is not deleted (or marked for deletion); instead ts$.online$ is set to 3 and ts$.name is set to reflect the tablespace number, with a name of the format “_$deleted${ts$.ts#}$0”.
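In other words (a trivial sketch, just to pin down the format – the function name is mine):

```python
def deleted_ts_name(ts_number):
    # Builds the name Oracle stores in ts$.name for a dropped tablespace,
    # following the format described above: "_$deleted${ts$.ts#}$0"
    return f"_$deleted${ts_number}$0"

print(deleted_ts_name(51))   # -> _$deleted$51$0, as reported by the OP
```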

Here, with some cosmetic changes to the “create table” statement and with a preamble to create enough tablespaces and quotas, is the model supplied by the OP (if you’re thinking of running it make sure you read it carefully first):

rem
rem     Script:         drop_ts_pt_bug.sql
rem     Author:         Jean-François56 / Jonathan Lewis
rem     Dated:          July 2022
rem
rem     Last tested 
rem             19.11.0.0
rem             11.2.0.4
rem 

connect sys/sys as sysdba

prompt  =================================================
prompt  Clearing the stage - some steps may report errors
prompt  =================================================

drop table test_user.test purge;

drop tablespace jf1 including contents and datafiles;
drop tablespace jf2 including contents and datafiles;
drop tablespace jf3 including contents and datafiles;
drop tablespace jf4 including contents and datafiles;

create tablespace jf1 datafile size 100m;
create tablespace jf2 datafile size 100m;
create tablespace jf3 datafile size 100m;
create tablespace jf4 datafile size 100m;

alter user test_user quota unlimited on jf1;
alter user test_user quota unlimited on jf2;
alter user test_user quota unlimited on jf3;
alter user test_user quota unlimited on jf4;

prompt  =======================
prompt  Connecting to test user
prompt  =======================

connect test_user/test

drop table test purge;

create table test(
       idarchive                number(10,0),
       data                     blob,
       partition_date           date,
       customer                 number(10,0),
       prefix_archive_key       varchar2(5)
)
partition by range (partition_date)
subpartition by list (customer)
(
partition p1 
        values less than (to_date('2008-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian'))
        tablespace jf1
        lob (data) store as basicfile (tablespace jf2 chunk 16384)
        (
        subpartition sp values (1) 
                tablespace jf3
                lob (data) store as basicfile (tablespace jf2)
        )
);

alter table test move subpartition sp lob(data) store as (tablespace jf4);


connect sys/sys as sysdba

drop tablespace jf2 including contents and datafiles;
alter tablespace jf4 rename to jf2;


connect test_user/test

set long 20000
set longchunksize 20000

set linesize 132
column text_line format a128

begin
        dbms_metadata.set_transform_param(dbms_metadata.session_transform,'PRETTY',             true);
        dbms_metadata.set_transform_param(dbms_metadata.session_transform,'TABLESPACE',         true);
        dbms_metadata.set_transform_param(dbms_metadata.session_transform,'SEGMENT_ATTRIBUTES', true);
        dbms_metadata.set_transform_param(dbms_metadata.session_transform,'STORAGE',            false);
end;
/

select dbms_metadata.get_ddl('TABLE','TEST') text_line from dual
/


The first thing I did after reading the posting was a quick search on MOS, using the four search terms: drop tablespace rename deleted. It was a lucky choice because on the first page of results from the Knowledge Base I found:

Renaming a Tablespace to An Already Dropped one Changes The Tablespace Name To "_$DELETED" (Doc ID 1937848.1)

Conveniently the notes in this document said: “This is caused by bug 18136584”, and that bug (and the problem) is labelled as “fixed in 12.2”. Unfortunately the OP was running 11.2.0.4, but that’s okay because the note also said: “Backport is feasible. However, A simple workaround is available.” So I thought I’d create the test case and check the workaround – which is why I’ve got the script.

It just so happened that I prepared, debugged and ran the script (without the workaround) on 19.11 before bothering to start up a VM with 11.2.0.4 – and I got the following output from my call to dbms_metadata.get_ddl():

  CREATE TABLE "TEST_USER"."TEST"
   (	"IDARCHIVE" NUMBER(10,0),
	"DATA" BLOB,
	"PARTITION_DATE" DATE,
	"CUSTOMER" NUMBER(10,0),
	"PREFIX_ARCHIVE_KEY" VARCHAR2(5)
   ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  TABLESPACE "TEST_8K_ASSM"
 LOB ("DATA") STORE AS SECUREFILE (
  ENABLE STORAGE IN ROW CHUNK 8192
  NOCACHE LOGGING  NOCOMPRESS  KEEP_DUPLICATES )
  PARTITION BY RANGE ("PARTITION_DATE")
  SUBPARTITION BY LIST ("CUSTOMER")
 (PARTITION "P1"  VALUES LESS THAN (TO_DATE(' 2008-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  TABLESPACE "JF1"
 LOB ("DATA") STORE AS BASICFILE (
  TABLESPACE "_$deleted$36$0" ENABLE STORAGE IN ROW CHUNK 16384 RETENTION
  NOCACHE LOGGING )
 ( SUBPARTITION "SP"  VALUES (1) SEGMENT CREATION IMMEDIATE
 LOB ("DATA") STORE AS BASICFILE (
  TABLESPACE "JF2" )
  TABLESPACE "JF3"
 NOCOMPRESS ) )

If you check the TABLESPACE clause in the LOB storage for partition P1 you’ll see that even in 19.11 you can end up with a generated statement that references “deleted” tablespaces – and here’s a funny little side effect (cut-n-paste from SQL*Plus – but your deletion number will probably be different):

SQL> drop table test;
drop table test
           *
ERROR at line 1:
ORA-00959: tablespace '_$deleted$36$0' does not exist

SQL> drop table test purge;

Table dropped.

So (part of) the bug is still present in 19.11; and on the plus side that means I can examine the workaround to see what it is and how it works. This is what the note says: “Execute below command once you finished with dropping and renaming the tablespace”:

alter table <table_name> modify default attributes tablespace <old_tablespace_name>;

That’s not actually going to do anything for us – but it’s an important clue to how we might be able to fix things; it also suggests why the bug “fixed” in 12.2 isn’t quite fixed in 19c – someone missed a bit of the code path: maybe the bit about LOBs, or maybe the bit about composite partitioned tables, or maybe (very precisely) the bit about partition-level default values for LOBs in composite partitioned tables. (And who knows what might be missing if we start looking at index-organized tables, nested tables, and other complicated structures.)

Taking the clue from the suggested workaround, here are three possible fixes to try:

alter table test modify default attributes tablespace jf2;

alter table test modify default attributes                  lob(data) (tablespace jf2);

alter table test modify default attributes for partition p1 lob(data) (tablespace jf2);

The third of these options is the one that “works” – and the word is in quote marks because all I mean is that the generated SQL uses JF2 as the tablespace name rather than the “deleted” tablespace name – I make no guarantee about how future behaviour might vary from past behaviour after this change, and it’s likely to depend on exactly how you’re expecting to add and move partitions and subpartitions anyway.

Follow-up

The problem (which means, possibly, the omitted code path) comes from the need for handling default storage clauses. When we’re handling composite partitioning the only segments that come into existence are subpartition segments – but you can still specify physical (and logical) storage information at the table and partition level with the inference (possibly not stated explicitly) that any table-level metadata should be the default metadata for the partition level and any partition-level metadata should be the default metadata for subpartitions. What does this mean in terms of our original table creation script and the subsequent call to dbms_metadata?

You’ll notice the table-level storage clauses near the top of the output above from dbms_metadata.

  • The table-level TABLESPACE clause references test_8k_assm: I didn’t include a default tablespace at the table level for the table segments, but that’s the tablespace that happened to be my default tablespace when I ran the script.
  • The table-level LOB clause defines default LOB storage with no specified tablespace, using securefiles (which is the default LOB storage for 19c). Again, I didn’t specify anything about a table-level LOB in my original definition.
  • The rest of the generated definition has, apart from the “deleted” tablespace, reproduced my original definition – including the 16KB chunk size declared for the partition and the lack of a specified chunk size for the subpartition.

So questions to think about:

  • what chunk size is / would be used in the subpartition – is it silently picking up the value specified for the partition, or is it silently picking up the default for the table, or is it simply using the “absolute” default of 1 block?
  • what happens if I execute a simple “add subpartition” on the existing p1 partition? Where will the subpartition be stored and what will its storage details look like?
  • what will I see if I execute a simple “add partition” to add a new partition to the table? Will I also get a physical subpartition, and if so where will it be and what will its storage clause look like?
  • What would the dbms_metadata output have looked like if I had had a table-level LOB definition that specified tablespace jf2?
  • What side effects might appear if I extended the definition to interval partitioning, with automatic list subpartitions, and inserted a row that needed a new partition and subpartition?!

Underlying all these detailed questions, of course, is the specification for the maintenance work that the DBA is expected to handle, viz:

  • what is the defined strategy for adding new partitions and subpartitions to the table?
  • what is the strategy for aging out old partitions and subpartitions?
  • are there any plans about grouping partitions into “age-related” tablespaces?
  • are tablespaces going to be renamed to transport them to another database?

It’s possible that the anomaly in this note only showed up because the OP was experimenting with options, and maybe the ultimate production code will be based on a strategy that means the anomaly will never appear. It’s possible that the anomaly is already in the production system but only became visible when someone decided to think about archiving out old partitions and the archival code started raising errors. Playing around with models to discover what happens is time well spent; and modelling the full production life cycle before going live is a critical activity.

Some answers

To find out what we’ve actually got from the original create table statement we can query the views (user/all/dba/cdb):

USER_TABLES
USER_PART_TABLES
USER_TAB_PARTITIONS
USER_TAB_SUBPARTITIONS

USER_LOBS
USER_PART_LOBS
USER_LOB_PARTITIONS
USER_LOB_SUBPARTITIONS

USER_SEGMENTS

For the chunk sizes we find: user_lobs.chunk = 8192, user_lob_partitions.chunk = 2 (blocks?) and user_lob_subpartitions.chunk = 16,384 (bytes?). We also have user_part_lobs.def_chunk = 1 (block?). So the explicit p1 partition chunk size has cascaded down to its subpartition.

Side note: who needs consistency! You might also want to remember that when you call the dbms_space.space_usage() procedure that appeared in 11g to report the space used in a (securefile) LOB, the “blocks” input parameters actually return chunks and (for CLOBs) the “bytes” input parameters actually return character counts.

For the tablespaces we find: user_lobs.tablespace_name = TEST_8K_ASSM, user_lob_partitions.tablespace_name = _$deleted$36$0, and user_lob_subpartitions.tablespace_name = JF2. We also find that user_part_lobs.def_tablespace_name is null.

What happens if I try to add a subpartition to the existing p1 partition without first applying the fix:

SQL> alter table test modify partition p1 add subpartition sp2 values(2);
alter table test modify partition p1 add subpartition sp2 values(2)
            *
ERROR at line 1:
ORA-00959: tablespace '_$deleted$36$0' does not exist

Now try again but apply the fix before adding the subpartition:

SQL> alter table test modify default attributes for partition p1 lob(data) (tablespace jf2);

Table altered.

SQL> alter table test modify partition p1 add subpartition sp2 values(2);

Table altered.

Checking the data dictionary for the effects of changing the default attribute we find that user_lob_partitions.tablespace_name is now JF2, which has then been taken on by the new subpartition.

What about adding a new partition:

SQL> alter table test add partition p2 values less than (to_date('01-Jan-2010'));

Table altered.

SQL> select partition_name, chunk , tablespace_name from user_lob_partitions order by partition_name;

PARTITION_NAME              CHUNK TABLESPACE_NAME
---------------------- ---------- ------------------------------
P1                              2 JF2
P2                              1

SQL> select lob_partition_name, subpartition_name, chunk , tablespace_name from user_lob_subpartitions order by 1,2;

LOB_PARTITION_NAME   SUBPARTITION_NAME           CHUNK TABLESPACE_NAME
-------------------- ---------------------- ---------- ------------------------------
SYS_LOB_P20609       SP                          16384 JF2
SYS_LOB_P20609       SP2                         16384 JF2
SYS_LOB_P20622       SYS_SUBP20621                8192 TEST_8K_ASSM

The new partition has no tablespace_name, but it has automatically generated a subpartition (values (default)) which has climbed the tree to the table level to set the tablespace for its LOB – and that had defaulted to the tablespace of the table itself, which was my user default tablespace of TEST_8K_ASSM. Maybe we should have modified the “default attributes lob(data)” at some point so that user_part_lobs.def_tablespace_name was not null.

I’ll finish with just one more comment – you’ve seen how messy things can get and how much detail could be overlooked when handling a marginally complex composite partitioned table. Do you really think that interval partitioning and automatic list partitioning are going to mean you don’t have to worry about partition maintenance code? Possibly. If you plan to have one huge tablespace for all the bits, and never have to worry about backing up and restoring that tablespace, you will be able to forget about all the housekeeping code. Realistically, though, you’ll need to know how to check and change the metadata and rename, move or otherwise manipulate segments – so make sure you know what’s going to happen, so that you don’t have to work it out when everyone’s running around in panic mode.

May 31, 2022

Join View delete

Filed under: ANSI Standard,Bugs,CBO,Join Elimination,Oracle,Transformations — Jonathan Lewis @ 5:39 pm BST May 31,2022

I’ve written a couple of notes about deleting from join views and (ultimately) the fact that if you have multiple key-preserved tables in a join view then the first key-preserved table in the from clause is the one where the delete is applied. The topic came up on the Oracle developer forum again today, and closed with a funny little quirk.

Here’s a little data model followed by one query and two deletes that I’ve just run on Oracle 19.11.0.0 and 21.3.0.0. The query and first delete feature a join using “ANSI” syntax, the second delete translates the join into “traditional” Oracle syntax.

rem
rem     Script:         delete_join_2.sql       
rem     Dated:          May 2022
rem     Author:         J P Lewis
rem
rem     Last tested
rem             21.3.0.0
rem             19.11.0.0
rem

create table t_parent (
        id      number 
                        constraint t_parent_pk primary key,
        val     varchar2(10)
);

create table t_child (
        id      number 
                        constraint t_child_fk references t_parent(id),
        val     varchar2(10)
);

alter table t_child add constraint t_child_pk primary key(id);

insert into t_parent values(1,rpad('parent',10));
insert into t_child  values(1,rpad('child', 10));

commit;

execute dbms_stats.gather_table_stats(user,'t_parent',cascade=>true)
execute dbms_stats.gather_table_stats(user,'t_child', cascade=>true)

set autotrace on explain

select t_child.* from t_parent join t_child on t_child.id = t_parent.id;

delete
        (select t_child.* from t_parent join t_child on t_child.id = t_parent.id);

delete (select t_child.* from t_parent, t_child where t_child.id = t_parent.id);

set autotrace off

The table definitions and content (with a little modification) come from the forum posting, and I have to admit that the “one-to-one” referential integrity from parent to child is a little unusual (though occasionally a very effective strategy). I’ve also enabled autotrace to get the (probable) execution plans for the three statements, but in the script on my test machine I’ve also enabled the CBO (10053) trace to pick up a few extra details.

The query is there to show us that the join will return a single row, and the row content will be from the t_child table. What happens, though, when we try to use that query as an inline view for the delete?

It is an updateable join view – and both tables are, in fact, key-preserved thanks to the primary key constraints at both ends of the join. Here’s what we get (with the execution plans removed from the output):

        ID VAL
---------- ----------
         1 child

1 row selected.



delete
*
ERROR at line 1:
ORA-02292: integrity constraint (TEST_USER.T_CHILD_FK) violated - child record found


1 row deleted.

The “1 row selected” tells us that we selected one row – as expected.

The ORA-02292 error tells us that the ANSI delete is trying to break the foreign key constraint on the t_child table – and that error shouldn’t be a surprise because the t_parent table appears first in the from clause, so that’s the table the optimizer wants to delete rows from.

However, the final “1 row deleted” tells us that if we change to “traditional” Oracle syntax the delete works – so which of the two delete statements is displaying buggy behaviour?

Execution plans

There’s a clue in the execution plans (when we get them – autotrace doesn’t give a plan for the delete that raised the error). So here are the plans for the select and the “traditional” delete, in that order:

----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |    14 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T_CHILD |     1 |    14 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------



-------------------------------------------------------------------------------
| Id  | Operation        | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | DELETE STATEMENT |            |     1 |     3 |     1   (0)| 00:00:01 |
|   1 |  DELETE          | T_CHILD    |       |       |            |          |
|   2 |   INDEX FULL SCAN| T_CHILD_PK |     1 |     3 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

With assistance from the foreign key constraint, Join Elimination has occurred in both cases leaving t_child as the candidate for deletion. You’ll also note that the optimizer has recognised that while the view’s select list is “all columns” it only needs to optimize for “select rowid” to drive the delete, so produces an index-only access path to get the data it needs.

If we use explain plan, or check the 10053 trace file, for the ANSI delete that failed we’ll find the following:

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT    |             |     1 |     6 |     1   (0)| 00:00:01 |
|   1 |  DELETE             | T_PARENT    |       |       |            |          |
|   2 |   NESTED LOOPS      |             |     1 |     6 |     1   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN  | T_CHILD_PK  |     1 |     3 |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| T_PARENT_PK |     1 |     3 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$4396EC5C
   3 - SEL$4396EC5C / T_CHILD@SEL$1
   4 - SEL$4396EC5C / T_PARENT@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T_CHILD"."ID"="T_PARENT"."ID")

There’s no Join Elimination, and Oracle is following the rule that “the first key-preserved table in the from clause should be the target of the deletes”.

Where’s the bug?

This is a question that doesn’t have an obvious answer.

The traditional syntax gives you the result you probably wanted, but it’s deleting from the second table in the from clause of the join view – so is that a documentation bug or a coding bug? If the documentation is correct, the statement should have raised the error that the ANSI syntax gave.

Conversely, is there an optimizer bug (or limitation) because table elimination could have been used in the ANSI case but wasn’t? Or you could say that the ANSI code is doing the right thing (following the documentation) by raising an error, and that the traditional syntax is doing the wrong thing.

If we search the CBO (10053) traces of the select and the ANSI join, looking for sections where the optimizer considers Join Elimination, the results are helpful:

grep -nT "^JE" or19_ora_30861_*.trc >temp.txt

With a little cosmetic editing here’s the output from the trace of the select statement:

or19_ora_30861_select.trc:  78  JE - Join Elimination
or19_ora_30861_select.trc: 807  JE:[V2]   Considering Join Elimination on query block SEL$1 (#0)
or19_ora_30861_select.trc: 813  JE:[V2] Query block (0x7fd3a152f5c0) before join elimination:
or19_ora_30861_select.trc: 816  JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
or19_ora_30861_select.trc: 817  JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
or19_ora_30861_select.trc: 818  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_select.trc: 819  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - predicate column refs.
or19_ora_30861_select.trc: 820  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using NTSID
or19_ora_30861_select.trc: 821  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - no constr. based join pred(s)
or19_ora_30861_select.trc: 822  JE:[V2] Query block (0x7fd3a152f5c0) after join elimination:

or19_ora_30861_select.trc: 844  JE:[V2]   Considering Join Elimination on query block SEL$58A6D7F6 (#0)
or19_ora_30861_select.trc: 850  JE:[V2] Query block (0x7fd39c09ee60) before join elimination:
or19_ora_30861_select.trc: 853  JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
or19_ora_30861_select.trc: 854  JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
or19_ora_30861_select.trc: 855  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_select.trc: 856  JE:[V2]: Can eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_select.trc: 857  JE:[V2] Eliminate table: T_PARENT (T_PARENT)
or19_ora_30861_select.trc: 858  JE:[V2] Query block (0x7fd39c09ee60) after join elimination:

We can see that the optimizer has considered join elimination twice (trace lines 807 and 844), once for each of two query blocks (sel$1 – a query block embedded in the text – and sel$58a6d7f6, a query block name generated during some query transformation).

At trace lines 818 and 819 we can see that Oracle failed to take advantage of the referential integrity constraint to eliminate t_parent, with the explanation “predicate column refs”; but at lines 855 and 856 we see that there’s a change of heart and the optimizer does take advantage of the referential integrity to eliminate t_parent. So let’s look at the text of the “query block before join elimination” in both cases and see what’s changed:

First pass
----------
SELECT 
        "T_PARENT"."ID"  "QCSJ_C000000000300000",
        "T_PARENT"."VAL" "QCSJ_C000000000300002",
        "T_CHILD"."ID"   "QCSJ_C000000000300001",
        "T_CHILD"."VAL"  "QCSJ_C000000000300003" 
FROM 
        "TEST_USER"."T_PARENT" "T_PARENT",
        "TEST_USER"."T_CHILD" "T_CHILD" 
WHERE 
        "T_CHILD"."ID"="T_PARENT"."ID"


Second Pass
-----------
SELECT 
        "T_CHILD"."ID" "ID",
        "T_CHILD"."VAL" "VAL" 
FROM 
        "TEST_USER"."T_PARENT" "T_PARENT",
        "TEST_USER"."T_CHILD" "T_CHILD" 
WHERE 
        "T_CHILD"."ID"="T_PARENT"."ID"

The first pass seems to have expanded the two tables in the initial join, ignoring the limited select list, so join elimination is blocked by the presence of the (non-PK) column t_parent.val (aliased as qcsj_c000000000300002); but when we get to the second pass the select list has been reduced to the minimum set of columns that a pure select statement would need, and join elimination is possible.

The clue to how/why this has happened is in the few lines between the two passes; the trace file also reports: “CVM: Merging SPJ view SEL$1 (#0) into SEL$2 (#0)”, and when we look for the point where the optimizer registered query block sel$2 it turns out to have a single item in its from clause, an object called from$_subquery$_003 (which tells us where the 3 came from in all those column aliases QCSJ_C0000000003nnnnn). In effect it looks like the optimizer started by rewriting the ANSI select in Oracle syntax as (with a little poetic licence for clarity):

select  child_id, child_val, parent_id
from    (
         select  t_child.*, t_parent.*
         from    t_child, t_parent
         where   t_child.id = t_parent.id
        ) from$_subquery$_003
/ 

So the first pass was the optimizer trying to do join elimination on the inner query block, and the second pass happened after the inner block was merged with the outer block and the resulting query block (called sel$58a6d7f6) allowed the join elimination to take place.

So what happened with the delete statement? Here’s the result of the call to grep:

or19_ora_30861_delete.trc:  90  JE - Join Elimination
or19_ora_30861_delete.trc: 837  JE:[V2]   Considering Join Elimination on query block SEL$1 (#0)
or19_ora_30861_delete.trc: 843  JE:[V2] Query block (0x7fd3a152f758) before join elimination:
or19_ora_30861_delete.trc: 846  JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
or19_ora_30861_delete.trc: 847  JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
or19_ora_30861_delete.trc: 848  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
or19_ora_30861_delete.trc: 849  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - predicate column refs.
or19_ora_30861_delete.trc: 850  JE:[V2]: Try to eliminate T_PARENT by ref. join elim using NTSID
or19_ora_30861_delete.trc: 851  JE:[V2]: Cannot eliminate T_PARENT by ref. join elim - no constr. based join pred(s)
or19_ora_30861_delete.trc: 852  JE:[V2] Query block (0x7fd3a152f758) after join elimination:

That’s the lot – there is no attempt to do join elimination on a query block called sel$58a6d7f6, though a few lines after this extract the trace file reports:

CVM:   Merging SPJ view SEL$1 (#0) into SEL$2 (#0)
Registered qb: SEL$58A6D7F6 0x9c0a3038 (VIEW MERGE SEL$2; SEL$1; SEL$2)

That’s exactly the same view merging that was present in the select – but the optimizer doesn’t attempt join elimination before moving on to report, a couple of lines later:

CVM:   Merging SPJ view SEL$58A6D7F6 (#0) into DEL$1 (#0)
Registered qb: SEL$4396EC5C 0x9c09dde0 (VIEW MERGE DEL$1; SEL$58A6D7F6; DEL$1)

It looks like this is a case of the optimizer missing a cycle in its transform/optimize loopback. The delete went through exactly the same process as the select but failed to optimize the intermediate query block (which happened to be called from$_subquery$_004 in this case, being one layer deeper thanks to the extra level introduced by the delete).

Workaround

I tried two or three things to get the ANSI-style code to work without changing the table order in the from clause – extra layers of subqueries, explicit aliases – but Oracle kept coming back to a point where it had the full projection of all the columns in the two tables. (This reminded me of an old bug in ANSI select expansion that was fixed in 12cR2, and made me wonder whether this was a code path where the same bug had been overlooked – it also made me wonder if there would be any problems if the two tables in the join totalled more than 1,000 columns – but that’s a test for another day.)

I did eventually find a fix (one that didn’t involve switching the table order in the inline view):

delete
        (select t_child.id from t_parent join t_child on t_child.id = t_parent.id);

Note that I now have only the primary key of t_child in the select list.

The section in the CBO trace file for Join Elimination looked like this:

JE:[V2] Query block (0x7f93996231f8) before join elimination:
SQL:******* UNPARSED QUERY IS *******
SELECT "T_PARENT"."ID" "QCSJ_C000000000400000","T_CHILD"."ID" "QCSJ_C000000000400001" FROM "TEST_USER"."T_PARENT" "T_PARENT","TEST_USER"."T_CHILD" "T_CHILD" WHERE "T_CHILD"."ID"="T_PARENT"."ID"
JE:[V2]: Try to eliminate T_CHILD by ref. join elim using NTSID
JE:[V2]: Cannot eliminate T_CHILD by ref. join elim - no constr. based join pred(s)
JE:[V2]: Try to eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
JE:[V2]: Can eliminate T_PARENT by ref. join elim using PRIMARY(ID) <- FOREIGN(ID)
JE:[V2] Eliminate table: T_PARENT (T_PARENT)
JE:[V2] Replaced column: T_PARENT.ID with column: T_CHILD.ID
JE:[V2] Query block (0x7f93996231f8) after join elimination:
SQL:******* UNPARSED QUERY IS *******
SELECT "T_CHILD"."ID" "QCSJ_C000000000400000","T_CHILD"."ID" "QCSJ_C000000000400001" FROM "TEST_USER"."T_CHILD" "T_CHILD"
Registered qb: SEL$E703A888 0x996231f8 (JOIN REMOVED FROM QUERY BLOCK SEL$1; SEL$1; "T_PARENT"@"SEL$1")

You may decide that this is an obvious workaround, but it’s not particularly helpful. In the general case I would expect to see a stored view that joined the two tables, with the expectation that this type of delete was just one of many operations the view was used for. Rewriting a view to use the increasingly common ANSI syntax shouldn’t result in some statements crashing “for no apparent reason”.
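For illustration, here’s a hedged sketch (the view name is invented) of that general case – an ANSI-syntax stored view over the join, with the delete expressed against the view:

```sql
-- Hypothetical stored view over the ANSI join, with the tables in
-- the same order as the problem case (parent first).
create or replace view v_child_data as
select
        t_child.id, t_child.val
from
        t_parent
join
        t_child
on
        t_child.id = t_parent.id
/

-- A delete through the view ought to be legal because t_child is the
-- key-preserved table, but if the view expands and merges the same
-- way as the inline view did, it can hit the same failure as the
-- inline-view version of the statement.
delete from v_child_data;
```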

The last laugh

Given that the select and delete both included a merge of sel$1 into sel$2 to produce a query block called sel$58a6d7f6 I wondered what would happen if I blocked the merge:

SQL>  delete  /*+  no_merge (@sel$1) */
  2          (select t_child.id from t_parent join t_child on t_child.id = t_parent.id);
	(select t_child.id from t_parent join t_child on t_child.id = t_parent.id)
        *
ERROR at line 2:
ORA-01752: cannot delete from view without exactly one key-preserved table

Isn’t that the error message that should have been appearing all along – according to the original documentation for updatable join views?

May 5, 2022

Lag/Lead slow

Filed under: Bugs,Oracle,Performance,Problem Solving,Troubleshooting,Tuning — Jonathan Lewis @ 10:05 am BST May 5,2022

This note is about a surprising performance difference between the lead() and lag() analytic functions (which turns out to be due to the behaviour of the nth_value() function) when the option to “ignore nulls” is included in their use (jump to conclusion). The detail I’ll be writing about was highlighted in a thread on the Oracle developer forum about a requirement to add a number of analytic columns to a dataset of 156 million rows using a statement of the following shape:

create table tb_target_ignore
as
select
        pat_id,
        visit_date_dt,
        ed_ucc_dt,
        lag (ed_ucc_dt ignore nulls, 1) over (partition by pat_id order by visit_date_dt) as prev_ed_ucc_dt,
        lead(ed_ucc_dt ignore nulls, 1) over (partition by pat_id order by visit_date_dt) as next_ed_ucc_dt,
        row_number() over (partition by pat_id order by visit_date_dt) as row_num
from
        tb_source
;

You’ll notice that I’ve introduced a row_number(), and both a lead() and a lag() of a column called ed_ucc_dt. All three analytic columns use the same partitioning and ordering, though, so Oracle will only be doing one “window sort” in the execution plan. Part of the performance problem, of course, was that with 156M rows of a couple of dozen existing columns and adding a dozen new columns, the workload due to sorting was bound to be very large, so there were various suggestions of how to minimise that part of the workload.

However Solomon Yakobson pointed out that the code was using the “ignore nulls” option and there was a bug in 11g that made lead() and lag() very slow when this option was used. He subsequently reported that this defect was still present in 19c, restricted it to just the lag() function, and cited a MOS document ID referencing the problem: LAG Function is slow when using ignore nulls (Doc ID 2811596.1). The implication of the MOS note is that we shouldn’t expect this to change.

A follow-up posting by User_H3J7U gave us a reason for the slowness of the lag() function by running a sample query through dbms_utility.expand_sql_text(). Oracle rewrites the query to use variants of the nth_value() function when you use “ignore nulls”, but rewrites it to use variants of first_value() when you aren’t using the “ignore nulls” option. This isn’t a complete explanation of why lag() should be slow while lead() is not – but it’s a significant pointer towards a possible implementation issue and a good clue about working around the problem. So let’s build a model of the situation.

The basic model

rem
rem     Script:         lag_ignore_nulls.sql
rem     Author:         Jonathan Lewis / Sam P
rem     Dated:          May 2022
rem     Purpose:     
rem
rem     Last tested
rem             19.11.0.0
rem

create table tb_source (
        pat_id,
        visit_date_dt,
        ed_ucc_dt
)
as
with generator as (
        select rownum id
        from    dual
        connect by
                level <= 1e4    --> comment to avoid wordpress format issue
)
select
        g1.id,
        to_date('01-Apr-2022') + dbms_random.value(0,100),
        to_date('01-Apr-2022') + dbms_random.value(5,105)
--      to_date(null)
from
        generator g1,
        generator g2
where
        g2.id <= 20     --> comment to avoid wordpress format issue
order by
        dbms_random.value
/


spool lag_ignore_nulls.lst

set serveroutput off
set timing on

prompt  ======================
prompt  Without "ignore nulls"
prompt  (My time 0.61 seconds)
prompt  ======================

create table tb_target_no_ignore
as
select
        pat_id,
        visit_date_dt,
        ed_ucc_dt,
        lag (ed_ucc_dt, 1) over (partition by pat_id order by visit_date_dt) as prev_ed_ucc_dt,
        lead(ed_ucc_dt, 1) over (partition by pat_id order by visit_date_dt) as next_ed_ucc_dt,
        row_number() over (partition by pat_id order by visit_date_dt) as row_num
from
        tb_source
;

prompt  ======================
prompt  With "ignore nulls"
prompt  (My time 0.88 seconds)
prompt  ======================

create table tb_target_ignore
as
select
        pat_id,
        visit_date_dt,
        ed_ucc_dt,
        lag (ed_ucc_dt ignore nulls, 1) over (partition by pat_id order by visit_date_dt) as prev_ed_ucc_dt,
        lead(ed_ucc_dt ignore nulls, 1) over (partition by pat_id order by visit_date_dt) as next_ed_ucc_dt,
        row_number() over (partition by pat_id order by visit_date_dt) as row_num
from
        tb_source
;

I’ve created a source table with 200,000 rows, consisting of 10,000 pat_id values, and 20 rows per pat_id. The 20 rows for a pat_id (probably) each have a different visit_date_dt and a different ed_ucc_dt.

After creating the data set I’ve created two more tables using the lead() and lag() functions to generate a previous (lag) and next (lead) ed_ucc_dt for each row, partitioning by pat_id and ordering by visit_date_dt. One statement includes the “ignore nulls” option, the other doesn’t, and as you can see the time to create the “no ignore” table was 0.61 seconds while the time to create the “ignore nulls” table was 0.88 seconds.

The variation isn’t dramatic – but this is just 200,000 rows, in memory, with only a few columns and only two columns added through lead and lag.

After the baseline test I tweaked the statement that created the table with the “ignore nulls” option to get three more times.

  • With neither lead() nor lag() the time was 0.29 seconds
  • With just the lead() column the time was 0.46 seconds – an increase of 0.17 seconds
  • With just the lag() column the time was 0.71 seconds – an increase of 0.42 seconds

You might note that 0.29 + 0.17 + 0.42 = 0.88 (the time I got for adding both columns) – it’s a little lucky that it looks like a perfect match, but even matching within a couple of hundredths of a second would have been a nice detail. It certainly seems that lag() – with my test data – consumes more resources than lead() for a pair of operations that look as if they should produce the same workloads.

Internal Rewrite

The next step was to check what the internal rewrite of the code looked like, so I passed the select part of the statements (the procedure won’t accept a “create table … as select”) through dbms_utility.expand_sql_text() and reformatted the results. Here are the two rewritten statements – first without “ignore nulls”:

select
        a1.pat_id pat_id,
        a1.visit_date_dt visit_date_dt,
        a1.ed_ucc_dt ed_ucc_dt,
        decode(
                count(*) over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between 1 preceding and 1 preceding
                        ),
                 1,     first_value(a1.ed_ucc_dt) over (
                                partition by a1.pat_id order by a1.visit_date_dt
                                rows between 1 preceding and 1 preceding
                        ),
                        null
        ) prev_ed_ucc_dt,
        decode(
                count(*) over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between 1 following and 1 following
                        ),
                 1,     first_value(a1.ed_ucc_dt) over (
                                partition by a1.pat_id order by a1.visit_date_dt
                                rows between 1 following and 1 following
                        ),
                        null
        ) next_ed_ucc_dt,
        row_number() over (partition by a1.pat_id order by a1.visit_date_dt) row_num
from
        test_user.tb_source a1


The code looks a little long and messy, but that’s mainly because just about everything it does happens twice. The lag() function (the “prev” column) turns into a first_value() function that looks at the row preceding the current row in the partition (rows between 1 preceding and 1 preceding). However it first has to do a count() over the same window to see whether that row exists, and then either report its value or report a null – hence the structure decode(count(), 1, first_value(), null).

Note: the full lag() function call is “lag(expression, offset, default)”, where the offset (the number of rows to lag) defaults to 1, and the default is the value you want reported when there is no matching row – it defaults to null.

The call to lead() basically does the same thing, but uses (rows between 1 following and 1 following) to access the next row in the partition.

On the other hand this is the SQL that Oracle generates when we include the “ignore nulls” clause (which means Oracle can’t restrict the row range to just one preceding or following row):

select
        a1.pat_id pat_id,
        a1.visit_date_dt visit_date_dt,
        a1.ed_ucc_dt ed_ucc_dt,
        nvl(
                nth_value(a1.ed_ucc_dt, 1) from last ignore nulls over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between unbounded preceding and 1 preceding
                ),
                 null
        ) prev_ed_ucc_dt,
        nvl(
                nth_value(a1.ed_ucc_dt, 1)           ignore nulls over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between 1 following and unbounded following
                ),
                null
        ) next_ed_ucc_dt,
        row_number() over (partition by a1.pat_id order by a1.visit_date_dt) row_num
from
        test_user.tb_source a1

Both lag() and lead() turn into nth_value() with a second parameter of 1 (i.e. nth == 1st … which makes you wonder why Oracle isn’t using first_value()), and we can also see the “ignore nulls” still being used.

The lag() call now uses the range (rows between unbounded preceding and 1 preceding) i.e. everything from the start of partition to the previous row, while the lead() call uses the range (rows between 1 following and unbounded following) i.e. from the next row to the end of partition.

The other important detail to note is that the translation of the lag() call also includes the clause “from last” – in other words we want the first row when reading the partition in reverse order, and that might have something to do with the extra time it takes to operate the (translated) lag() function.

Workaround

Oracle is using a generic nth_value() to translate a generic lead()/lag(), but we’re in the special case where we know n = 1, which means we (and Oracle) could use first_value()/last_value(). It’s perfectly reasonable for Oracle’s internal code to avoid special cases if it makes no difference to performance, of course, but maybe in this case we could imitate Oracle’s rewrite to get some benefit.

  • Step 1 – change nth_value() to the appropriate first/last.
  • Step 2 – get rid of the “from last” which won’t be needed with last_value()
  • Step 3 – move the “ignore nulls” to the spot that Oracle wants to see it with first/last

Here’s the resulting SQL – I’ve left the nvl(expression, null) wrapper in place, but since null is the default return value for the original lead()/lag() calls anyway you could simplify the code a little further.

create table tb_target 
as
select
        a1.pat_id pat_id,
        a1.visit_date_dt visit_date_dt,
        a1.ed_ucc_dt ed_ucc_dt,
        nvl(
                last_value(a1.ed_ucc_dt ignore nulls) over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between unbounded preceding and 1 preceding
                ),
                 null
        ) prev_ed_ucc_dt,
        nvl(
                first_value(a1.ed_ucc_dt ignore nulls) over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between 1 following and unbounded following
                ),
                null
        ) next_ed_ucc_dt,
        row_number() over (partition by a1.pat_id order by a1.visit_date_dt) row_num
from
        test_user.tb_source a1
/

  • Run-time: 0.61 seconds.
  • Run-time with just last_value() / preceding: 0.47 seconds
  • Run time with just first_value() / following: 0.43 seconds

There still seems to be a small discrepancy between accessing the preceding data and accessing the following data, but there’s a much better balance than before.

One more edit – taking out the nvl() construct because the original lead()/lag() calls didn’t have a non-null default supplied:

create table tb_target 
as
select
        a1.pat_id pat_id,
        a1.visit_date_dt visit_date_dt,
        a1.ed_ucc_dt ed_ucc_dt,
--
        last_value(a1.ed_ucc_dt ignore nulls) over (
                partition by a1.pat_id order by a1.visit_date_dt
                rows between unbounded preceding and 1 preceding
        ) prev_ed_ucc_dt,
--
        first_value(a1.ed_ucc_dt ignore nulls) over (
                partition by a1.pat_id order by a1.visit_date_dt
                rows between 1 following and unbounded following
        ) next_ed_ucc_dt,
--
        row_number() over (partition by a1.pat_id order by a1.visit_date_dt) row_num
from
        test_user.tb_source a1
/

With this simplification the time dropped by a further couple of hundredths of a second, hovering between 0.57 and 0.58 seconds.

There was one last detail about the test code that should be mentioned – do the changes in code still produce the same results? As a quick and dirty check I ran the following query after each variant of creating the tb_target table (tb_target_ignore is the table created using the original lead()/lag() code with “ignore nulls”):

select  *
from   (
        select * from tb_target minus select * from tb_target_ignore
        union all
        select * from tb_target_ignore minus select * from tb_target
        )
/

The result was always zero rows.

Hypothesis

I raised the idea that the difference in timing for the lead() and lag() functions might have something to do with the volume of data that Oracle could be processing to find the one row it needed.

My thinking was that the lead() rewrite – the call to nth_value(ed_ucc_dt, 1) – would simply look at the next row in the partition (if it existed) because my data has no nulls that need to be ignored, while the rewrite of the lag() function, with its “from last” requirement, could be making Oracle re-read the entire preceding section of the partition before starting to process it backwards.

As a possible check to see if this was a viable hypothesis I ran one more test – visible in the initial declaration of tb_source – I created the data with ed_ucc_dt set to null in every single row, so that Oracle would be forced to process from the current position to whichever end of the partition was relevant regardless of whether it was calling lead() or lag().

With this change in place the timings for the lead()-only and lag()-only statements were nearly identical – which offers weak support for the hypothesis.

And once I’d done that test the next obvious one was to see what happened if I increased the size of each partition (using non-null values for ed_ucc_dt) to see whether larger partitions would increase the difference between the forward and backward tests. To do this I changed the script that creates the tb_source table to produce 5,000 pat_id values with 40 rows per pat_id by changing the where clause to:

where
        g2.id <= 40     --> comment to avoid wordpress format issue
and     g1.id <= 5e3    --> comment to avoid wordpress format issue

With this change in place the timings for the original form of the lead()/lag() statement were:

  • With both lead() and lag() in place the time was 1.05 seconds
  • With neither lead() nor lag() the time was 0.25 seconds
  • With just the lead() column the time was 0.41 seconds – an increase of 0.16 seconds
  • With just the lag() column the time was 0.98 seconds – an increase of 0.73 seconds

So the lag() time (ballpark figures) nearly doubles as the partition size doubles but the lead() time stays pretty much the same.

The results of these two tests do tend to suggest that the generic nth_value() implementation can do some short-circuiting when working “forwards”, using a mechanism that isn’t available when the “from last” clause requires it to work “backwards”.

Writing the previous paragraph prompted me to do one last test – it wouldn’t produce the same results, of course, but I ought to check the performance when I moved the “from last” clause out of the “prev”/lag() column expression into the “next”/lead() column expression in Oracle’s original translation to confirm that the problem was associated with the “from last” and not with the choice of “preceding” or “following” in the row range section of the over() clause. (It was the “from last” that made the difference.)

tl;dr

If you’re using the lag() or lead() functions with “ignore nulls” on a very large dataset you may find that you can rewrite the code with first_value() or last_value() calls that use less CPU. The benefit is probably only significant on fairly large data sets, and may be particularly noticeable for cases where the same over() clause is used many times and the partition sizes are more than a couple of dozen rows each.

The potential for excess CPU usage comes from the effect of a generic internal rewrite using the nth_value() function with the “from last” clause even for the special case where your lead()/lag() have an offset (i.e. n) of 1 which would allow for the use of first_value()/last_value().

To get the correct rewrite you can use dbms_utility.expand_sql_text() to generate a suitable statement from which you can extract and edit the relevant pieces of text.
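As a starting point, here’s a minimal sketch of the call – the procedure’s full name is expand_sql_text(), and the table and column names below are the ones from my test case:

```sql
set serveroutput on

declare
        m_sql_in        clob := q'{
                select
                        lag(ed_ucc_dt ignore nulls, 1) over (
                                partition by pat_id order by visit_date_dt
                        ) prev_ed_ucc_dt
                from    tb_source
        }';
        m_sql_out       clob;
begin
        dbms_utility.expand_sql_text(
                input_sql_text  => m_sql_in,
                output_sql_text => m_sql_out
        );
        dbms_output.put_line(m_sql_out);
end;
/
```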

Footnote

A further point raised by Solomon Yakobson was that in the generic case (where you can’t use first_value()/last_value() to rewrite the SQL because you want an offset greater than one) it’s worth noting that the following two expressions are effectively identical:

        nvl(
                nth_value(a1.ed_ucc_dt, 2) from last ignore nulls over (
                        partition by a1.pat_id order by a1.visit_date_dt
                        rows between unbounded preceding and 1 preceding
                ),
                 null
        ) prev_ed_ucc_dt

        nvl(
                nth_value(a1.ed_ucc_dt, 2)           ignore nulls over (
                        partition by a1.pat_id order by a1.visit_date_dt desc
                        rows between 1 following and unbounded following
                ),
                 null
        ) prev_ed_ucc_dt
/

Note particularly that, to avoid the issue of scanning the data “from last”, I’ve dropped the “from last” clause, changed the ordering in the over() clause to descending, and changed “unbounded preceding and 1 preceding” to “1 following and unbounded following” – reversing the positions of 1 and unbounded. (The same strategy can be applied to the lag() function in the original code.)

In general it’s likely that lead() will be more efficient than lag(), so this change could make a big difference in performance. It’s worth noting, however, that if you’ve been using both lead() and lag(), as my example did, then your execution plan will (truthfully) show two “window sort” operations, and the second (i.e. the one with the lower line number in the plan) window sort will also be sorting the columns added by the first window sort. Depending on the nature of your data the additional sort might increase the workload by more than the benefit you get from eliminating a lag().

In an example created by Solomon Yakobson the effect of using lag( ,5) was catastrophic, and the change that introduced the extra sort to get a lead( ,5) made a huge difference (40 seconds down to 0.15 seconds – and I haven’t made a mistake in the decimal point there); in my original test case the improvement I got from applying the same strategy was small (0.88 seconds down to 0.86 seconds). The critical difference between the test cases was that one had a single partition of 20,000 rows while the other had 10,000 partitions of only 20 rows.

(Footnote to the footnote – it did cross my mind that if you had a performance catastrophe with the simple lag(,n), and couldn’t afford the extra sort from reversing the order to use a lead(,n), would Oracle do something clever if you found a way to rewrite the query with a cascade of inline views that all used last_value()? But I decided I wasn’t going to look at that option until someone paid me to. More seriously, I did wonder about using match_recognize to bypass the lag() problem, and may get around to thinking about that more carefully at some point.)

May 4, 2022

redefinition error

Filed under: Bugs,Infrastructure,Oracle — Jonathan Lewis @ 12:22 pm BST May 4,2022

Here’s a note about a data error generated by using (possibly mis-using) the dbms_redefinition package (jump to conclusion). The original code to demonstrate the problem comes from a note on the Oracle Developer forum, and was brought to my attention by a tweet from Daniel Stein.

The critical feature of the demo is that we can end up with a column containing nulls despite being declared NOT NULL (and I don’t mean by that a simple “is not null” check constraint – which is not quite the same as a NOT NULL declaration).
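If you want to see that distinction for yourself, here’s a small sketch (table names invented): both forms create a “C” type constraint in user_constraints, but only the true NOT NULL declaration marks the column as non-nullable in the data dictionary:

```sql
create table t_declared (v1 varchar2(10) not null);
create table t_checked  (v1 varchar2(10) constraint tc_ck check (v1 is not null));

select  table_name, column_name, nullable
from    user_tab_columns
where   table_name in ('T_DECLARED', 'T_CHECKED')
order by table_name
/

-- T_CHECKED.V1  reports nullable = 'Y'
-- T_DECLARED.V1 reports nullable = 'N'
```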

Here’s the first part of a script, mostly copied from the forum post, that I’ve been running on 19.11.0.0:

rem
rem     Script:         redef_bug.sql
rem     Author:         Jonathan Lewis / Sebastian (User_6AT2M)
rem     Dated:          May 2022
rem
rem     Last tested 
rem             19.11.0.0
rem

create table test_nulls (
        id      number (8)      constraint tn_pk primary key,
        text    varchar2 (25)   constraint tn_nn_tx not null
)
/

create unique index i_test_nulls on test_nulls(text);

insert into test_nulls (id, text) 
select rownum, 'some text '||rownum from dual connect by level <= 50;

-- create an empty copy, without the constraints or indexes

create table test_nulls_interim (
        id      number (8),
        text    varchar2 (25)
);

begin
        dbms_redefinition.start_redef_table(
                uname           => user,
                orig_table      => 'test_nulls',
                int_table       => 'test_nulls_interim',
                col_mapping     => 'id id, substr(text, id, 2) text'
        );
end;
/

The script creates a table with a primary key declared on an id column, a not null declaration on a text column and a unique index on the text column, then populates the table with 50 rows that have the form (N, “some text N”) where N is a number between 1 and 50.

After creating an empty copy of the table with no constraints or indexes I start an online redefinition – modifying the content of the text column as part of the redefinition. If you check the col_mapping carefully you will realise that when id reaches 13 the result from the substr() function becomes null.
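You can check the arithmetic with a quick query against dual – substr() returns null once the start position is past the end of the string, and ‘some text 13’ is only 12 characters long:

```sql
select
        substr('some text 12', 12, 2)   id_12,  -- position 12 exists: returns '2'
        substr('some text 13', 13, 2)   id_13   -- position 13 is past the end: returns null
from
        dual
/
```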

This is where the trouble starts. If I now call dbms_redefinition.copy_table_dependents() to add the original constraints and indexes to the interim table, what’s Oracle going to do about the not null declaration on the text column?

declare
        error_ct pls_integer;
begin
        dbms_redefinition.copy_table_dependents(
                uname           => user,
                orig_table      => 'test_nulls',
                int_table       => 'test_nulls_interim',
                num_errors      => error_ct,
                copy_indexes    => 1,
                copy_constraints=> true,
                ignore_errors   => false,
--
                copy_triggers   => false,
                copy_privileges => false,
                copy_statistics => false,
                copy_mvlog      => false
        );  

        dbms_output.put_line('error count: '||error_ct);
end;
/

begin
        dbms_redefinition.finish_redef_table(user, 'test_nulls', 'test_nulls_interim');
end;
/

drop table TEST_NULLS_INTERIM purge;

I’ve exposed all the parameters to the copy_table_dependents() procedure call in my code, and you can see that I’ve chosen to copy only the constraints and indexes, and I don’t want to ignore errors.

The PL/SQL anonymous block terminates successfully, doesn’t report any errors, and outputs an error count of zero. So let’s see what we’ve got as the final result of the redefinition.

column search_condition_vc format a20
break on table_name skip 1 on index_name
set echo on

select
         table_name, index_name, column_name 
from 
        user_ind_columns 
order by 
        1,2,column_position
/

desc test_nulls

select count(*) from test_nulls;
select /*+ full(test_nulls) */ count(*) from test_nulls;

select
        constraint_name,
        constraint_type,
        search_condition_vc,
        status,
        deferrable,
        deferred,
        validated
from
        user_constraints
where
        table_name = 'TEST_NULLS'
/

set echo off

Here are the results – my login.sql has a lot of column format commands so your results may look a lot messier if you run this bit of code. Here’s the echoed output:

SQL> select
  2           table_name, index_name, column_name
  3  from
  4          user_ind_columns
  5  order by
  6          1,2,column_position
  7  /

TABLE_NAME                INDEX_NAME           COLUMN_NAME
------------------------- -------------------- --------------------
TEST_NULLS                I_TEST_NULLS         TEXT
                          TN_PK                ID


2 rows selected.

SQL> 
SQL> desc test_nulls
 Name                                                                     Null?    Type
 ------------------------------------------------------------------------ -------- -------------------------------------------------
 ID                                                                       NOT NULL NUMBER(8)
 TEXT                                                                     NOT NULL VARCHAR2(25)

SQL> 
SQL> select count(*) from test_nulls;

  COUNT(*)
----------
        12

1 row selected.

SQL> select /*+ full(test_nulls) */ count(*) from test_nulls;

  COUNT(*)
----------
        50

1 row selected.

SQL> 
SQL> select
  2          constraint_name,
  3          constraint_type,
  4          search_condition_vc,
  5          status,
  6          deferrable,
  7          deferred,
  8          validated
  9  from
 10          user_constraints
 11  where
 12          table_name = 'TEST_NULLS'
 13  /

CONSTRAINT_NAME      C SEARCH_CONDITION_VC  STATUS   DEFERRABLE     DEFERRED  VALIDATED
-------------------- - -------------------- -------- -------------- --------- -------------
TN_NN_TX             C "TEXT" IS NOT NULL   ENABLED  NOT DEFERRABLE IMMEDIATE VALIDATED
TN_PK                P                      ENABLED  NOT DEFERRABLE IMMEDIATE VALIDATED

Both indexes have arrived; both columns have NOT NULL declarations.

When you count the number of rows in the table it’s only 12 – unless you force a full tablescan in which case it’s 50. This happens because the NOT NULL declaration of column text allows the optimizer to use an index-only plan on the index i_test_nulls and there are 38 nulls in the table that don’t appear in the index.

The check on user_constraints shows that both the primary key constraint and the “is not null” check constraint are enabled and validated.
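A follow-up test worth running (my own sketch, not part of the original article) is to force Oracle to re-validate the constraint from scratch; I’d expect the validate step to fail with ORA-02293 (check constraint violated), exposing the contradiction immediately:

```sql
-- Sketch only: drop the (bogus) validated state, then ask Oracle
-- to validate the constraint against the actual data.
alter table test_nulls modify constraint tn_nn_tx novalidate;
alter table test_nulls modify constraint tn_nn_tx validate;
```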

Conclusion

There is code in the copy_table_dependents() procedure that is defective, deficient and dangerous. Clearly I’ve done something that I probably shouldn’t have done (and, historically, I wouldn’t have done) but the code should still have protected me from an error that leaves the database in a state that is not internally consistent and can produce incorrect results.

In my example the copy_table_dependents() procedure has allowed me to add a NOT NULL declaration to a column after the col_mapping parameter of start_redef_table() has resulted in rows that have nulls in that column. There may be variants on this theme that allow other types of error to appear.

Notes

If you want to repeat this test and try some variations on the theme you will need a few non-trivial privileges (which can be granted through a role). For the basic dbms_redefinition package you will need: execute on dbms_redefinition, select any table, create any table, alter any table, lock any table, drop any table; and to execute the copy_table_dependents() procedure you will also need create any index, create any trigger.

The copy_table_dependents() procedure appeared (I think) in the 10g timeline; prior to that you had to sort out all the constraints and dependencies “by hand” – which you would probably manage online through the sync_interim_table() procedure (viz: create an index, sync the interim table, create the next index, sync, etc.) before calling the finish_redef_table() procedure. That being the case, my immediate response to this issue was that if you don’t want the not null declaration on text then you have to exclude the copy_constraints option when copying the table dependents; if you did want the not null declaration then you should have included it in the initial definition of the interim table, because the start_redef_table() call would then have failed, raising:

ORA-12008: error in materialized view or zonemap refresh path
ORA-01400: cannot insert NULL into ("{schema}"."TEST_NULLS_INTERIM"."TEXT")
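A minimal sketch of that pre-10g “by hand” approach (hypothetical code following the documented pattern, not anything from the original post) might look like this:

```sql
-- Sketch only: create the dependents manually instead of calling
-- copy_table_dependents(), syncing the interim table between steps.
create index i_test_nulls_interim on test_nulls_interim(text);

begin
        dbms_redefinition.sync_interim_table(
                uname      => user,
                orig_table => 'TEST_NULLS',
                int_table  => 'TEST_NULLS_INTERIM'
        );
end;
/

alter table test_nulls_interim
        add constraint tn_pk_interim primary key(id);

begin
        dbms_redefinition.finish_redef_table(user, 'TEST_NULLS', 'TEST_NULLS_INTERIM');
end;
/
```

The point of doing it this way is that each constraint or index is created against the interim table’s actual data, so a declaration the data cannot satisfy fails at creation time instead of slipping through.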

March 28, 2022

Drop column bug

Filed under: Bugs,Infrastructure,Oracle — Jonathan Lewis @ 11:29 am BST Mar 28,2022

In the previous note about a problem dropping virtual columns the “guilty party” that made it impossible to drop any columns was based on a complex data type owned by the MDSYS (Spatial) schema. This note demonstrates the same problem with a very simple example created from scratch in an ordinary user schema.

rem
rem     Script:         object_virtual_col.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2022
rem
rem     Last tested 
rem             19.11.0.0
rem

create type point_type as object(x_coord number, y_coord number);
/

create or replace function my_point(inpoint in point_type)
return point_type
deterministic as 
begin
        return inpoint;
end;
/

show errors

create table t1 (id, n1, p1, n2, v1, padding)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        rownum                          n1,
        point_type(rownum, rownum)      p1,
        rownum                          n2,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1
where
        rownum <= 100   -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

alter table t1 add constraint t1_pk primary key(id);

So I’ve declared a type (point_type) which is an object with two attributes of type number, and I’ve created a function that takes a point as its input parameter and returns a point. Then I’ve created a table which includes a column of that type.

Let’s start with a little reminder of what a pain it is to use even simple object types correctly. What’s going to happen with the following three SQL statements:

select    p1.x_coord from t1    where rownum <= 4;
select t1.p1.x_coord from t1    where rownum <= 4;
select t1.p1.x_coord from t1 t1 where rownum <= 4;

The first two will fail – the first one shouldn’t be too surprising, the second does seem a little unreasonable:

ORA-00904: "P1"."X_COORD": invalid identifier
ORA-00904: "T1"."P1"."X_COORD": invalid identifier

So let’s try adding some virtual columns to pick out the X value:

alter table t1 add x_val generated always as (p1.x_coord) virtual;
alter table t1 add c_val generated always as (cast(p1.x_coord as number)) virtual;

The first call will fail (ORA-54016: Invalid column expression was specified) but the second will succeed. What if we try to hide our point column behind a call to our function:

alter table t1 add fp_val generated always as (my_point(p1)) virtual;
alter table t1 add fx_val generated always as (my_point(p1).x_coord) virtual;

Again the first call will fail (ORA-54004: resultant data type of virtual column is not supported) but that’s a documented restriction – a user-defined type may not be used as the type of a virtual column and I wasn’t at that point trying to return just the one attribute.

The second call, however, will succeed. So I can’t create a virtual column p1.x_coord, but I can create a virtual column my_point(p1).x_coord.

We now have two virtual columns that should return the required values, so let’s do a quick check with a couple of simple queries – cut and paste:

SQL> select fx_val "my_point(p1).x_coord" from t1 where rownum <= 4;

my_point(p1).x_coord
--------------------
                   1
                   2
                   3
                   4

4 rows selected.

SQL> select c_val  "cast(p1.x_coord as -" from t1 where rownum <= 4;

cast(p1.x_coord as -
--------------------
                   1
                   2
                   3
                   4

4 rows selected.

Finally we’ll finish off by demonstrating that I’ve just created a problem that no-one will notice until long after I’ve left the site (maybe):

SQL> alter table t1 drop column n1;
alter table t1 drop column n1
 *
ERROR at line 1:
ORA-00904: "TEST_USER"."MY_POINT": invalid identifier

After creating (and using successfully) the virtual column that calls my function, I can’t drop any of the columns in the table.

Summary

The manuals have a stated restriction for virtual columns that they cannot be a user-defined type, and this restriction seems to carry forward to an attribute of a user-defined type unless the attribute has been cast() to a base type.

The same restriction seems to apply to functions returning a user-defined type, but not to the individual attributes of the returned value – it is not necessary to cast() them to a base type. However, if you (accidentally) take advantage of this relaxation of the restriction you will be unable to drop any columns from the table in the future.

March 24, 2022

Drop column bug

Filed under: Bugs,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 11:07 am GMT Mar 24,2022

Here’s a problem that appeared recently on the Oracle Developer forum showing one of the classic symptoms of new features, namely that “mix and match” often runs into problems. This example has been a long time coming so “new” is something of a misnomer, but the alternative source of unlucky collisions is in the “rare” features – in this case Spatial. (It’s possible that the problem is not restricted to just Spatial but it probably does require a somewhat exotic data type.)

The problem appeared in a thread on the Oracle Developer Forum from someone who was trying to drop a pair of columns and finding that the statement failed with a surprising choice of error: ORA-00904: Invalid Identifier. The surprising thing about this error was that the named identifier was clearly not invalid. The reason for suspecting an example of “new features colliding” was that the columns to be dropped were virtual columns based on a real column of the table that had been declared as an object type defined in the MDSYS (Spatial) schema.

Conveniently the author had supplied a short, simple, script to demonstrate the issue, so I copied it and modified it a bit to do a few tests around the edges of the problem. Here’s the code that I used to start my investigation:

rem
rem     Script:         drop_col_bug.sql
rem     Author:         Jonathan Lewis/User_77G7L
rem     Dated:          Mar 2022
rem     Purpose:        
rem
rem     Last tested 
rem             21.3.0.0        Still broken
rem             19.11.0.0
rem

create table xxx (
        v1              varchar2(10),
        n1              number,
        shape           mdsys.sdo_geometry,
        x_value         number generated always as (mdsys.sdo_geom.sdo_pointonsurface(shape,0.005).sdo_point.x) virtual,
        y_value         number generated always as (mdsys.sdo_geom.sdo_pointonsurface(shape,0.005).sdo_point.y) virtual,
        v2              varchar2(10),
        n2              number,
        n3              number
)
segment creation immediate
;

insert into xxx(v1, n1, v2, n2, n3) values('z',1,'y',2,3);
update xxx  set
        shape = sdo_geometry(
                2003,                           -- two-dimensional polygon
                null,
                null,
                sdo_elem_info_array(1,1003,3),  -- one rectangle (1003 = exterior)
                sdo_ordinate_array(1,1, 5,7)    -- only 2 points needed to define rectangle
        )
;


commit;

alter table xxx drop (x_value, y_value) ;

The modifications I made from the original code are:

  • I’ve removed a couple of redundant sets of parentheses from the virtual column definitions
  • I’ve added a few columns before, in between, and after the virtual columns
  • I’ve used “segment creation immediate”
  • I’ve inserted a row into the table

The last two are simply to ensure that I have data segments and at least one item for the drop to work on – just in case it’s a run-time problem being reported as a parse time issue.

The extra columns are to test whether the type and position of the column I drop affects the outcome, and the change in parentheses is purely aesthetic.

Here’s the result of the attempt to drop the virtual columns:

alter table xxx drop (x_value, y_value)
  *
ERROR at line 1:
ORA-00904: "MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE": invalid identifier

This is a little strange since I have used the packaged function mdsys.sdo_geom.sdo_pointonsurface() to define the virtual columns and Oracle didn’t complain when I created the column. (Part of the reason I had reduced the original parentheses was to check that the compiler hadn’t got confused by an excess of parentheses.)

As a quick “what if” test I tried using the alternative syntax for drop column that you can use with just one column:

SQL> alter table xxx drop column x_value;
alter table xxx drop column x_value
  *
ERROR at line 1:
ORA-00904: "MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE": invalid identifier

What about trying to set the column unused before dropping all unused columns?

SQL> alter table xxx set unused column x_value;
alter table xxx set unused column x_value
  *
ERROR at line 1:
ORA-00904: "MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE": invalid identifier

So is the problem restricted to the virtual columns – what happens if I try to drop a column from the end of the table, what about the one between the two virtual columns, how about a column that appears before even the shape column? Nothing changes:

SQL> alter table xxx drop column v1;
alter table xxx drop column v1
  *
ERROR at line 1:
ORA-00904: "MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE": invalid identifier


SQL> alter table xxx set unused column v1;
alter table xxx set unused column v1
  *
ERROR at line 1:
ORA-00904: "MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE": invalid identifier

What if I have only one of the virtual columns? No difference.

What if I don’t have either of the virtual columns? Finally I can drop any column I like from the table (including the shape column). Not that that’s much use to the user.

You can’t set unused or drop any columns in the table thanks to an error that looks as if it’s associated with the definition of those virtual columns.

Workaround

Is there any way to bypass the problem and still store the information we need (until we want to drop it)? Let’s start by taking a look at the way Oracle has used our table definition to create column definitions, just in case that gives us a clue:

select 
        column_id id, segment_column_id seg_id, internal_column_id int_id, 
        column_name, data_type, data_default  
from 
        user_tab_cols 
where 
        table_name = 'XXX' 
order by 
        column_id,
        internal_column_id
/

        ID     SEG_ID     INT_ID COLUMN_NAME          DATA_TYPE                 DATA_DEFAULT
---------- ---------- ---------- -------------------- ------------------------- --------------------------------------------------------------------------------
         1          1          1 V1                   VARCHAR2
         2          2          2 N1                   NUMBER
         3          3          3 SHAPE                SDO_GEOMETRY
         3          4          4 SYS_NC00004$         NUMBER
         3          5          5 SYS_NC00005$         NUMBER
         3          6          6 SYS_NC00006$         NUMBER
         3          7          7 SYS_NC00007$         NUMBER
         3          8          8 SYS_NC00008$         NUMBER
         3          9          9 SYS_NC00009$         SDO_ELEM_INFO_ARRAY
         3         10         10 SYS_NC00010$         SDO_ORDINATE_ARRAY
         4                    11 X_VALUE              NUMBER                    (("MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE"("SHAPE",0.005))."SDO_POINT")."X"
         5                    12 Y_VALUE              NUMBER                    (("MDSYS"."SDO_GEOM"."SDO_POINTONSURFACE"("SHAPE",0.005))."SDO_POINT")."Y"
         6         11         13 V2                   VARCHAR2
         7         12         14 N2                   NUMBER
         8         13         15 N3                   NUMBER

15 rows selected.

There’s quite a lot going on there in terms of columns hidden behind the sdo_geometry type. In fact internal columns 9 and 10 might prompt you to look for other objects like table types or LOBs:

SQL> select column_name, segment_name, index_name from user_lobs where table_name = 'XXX';

COLUMN_NAME                    SEGMENT_NAME                   INDEX_NAME
------------------------------ ------------------------------ ------------------------------
"SHAPE"."SDO_ELEM_INFO"        SYS_LOB0000100168C00009$$      SYS_IL0000100168C00009$$
"SHAPE"."SDO_ORDINATES"        SYS_LOB0000100168C00010$$      SYS_IL0000100168C00010$$

2 rows selected.

But the interesting detail is the data_default column for our two virtual columns – which have more parentheses than the original definitions. Perhaps the storage of the expression has gone wrong (as happened in an older version of Oracle with case expressions) and is causing the ORA-00904 error to appear. So let’s try selecting data from the table using the expression stored in data dictionary:

select
        ((MDSYS.SDO_GEOM.SDO_POINTONSURFACE(SHAPE,0.005)).SDO_POINT).X          old_x,
          mdsys.sdo_geom.sdo_pointonsurface(shape,0.005).sdo_point.x            new_x,
        ((MDSYS.SDO_GEOM.SDO_POINTONSURFACE(SHAPE,0.005)).SDO_POINT).Y          old_y
from
        xxx
/

     OLD_X      NEW_X      OLD_Y
---------- ---------- ----------
         1          1          1

1 row selected.


No syntax error there – as far as a simple select is concerned. I’ve included my tidier format for the x_value column aligned with the resulting stored value (with all the double quotes removed – though I’ve also tested it with the quotes in place) – and the only significant visual difference is the number of parentheses, so maybe that’s a clue. In particular we note that the error reports “MDSYS”.”SDO_GEOM”.”SDO_POINTONSURFACE” as the invalid identifier and the first time an extra (close) parenthesis appears is just after that function call. Maybe (for no good reason) the code path involved with handling column data during a drop/set unused call is getting confused by parentheses. So let’s try to reduce the complexity of the expression by hiding it inside a local function.

First attempt – create a function to return an sdo_point_type and define the virtual columns to expose the X and Y values from the point:

create or replace function my_point(
        inshape         in mdsys.sdo_geometry,
        intolerance     in number
)
return mdsys.sdo_point_type
deterministic
as
begin
        return mdsys.sdo_geom.sdo_pointonsurface(inshape, intolerance).sdo_point;
end;
/

        x_value         number generated always as (my_point(shape,0.005).x) virtual,
        y_value         number generated always as (my_point(shape,0.005).y) virtual,

This approach still produces an ORA-00904, though the invalid identifier becomes “TEST_USER”.”MY_POINT”.

Second attempt – two separate functions, one for the x value, one for the y value:

create or replace function my_x(
        inshape         in mdsys.sdo_geometry,
        intolerance     in number
)
return number
deterministic
as
begin
        return   mdsys.sdo_geom.sdo_pointonsurface(inshape, intolerance).sdo_point.x;
end;
/

show errors

create or replace function my_y(
        inshape         in mdsys.sdo_geometry,
        intolerance     in number
)
return number
deterministic
as
begin
        return   mdsys.sdo_geom.sdo_pointonsurface(inshape, intolerance).sdo_point.y;
end;
/

        x_value         number generated always as (my_x(shape, 0.005)) virtual,
        y_value         number generated always as (my_y(shape, 0.005)) virtual,

This worked, so finally I looked at the SQL Language Reference manual to see if there were any restrictions on virtual columns that might explain the problem I had had with all the previous definitions (and, yes, I know I should have done that right after the first failure). I found the following:

  • The virtual column cannot be an Oracle supplied data type, a user-defined type, or LOB or LONG RAW.

None of my virtual column definitions returned an Oracle supplied data type or a user-defined data type. But would the restriction also apply to single attributes of such a data type, or has the complexity of spatial data types found a hole in the code? The answer to that question is a whole new blog note waiting to be finished (because I’d forgotten what a pain it was to use object types in simple SQL).

March 22, 2022

Upgrade Surprise

Filed under: 19c,Bugs,Oracle,Transformations,Upgrades — Jonathan Lewis @ 10:04 am GMT Mar 22,2022

Here’s a little surprise that showed up in the most recent (March 2022) article that I sent to Simpletalk for the series on transformations. I had been using 19c (19.11.0.0) to create and run my little demos but the editor had used 12.2.0.1 to check the examples and questioned a comment I had made about a “default plan”.

Here’s the query in question. I was using the emp and dept tables from the Scott schema to demonstrate a point about subquery execution:

rem
rem     Script:         unnest_demo_simpletalk_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2022
rem 

break on deptno skip 1

select
        /*+ 
                qb_name(main)
                gather_plan_statistics 
        */
        *
from    emp e1
where   e1.sal + nvl(e1.comm,0) > (
                select  /*+ qb_name(subq) */
                        avg(e2.sal + nvl(e2.comm,0))
                from    emp e2
                where   e2.deptno = e1.deptno
        )
order by
        e1.deptno, e1.empno
/

As you can see, I’ve used a correlated aggregate subquery to report all employees who earned more than the average for their department, where “earnings” is calculated as the sum of salary and commission.

Here’s the plan I got when I ran this query under 19c:

------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      6 |00:00:00.01 |      24 |       |       |          |
|   1 |  SORT ORDER BY       |      |      1 |      1 |      6 |00:00:00.01 |      24 |  2048 |  2048 | 2048  (0)|
|*  2 |   FILTER             |      |      1 |        |      6 |00:00:00.01 |      24 |       |       |          |
|   3 |    TABLE ACCESS FULL | EMP  |      1 |     14 |     14 |00:00:00.01 |       6 |       |       |          |
|   4 |    SORT AGGREGATE    |      |      3 |      1 |      3 |00:00:00.01 |      18 |       |       |          |
|*  5 |     TABLE ACCESS FULL| EMP  |      3 |      5 |     14 |00:00:00.01 |      18 |       |       |          |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("E1"."SAL"+NVL("E1"."COMM",0)>)
   5 - filter("E2"."DEPTNO"=:B1)

When my editor ran the query on 12.2.0.1, and when I started up an instance of 12.2.0.1 and ran the query, the plan looked like this:

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      6 |00:00:00.02 |      29 |      6 |       |       |          |
|*  1 |  FILTER              |      |      1 |        |      6 |00:00:00.02 |      29 |      6 |       |       |          |
|   2 |   SORT GROUP BY      |      |      1 |      4 |     14 |00:00:00.02 |      29 |      6 |  2048 |  2048 | 2048  (0)|
|*  3 |    HASH JOIN         |      |      1 |     70 |     70 |00:00:00.02 |      29 |      6 |  1922K|  1922K| 1053K (0)|
|   4 |     TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       7 |      6 |       |       |          |
|   5 |     TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       7 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$9E18A930")
      MERGE(@"SEL$AA0D0E02" >"SEL$B4BE209F")
      OUTLINE(@"SEL$B4BE209F")
      UNNEST(@"SUBQ")
      OUTLINE(@"SEL$AA0D0E02")
      OUTLINE(@"SEL$D6166863")
      OUTLINE(@"SUBQ")
      OUTLINE(@"MAIN")
      FULL(@"SEL$9E18A930" "E2"@"SUBQ")
      FULL(@"SEL$9E18A930" "E1"@"MAIN")
      LEADING(@"SEL$9E18A930" "E2"@"SUBQ" "E1"@"MAIN")
      USE_HASH(@"SEL$9E18A930" "E1"@"MAIN")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("E1"."SAL"+NVL("E1"."COMM",0)>SUM("E2"."SAL"+NVL("E2"."COMM",0))/COUNT("E2"."SAL"+NVL("E2"."COMM",0))
              )
   3 - access("E2"."DEPTNO"="E1"."DEPTNO")

(I’ve added in a request for the ‘outline’ in the display_cursor() format.) The correlated subquery has been unnested and the resulting inline view has been subject to complex view merging. It was only at this point that I realised that the 19c plan was a little surprising and not what I should have expected.

After checking that the configuration and statistics (including the system stats) were the same on the two instances I re-ran the query on 12c with the /*+ no_unnest */ hint to make it use the plan that 19c had produced and I found (as expected) that the plan with filter subquery produced by 19c had a higher cost than the unnesting plan produced by 12c.

Next I re-ran the query on 19c with the /*+ unnest */ hint to make it use the plan that 12c had produced – but it didn’t! 19c “ignored” the hint and carried on using the filter subquery plan. It did, however, supply the following extra information when I added the ‘hint_report’ option to the display_cursor() format:

Total hints for statement: 3 (U - Unused (1))
---------------------------------------------------------------------------

   1 -  MAIN
           -  qb_name(main)

   4 -  SUBQ
         U -  unnest / Failed basic validity checks
           -  qb_name(subq)

The code in 19c thinks that it would be illegal to unnest the subquery that 12c was unnesting (does this mean that some people may be getting wrong results in 12c?). So I checked the 10053 (CBO) trace file to see if there was any further information there that would “explain” the problem. This is what I found:

SU: Checking validity for Null Accepting Semi JoinUnnesting for query block MAIN(#1)
SU:   Checking validity of unnesting subquery SUBQ (#2)
SU:     SU bypassed: More than 1 column in connect condition.
SU:     SU bypassed: Failed basic validity checks.
SU:   Validity checks failed.

The reference to “Null accepting” looks a little suspect but prompted a few experiments (modifying the data to eliminate nulls, adding not null declarations to columns, simplifying the query etc.) that suggested that the problem was essentially that the optimizer did not want to unnest when the comparison was with the expression (sal + comm) regardless of the operator, and even when all the relevant columns had been populated, declared not null, and the nvl() function had been removed.

It doesn’t seem reasonable in this case, but possibly the block is a generic strategy to avoid errors in some more subtle cases, and perhaps the block will be refined and relaxed in future versions. (Or maybe it’s a bug that the wrong test is being called at this point – there was nothing in the query requiring “null acceptance” by the time I got through the last test.)

I did find a workaround that you could use to avoid any code rewrite:

alter table emp add nvl_earnings 
        invisible 
        generated always as (sal + nvl(comm,0)) 
        virtual
;

There seems to be no good reason why this should work – but it does. The subquery unnests and the Predicate Information in the plan doesn’t give any clue that it’s using a virtual column.
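If you want to confirm that the invisible column is in place without exposing it to applications (a quick dictionary check of my own, not from the original post), the usual views report it as a hidden, virtual column:

```sql
-- Invisible columns don't appear in DESCRIBE or SELECT *, but
-- user_tab_cols reports them with hidden_column = 'YES' and
-- virtual_column = 'YES', along with the defining expression.
select  column_name, hidden_column, virtual_column, data_default
from    user_tab_cols
where   table_name = 'EMP'
and     column_name = 'NVL_EARNINGS'
/
```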

Summary:

When you upgrade from 12c there are some queries involving correlated subqueries that no longer unnest the subquery. This may have a significant impact on performance, and it may not be possible to bypass the problem unless you rewrite the query to do a manual unnest – although I did find a virtual column workaround for my specific example. So far I’ve tested the query on 19.11.0.0 and 21.3.0.0; the behaviour is the same in both versions.

Footnote:

After failing to find anything on MOS about the problem I emailed a draft of this note to Nigel Bayliss at Oracle – who did find a promising match on MOS.

The failure to unnest may be the consequence of the fix for bug 30593046: “A query having a scalar subquery returned a wrong result”. The fix was introduced in 19.9.0.0 but was too restrictive, leading to the creation of bug 33325981: “Query Loses Subquery Unnesting From Plan in 19.9 and Above”.

The fix for 33325981 was distributed in 19.13.0.0 and 21.4.0.0 (plus a couple of earlier RURs, with patches available for various versions back to 12.2.0.1). Unfortunately the newer fix still doesn’t go far enough in reducing the restrictions and my example still doesn’t unnest.

Make sure you check any code that depends on “expression-based” subquery unnesting before you upgrade to 19.9, as it may change plan – which may make a difference to performance and create a requirement for a workaround.

January 1, 2022

Happy New Year

Filed under: 12c,Bugs,CBO,Execution plans,Oracle,Transformations — Jonathan Lewis @ 12:01 am GMT Jan 1,2022

Here’s an entertaining little bug that appeared on the oracle-l list server just in time to end one year and start another in a suitable way. The thread starts with an offering from Matthias Rogel (shown below with some cosmetic changes) to be run on Oracle 12.2.0.1:

rem
rem     Script:         group_by_bug.sql
rem     Author:         Mathias Rogel  / Jonathan Lewis
rem     Dated:          Dec 2021
rem
rem     Last tested 
rem             19.11.0.0       Fixed
rem             12.2.0.1        Fail
rem

create table t as (
        select date'2021-12-30' d from dual 
        union all 
        select date'2021-12-31'   from dual
);

select extract(year from d), count(*) from t group by extract(year from d);

alter table t add primary key(d);
select extract(year from d), count(*) from t group by extract(year from d);

This doesn’t look particularly exciting – I’ve created a table with two rows holding two dates in the same year, then counted the number of rows for “each” year before and after adding a primary key on the date column. Pause briefly to think about what the results might look like …

Table created.


EXTRACT(YEARFROMD)   COUNT(*)
------------------ ----------
              2021          2

1 row selected.


Table altered.


EXTRACT(YEARFROMD)   COUNT(*)
------------------ ----------
              2021          1
              2021          1

2 rows selected.

After adding the primary key (with its unique index) the result changes to something that is clearly (for this very simple data set) wrong.
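For reference, the correct aggregation is easy to model outside the database. Here’s a minimal Python sketch of what “group by extract(year from d)” ought to produce for this data:

```python
from collections import Counter
from datetime import date

# the two rows inserted into t
rows = [date(2021, 12, 30), date(2021, 12, 31)]

# group by extract(year from d): one group per distinct year
year_counts = Counter(d.year for d in rows)
print(dict(year_counts))   # {2021: 2} -- one row, count 2
```

One group, with a count of two – which is what 12.2.0.1 gets right only until the unique index appears.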

At this point I offered a hypothetical reason why Oracle might be producing the wrong result, but Tim Gorman was one step ahead of me and supplied a bug reference from MOS: Wrong Result Using GROUP BY with EXTRACT Function Against DATE (Doc ID 2629968.1)

The MOS document describes this as a bug introduced in the upgrade from 12.1.0.2 to 12.2.0.1, demonstrates the error with the extract() function applied to a date, and supplies three possible workarounds (but not the workaround or explanation I supplied in my response on oracle-l).

The document also pointed to a further bug note that described how the problem also appeared with the combination of the to_char() function applied to a date column with a unique indexes: 12.2 Wrong Results from Query with GROUP BY Clause Based on To_char Function of Unique Index Columns (Doc ID 2294763.1) with a further suggestion for applying a patch (see MOS Doc ID: 26588069.8) or upgrading to 18.1 (where the bug has been fixed).

Matthias Rogel supplied a follow-up demonstrating the problem with to_char(), which prompted me to create an example showing that it wasn’t just about dates – which I’ve tidied up below (reminder, all results on this page are from 12.2.0.1):

create  table t1 as 
select  round(rownum/10,1) n1 
from    all_objects 
where   rownum <= 10
;


select n1 from t1 order by n1;
select n1, count(*) from t1 group by n1 order by n1;

column nch format A3

select to_char(n1,'99') nch, count(*) from t1 group by to_char(n1,'99') order by 1,2;

select * from table(dbms_xplan.display_cursor(format =>'outline'));

alter table t1 add constraint t1_pk primary key(n1);
select to_char(n1,'99') nch , count(*) from t1 group by to_char(n1,'99') order by 1,2;

select * from table(dbms_xplan.display_cursor(format =>'outline'));

As before I’ve created a simple table, and populated it with a few rows of data. The first two queries are there to show you the data (0.1 to 1.0 in steps of 0.1), and show that aggregating the raw data produces one row per value.

I’ve then repeated the aggregation query, but converted each value to a character string that effectively rounds the value to an integer. Here are the two sets of results, before and after adding the primary key.

NCH   COUNT(*)
--- ----------
  0          4
  1          6

2 rows selected.

Table altered.

NCH   COUNT(*)
--- ----------
  0          1
  0          1
  0          1
  0          1
  1          1
  1          1
  1          1
  1          1
  1          1
  1          1

10 rows selected.
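As a cross-check on the first (correct) result set, the aggregation is easy to model in Python. (Note the hand-rolled round-half-up: Oracle’s ’99’ format rounds halves away from zero, which Python’s built-in round() does not.)

```python
from collections import Counter

# n1 takes the values 0.1, 0.2, ... 1.0
values = [round(n / 10, 1) for n in range(1, 11)]

def to_char_99(x):
    # Oracle's '99' format rounds half away from zero (for positive values)
    return int(x + 0.5)

counts = Counter(to_char_99(v) for v in values)
print(dict(counts))   # {0: 4, 1: 6}
```

Four rows rounding to 0 and six rounding to 1 – matching the two-row result set above, not the ten-row one.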

Again, the introduction of the primary key constraint on the column results in wrong results. In this example, though, I’ve pulled the execution plans from memory along with their outlines, and this is what the two plans look like:

SQL_ID  gt5a14jb0g4n0, child number 0
-------------------------------------
select to_char(n1,'99') nch, count(*) from t1 group by to_char(n1,'99')
order by 1,2

Plan hash value: 2808104874

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |     4 (100)|          |
|   1 |  SORT ORDER BY      |      |    10 |    30 |     4  (50)| 00:00:01 |
|   2 |   HASH GROUP BY     |      |    10 |    30 |     4  (50)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   |    10 |    30 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */


SQL_ID  4fxxtmrh8cpzp, child number 0
-------------------------------------
select to_char(n1,'99') nch , count(*) from t1 group by
to_char(n1,'99') order by 1,2

Plan hash value: 1252675504

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |       |       |     2 (100)|          |
|   1 |  SORT ORDER BY   |       |    10 |    30 |     2  (50)| 00:00:01 |
|   2 |   INDEX FULL SCAN| T1_PK |    10 |    30 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$9BB7A81A")
      ELIM_GROUPBY(@"SEL$47952E7A")
      OUTLINE(@"SEL$47952E7A")
      ELIM_GROUPBY(@"SEL$1")
      OUTLINE(@"SEL$1")
      INDEX(@"SEL$9BB7A81A" "T1"@"SEL$1" ("T1"."N1"))
      END_OUTLINE_DATA
  */

In the absence of the primary key (index) Oracle does a full tablescan, then hash group by, then sort order by. When the primary key is put in place Oracle does an index full scan (which is legal because the index must contain all the data thanks to the not null declaration inherent in a primary key) and a sort order by without any group by.

You might wonder if the problem arises because Oracle assumes the indexed path somehow means the aggregation doesn’t apply – but with a /*+ full(t1) */ hint in place and a full tablescan in the plan the aggregation step is still missing — and if you look at the Outline Data section of the plan you can see that this is explicitly demanded by the hint(s): /*+ elim_groupby() */

My hypothesis (before I read the bug note) was that the optimizer had picked up the primary key declaration and seen that n1 was unique and therefore allowed the aggregating group by to be eliminated, but failed to “notice” that the to_char() – or extract() in the date example – meant that the assumption of uniqueness was no longer valid. To work around this problem very locally I simply added the hint /*+ no_elim_groupby */ (with no query block specified) to the query – and got the correct results.

Footnote

There is an interesting side note to this example (though not one that I would want to see used in a production system – this comment is for interest only). If you look at the Outline Data for the plan when there was no primary key you’ll notice that the only outline_leaf() is named sel$1, whereas in the plan with the primary key sel$1 appears as an outline() and the only outline_leaf() is named sel$9bb7a81a. An “outline leaf” is a query block that was used by the optimizer in constructing the final plan, while an “outline” is an intermediate query block that was examined before being transformed into another query block. So this difference in the Outline Data tells us that the problem appears thanks to a transformation that did not happen when there was no index – so what would our execution plan look like if the only hint we used in the query was /*+ outline_leaf(@sel$1) */ ?

SQL_ID  apgu34hc3ap7f, child number 0
-------------------------------------
select /*+ outline_leaf(@sel$1) */ to_char(n1,'99') nch , count(*) from
t1 group by to_char(n1,'99') order by 1,2

Plan hash value: 3280011052

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     3 (100)|          |
|   1 |  SORT ORDER BY    |       |    10 |    30 |     3  (67)| 00:00:01 |
|   2 |   HASH GROUP BY   |       |    10 |    30 |     3  (67)| 00:00:01 |
|   3 |    INDEX FULL SCAN| T1_PK |    10 |    30 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX(@"SEL$1" "T1"@"SEL$1" ("T1"."N1"))
      USE_HASH_AGGREGATION(@"SEL$1")
      END_OUTLINE_DATA
  */


This posting was scheduled to launch at 00:01 GMT on 1st January 2022. Happy new year – just be careful that you don’t try to extract() or to_char() it if you’re running 12.2.0.1 unless you’ve applied patch 26588069.

August 21, 2021

GTT LOBs

Filed under: Bugs,Infrastructure,Oracle — Jonathan Lewis @ 1:53 pm BST Aug 21,2021

Searching my blog recently for some details about a temporary space problem I came across a draft I’d written a few years ago about LOB columns in global temporary tables. It was a summary of an exchange from OTN that did a good job of capturing my general air of scepticism when reviewing database results. The basic problem came from a user who had discovered that when they included a LOB column in a global temporary table the LOBINDEX segment “was created in the SYSTEM tablespace”.

This note is probably not going to be of much practical benefit to many people – because no-one should be using a version old enough to display the anomaly – but it does demonstrate a principle and a pattern of thinking, so here goes – 5 years late.

Like the “Earn $50M by helping me steal $100M” email, the claim from the OP seemed a little suspect. My basic approach to Oracle is: “if it looks unreasonable it probably isn’t true, so test it”, so I did. Here’s the first bit of SQL (which I ran on 12.1.0.2 on an empty schema):


rem
rem     Script:         gtt_lobs.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2016
rem     Purpose:
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4

create global temporary table gtt1(
        id      number          not null,
        v1      varchar2(10)    not null,
        c1      clob,
        constraint gtt1_pk primary key(id)
)
-- tablespace gtt_temp
;

select
        table_name, index_name, tablespace_name
from
        user_indexes
;

TABLE_NAME           INDEX_NAME                       TABLESPACE_NAME
-------------------- -------------------------------- ------------------------------
GTT1                 GTT1_PK
GTT1                 SYS_IL0000168694C00003$$         SYSTEM

Sure enough, the query against user_indexes says the LOBINDEX for the LOB in the global temporary table will be created in the SYSTEM tablespace! This is clearly ridiculous – so I’m not going to believe it until I’ve actually confirmed it. I tend to trust the data dictionary rather more than I trust the manuals and MOS – but the data dictionary is just a bunch of tables and a bit of SQL that’s been written by someone else, so even the data dictionary can be wrong. It’s easy enough to test:

insert into gtt1 values(1,'a',rpad('x',4000,'x'));

select
        username, tablespace, segtype, segfile#, segblk#
from
        v$tempseg_usage
;

USERNAME                       TABLESPACE                      SEGTYPE     SEGFILE#    SEGBLK#
------------------------------ ------------------------------- --------- ---------- ----------
TEST_USER                      TEMP                            DATA             201     261632
TEST_USER                      TEMP                            INDEX            201     261760
TEST_USER                      TEMP                            LOB_DATA         201     262016
TEST_USER                      TEMP                            INDEX            201     261888

Whatever the data dictionary says, the actual segment (at file 201, block 261888 – it would be nice if the type were LOB_INDEX rather than just INDEX, but that still hasn’t been fixed, even in 21.3.0.0) has been created in the temporary tablespace. Checking the definition of the dba_indexes view, I came to the conclusion that the property column of the sys.ind$ table hadn’t had the bit set to show that it’s an index associated with a temporary table; as a result the SQL in the view definition reports tablespace 0 (SYSTEM) rather than decoding the zero to a null.

You’ll note that there’s a commented reference to “tablespace gtt_temp” in my original table creation statement. It’s not terribly well known but in recent versions of Oracle you can associate a global temporary table with a specific temporary tablespace – this gives you some scope for easily monitoring the impact of particular global temporary tables on the total I/O resource usage. After re-running the test but specifying this tablespace as the location for the GTT I got the following results:


TABLE_NAME           INDEX_NAME                       TABLESPACE_NAME
-------------------- -------------------------------- ------------------------------
GTT1                 GTT1_PK                          GTT_TEMP
GTT1                 SYS_IL0000262251C00003$$         GTT_TEMP

USERNAME                       TABLESPACE                      SEGTYPE     SEGFILE#    SEGBLK#
------------------------------ ------------------------------- --------- ---------- ----------
TEST_USER                      GTT_TEMP                        DATA             202        512
TEST_USER                      GTT_TEMP                        INDEX            202        384
TEST_USER                      GTT_TEMP                        LOB_DATA         202        128
TEST_USER                      GTT_TEMP                        INDEX            202        256

The anomaly disappears – everything is reported as being linked to the gtt_temp tablespace (and that includes the table itself, as reported in view user_tables, though I haven’t included that query in the test or results).

Footnote:

A few months after the first draft of this note  I happened to rediscover it (but still failed to publish it) and after a quick search on MOS found the following bug, reported as fixed in 12.2, with a backport currently available to 11.2.0.3 (the document has since been hidden, and the only remaining evidence is a patch for AIX):

Bug 18897516 : GLOBAL TEMPORARY TABLE WITH LOBCOLUMNS CREATE LOB INDEX IN SYSTEM SCHEMA

Interestingly the description says: “lob index created in system tablespace” rather than “lob index incorrectly reported as being in system tablespace”. You do have to be very careful with how you describe things if you don’t want to cause confusion – this “problem” isn’t a space management threat, it’s just an irritating reporting error.

 

 

July 20, 2021

Hex tip

Filed under: 19c,Bugs,Oracle,Troubleshooting — Jonathan Lewis @ 5:40 pm BST Jul 20,2021

A surprising amount of the work I do revolves around numbers; and once I’m outside the realm of the optimizer (i.e. getting away from simple arithmetic), one of the bits of playing with numbers that I do most often is conversion – usually decimal to hexadecimal, sometimes decimal to binary.

Here’s an example of how this helped me debug an Oracle error a few days ago. We start with someone trying to purge data from aud$ using the official dbms_audit_mgmt package, first setting the package’s db_delete_batch_size parameter to the value 100,000 then calling dbms_audit_mgmt.clean_audit_trail.

In theory this should have deleted (up to) 100,000 rows from aud$ starting from the oldest data. In practice it tried to delete far more rows, generating vast amounts of undo and redo, and locking up resources in the undo tablespace for ages. The SQL statement doing all the work looked like the following (after a little cosmetic work):

DELETE FROM SYS.AUD$ 
WHERE  DBID = 382813123 
AND    NTIMESTAMP# < to_timestamp('2020-12-17 00:00:00', 'YYYY-MM-DD HH24:MI:SS.FF')
AND    ROWNUM <= 140724603553440

That’s a rather large number in the rownum predicate, much larger than the expected 100,000. Whenever I am puzzled by very large numbers in places I’m not expecting to see them one of the first things I do to poke it around is to convert it to hexadecimal. (Although it seems a fairly random thing to do it doesn’t take very long and it produces an interesting result fairly frequently.)

140724603553440 (dec) = 0x7FFD000186A0

You may not think that the resulting hex number is very interesting – but there’s a string of zeros in the middle that is asking for a little extra poking. So let’s convert the last 8 digits (starting with those 3 zeros) back to decimal.

0x000186A0 = 100,000 (dec)

There’s an interesting coincidence – we’ve got back to the 100,000 that the OP had set as the db_delete_batch_size. Is this really a coincidence, or does it tell us something about a bug? That’s easy enough to test: just try setting a couple of different values for the parameter and see if this affects the rownum predicate in a consistent fashion. Here are the results from two more test values:

1,000,000 ==> 140733194388032 (dec) = 0x7FFF000F4240 .... 0x000F4240 = 1,000,000 (dec)
   50,000 ==> 140728898470736 (dec) = 0x7FFE0000C350 .... 0x0000C350 =    50,000 (dec)
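The conversions (and the pattern) are easy to verify programmatically – the “bottom 8 digits” check is just masking off the low 32 bits. A quick Python sketch:

```python
# observed rownum limits mapped to the db_delete_batch_size that produced them
rownum_limits = {
    140724603553440: 100_000,
    140733194388032: 1_000_000,
    140728898470736: 50_000,
}

for n, batch_size in rownum_limits.items():
    # show the hex form, then mask off the low 32 bits (the last 8 hex digits)
    print(f"{n} = {n:#014x}, low 32 bits = {n & 0xFFFFFFFF}")
    assert n & 0xFFFFFFFF == batch_size
```

In every case the bottom 4 bytes hold exactly the requested batch size, with garbage in the bytes above.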

The top 4 digits (2 bytes) have changed, but the bottom 8 digits (4 bytes) do seem to hold the db_delete_batch_size requested. At this point I felt that we were probably seeing some sort of pointer error in a C library routine. If you examine the file $ORACLE_HOME/rdbms/admin/prvtamgt.plb you’ll find that one of the few readable lines says:

CREATE OR REPLACE LIBRARY audsys.dbms_audit_mgmt_lib wrapped

My guess was that there were probably a couple of external C routines involved, with PL/SQL wrappers in the public package; and that there was a mismatch between the declarations in C and the declarations in the PL/SQL.

Update (after twitter exchange)

It turns out that I wasn’t quite right, but I was in the right olympic stadium. This is now (unpublished) bug 33136016, and if you’ve been seeing unexpected work patterns when purging the audit trail after upgrading to 19c or later then there may be a patch for you in the not too distant future.

March 8, 2021

Join Elimination redux

Filed under: Bugs,CBO,Join Elimination,Oracle,Transformations — Jonathan Lewis @ 12:58 pm GMT Mar 8,2021

This note is a followup to a post from a few years back (originally dating further back to 2012) where I described an inconsistency that appeared when join elimination and deferrable constraints collided. The bug resurfaced recently in a new guise in a question on the Oracle Developer forum, with a wonderful variation on the symptoms that ultimately gave a good clue to the underlying issue. The post included a complete working example of the anomaly, but I’ll demonstrate it using a variation of my 2012/2017 code. We start with a pair of tables with referential integrity defined between them:

rem
rem     Script:         join_eliminate_bug_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Feb 2021
rem
rem     Last tested 
rem             19.11.0.0
rem             19.8.0.0 (LiveSQL)
rem

create table parent (
        id      number(4),
        name    varchar2(10),
        constraint par_pk primary key (id)
        deferrable initially immediate
)
;

create table child(
        id_p    number(4)       
                constraint chi_fk_par
                references parent,
        id      number(4),
        name    varchar2(10),
        constraint chi_pk primary key (id_p, id) 
)
;

insert into parent values (1,'Smith');

insert into child values(1,1,'Simon');
insert into child values(1,2,'Sally');

commit;

begin
        dbms_stats.gather_table_stats(user,'child');
        dbms_stats.gather_table_stats(user,'parent');
end;
/

You’ll notice that I’ve created the primary key constraint on parent as “deferrable initially immediate”. So let’s write some code that defers the constraint, inserts some duplicate data, and executes a join between the two tables:

set serveroutput off
set constraint par_pk deferred;

insert into parent (id,name) values (1,'Smith');

select
        /*+ initially immediate  PK */
        chi.*
from
        child   chi,
        parent  par
where
        par.id = chi.id_p
;

select * from table(dbms_xplan.display_cursor);

Since I’ve now got two rows with id = 1 in parent the query ought to return duplicates for every row in child where id_p = 1, but it doesn’t. Here’s the output from the query and the execution plan:

     ID_P         ID NAME
---------- ---------- ------------
         1          1 Simon
         1          2 Sally

2 rows selected.


PLAN_TABLE_OUTPUT
--------------------------------------------------
SQL_ID  gy6h8td4tmdpg, child number 0
-------------------------------------
select  /*+ initially immediate  PK */  chi.* from  child chi,  parent
par where  par.id = chi.id_p

Plan hash value: 2406669797

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS FULL| CHILD |     2 |    24 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

The optimizer has applied “join elimination” to take parent out of the transformed query, so no duplicates. Arguably this is the wrong result.

Let’s roll back the insert and change the experiment – let’s change the constraint on the parent primary key so that it’s still deferrable, but initially deferred then repeat the insert and query:

rollback;
alter table child drop constraint chi_fk_par;
alter table parent drop constraint par_pk;

alter table parent add constraint par_pk primary key (id) deferrable initially deferred;
alter table child add constraint chi_fk_par foreign key(id_p) references parent;

insert into parent (id,name) values (1,'Smith');

select
        /*+ initially deferred  PK */
        chi.*
from
        child   chi,
        parent  par
where
        par.id = chi.id_p
;

select * from table(dbms_xplan.display_cursor);

In this case we don’t need to “set constraint par_pk deferred”, it’s implicitly deferred by definition and will only be checked when we commit any transaction. Would you expect this to make any difference to the result? This is what we get:

      ID_P         ID NAME
---------- ---------- ------------
         1          1 Simon
         1          1 Simon
         1          2 Sally
         1          2 Sally

4 rows selected.


PLAN_TABLE_OUTPUT
--------------------------------------------------
SQL_ID  8gvn3mzr8uv0h, child number 0
-------------------------------------
select  /*+ initially deferred  PK */  chi.* from  child chi,  parent
par where  par.id = chi.id_p

Plan hash value: 1687613841

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |       |       |     2 (100)|          |
|   1 |  NESTED LOOPS      |        |     2 |    30 |     2   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| CHILD  |     2 |    24 |     2   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | PAR_PK |     1 |     3 |     0   (0)|          |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PAR"."ID"="CHI"."ID_P")

When the parent primary key is initially deferred then join elimination doesn’t take place – so we get two copies of each child row in the output. (This is still true even if we add the “rely” option to the parent primary key constraint).
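The four-row result is exactly what standard join semantics demand: each of the two parent rows with id = 1 matches each of the two child rows. A minimal model in Python:

```python
# two parent rows with the same id -- allowed while the PK check is deferred
parent = [(1, 'Smith'), (1, 'Smith')]
child  = [(1, 1, 'Simon'), (1, 2, 'Sally')]

# equi-join on par.id = chi.id_p
result = [c for p in parent for c in child if p[0] == c[0]]
print(len(result))   # 4 -- two parent matches for each of two child rows
```

So the “initially deferred” plan gives the answer the join predicate demands, and it’s the join-eliminated plan that is arguably wrong.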

Debug Analysis

As I said at the top of the article, this does give us a clue about the source of the bug. A check of the dictionary table cdef$ (constraint definitions) shows the following notes for column cdef$.defer:

  defer         number,                     /* 0x01 constraint is deferrable */
                                              /* 0x02 constraint is deferred */
                                /* 0x04 constraint has been system validated */
                                 /* 0x08 constraint name is system generated */
etc...

With my examples the “initially immediate” constraint reported defer = 5; for the “initially deferred” constraint it reported the value 7. It looks as if the optimizer code that handles join elimination looks only at the static definition of the constraint (bit 0x02) and doesn’t consider the possibility that if bit 0x01 is set it should also check the session state to see if the constraint has been temporarily deferred.
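Decoding the two observed values against those flag bits makes the difference obvious. A quick sketch in Python (flag names paraphrased from the dictionary comments above):

```python
# bit meanings taken from the cdef$.defer comments
FLAGS = {
    0x01: "deferrable",
    0x02: "initially deferred",
    0x04: "system validated",
    0x08: "system-generated name",
}

def decode_defer(value):
    # return the names of all flag bits set in the value
    return [name for bit, name in FLAGS.items() if value & bit]

print(decode_defer(5))   # ['deferrable', 'system validated']
print(decode_defer(7))   # ['deferrable', 'initially deferred', 'system validated']
```

Only bit 0x02 distinguishes the two cases – consistent with the idea that the join elimination code tests just that bit.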

Conclusion

If you are going to implement deferrable constraints be very careful about tracking exactly how you use them, and be aware that if you execute arbitrary queries in mid-transaction then you may find that the results are not exactly what you expect. In fact, though it’s not demonstrated here, different forms of SQL that should express the same requirement may not give the same results.

Update (May 2021)

This buggy behaviour is still present in 19.11.0.0

October 6, 2020

Index FFS Cost 2

Filed under: Bugs,CBO,Indexing,Oracle,Statistics — Jonathan Lewis @ 12:58 pm BST Oct 6,2020

Here’s a little puzzle, highlighting a “bug that’s not a bug” that was “fixed but not fixed” some time in the 10.2 timeline. (If you want specifics about exactly when the fix became available and what patches might be available, they’re in MOS – Bug 5099019: DBMS_STATS DOESN’T COUNT LEAF_BLOCKS CORRECTLY.)

Running 19.3.0.0, with the system statistics as shown:


begin
        dbms_stats.set_system_stats('MBRC',16);
        dbms_stats.set_system_stats('MREADTIM',10);
        dbms_stats.set_system_stats('SREADTIM',5);
        dbms_stats.set_system_stats('CPUSPEED',500);
end;
/

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                cascade          => true,
                estimate_percent => 100,
                method_opt       => 'for all columns size 1'
        );
end;
/

select
        index_name, leaf_blocks
from
        user_indexes
where
        table_name = 'T1'
;

alter system flush buffer_cache;

set autotrace traceonly 

select
        count(small_vc)
from    t1
where   small_vc is not null
;

set autotrace off

For my test run there are no uncommitted transactions, the gathering of table stats with cascade to indexes means all the blocks are clean so there’s no undo activity needed to produce an answer for the final query, and all that the query is going to do is run an index fast full scan to count the number of rows in the table because there’s an index on just (small_vc).

The system stats tell us that Oracle is going to cost the index fast full scan by taking the leaf block count, dividing by 16 (the MBRC) and multiplying by 2 (mreadtim / sreadtim), and adding a bit for CPU usage. So here are the results from running the query and generating the plan with autotrace:


PL/SQL procedure successfully completed.

INDEX_NAME           LEAF_BLOCKS
-------------------- -----------
T1_V1                        153

1 row selected.

System altered.

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 274579903

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |    22   (5)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_V1 |  9999 |   107K|    22   (5)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("SMALL_VC" IS NOT NULL)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1530  consistent gets
       1522  physical reads
          0  redo size
        558  bytes sent via SQL*Net to client
        425  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Points to notice:

  • The cost is (as predicted): ceiling(153 leaf_blocks / 16 MBRC) * 2 + a bit = 20 + a bit.
  • The number of physical blocks read is 1522, not the 153 leaf blocks reported in index stats
  • No recursive SQL, and I did say that there were no undo / read consistency issues to worry about
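The arithmetic in the first point can be reproduced directly. A quick Python sketch of the I/O part of the costing formula described earlier:

```python
import math

leaf_blocks = 153    # leaf_blocks from user_indexes
mbrc        = 16     # MBRC system statistic
mreadtim    = 10     # multiblock read time (ms)
sreadtim    = 5      # single block read time (ms)

# cost the fast full scan: multiblock reads, scaled to single-block read units
io_cost = math.ceil(leaf_blocks / mbrc) * (mreadtim / sreadtim)
print(io_cost)   # 20.0 -- plus "a bit" of CPU gives the reported cost of 22
```

With the corrected leaf_blocks of 1555 (shown later) the same formula gives ceil(1555/16) * 2 = 196, which plus the CPU component matches the cost of 201 in the second plan.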

There is clearly an inconsistency between the size of the index and the (100% gathered) leaf_block count that the optimizer is using, and this is a point I made many years ago in “Cost Based Oracle – Fundamentals”. To gather the leaf block count Oracle has looked at the number of leaf blocks that, after cleanout, hold any index entries – and here’s how I prepared this demo to confuse the issue:


rem
rem     Script:         index_ffs_cost.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2012
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        rownum                  id,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid wordpress format issue
;

create index t1_v1 on t1(small_vc) pctfree 80;

delete
        from t1
where
        id between 5000 and 95000
;

commit;

When I gathered stats on the index I had just deleted 90% of the data from the table – so 90% of the leaf blocks in the index were empty but still in the index structure, and only 10% of the leaf blocks held any current index entries. This is why Oracle reported leaf_blocks = 153 while the index fast full scan had to read through nearly 1530 blocks.

This is one of those contradictory problems – for an index fast full scan you need to know the size of the segment that will be scanned because that’s the number of blocks you will have to examine; but in most cases the number of populated index leaf blocks is the number you need to know about when you’re trying to cost an index range scan. Of course in most cases the nature of Oracle’s implementation of B-tree indexes will mean that the two counts will be within one or two percent of each other. But there are a few extreme cases where you could get an index into a state where the segment size is large and the data set is small and you don’t want the optimizer to think that an index fast full scan will be low-cost.

Oracle produced a mechanism for getting the segment block count captured as the leaf_blocks statistic late in 10.2, but it’s not implemented by default, and it’s not something you can tweak into a query with the opt_param() hint. Fix control 5099019 has the description: “set leaf blocks to the number of blocks in the index extent map”, but it’s not a run-time fix, it’s a fix that has to be in place when you gather the index stats – thus:

alter session set "_fix_control"='5099019:1';

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                cascade          => true,
                estimate_percent => 100,
                method_opt       => 'for all columns size 1'
        );
end;
/

select
        index_name, leaf_blocks
from
        user_indexes
where
        table_name = 'T1'
;

alter system flush buffer_cache;

set autotrace traceonly 

select
        count(small_vc)
from    t1
where   small_vc is not null
;

set autotrace off

===================================================

Session altered.

PL/SQL procedure successfully completed.

INDEX_NAME           LEAF_BLOCKS
-------------------- -----------
T1_V1                       1555

1 row selected.

System altered.

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 274579903

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   201   (3)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_V1 |  9999 |   107K|   201   (3)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("SMALL_VC" IS NOT NULL)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1530  consistent gets
       1522  physical reads
          0  redo size
        558  bytes sent via SQL*Net to client
        439  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Session altered.

As you can see Oracle has now set leaf_blocks = 1555. This is actually the number of blocks below the highwater mark in the segment. A further check with the dbms_space package showed that the number of blocks in the index structure was actually 1,523 with a further 32 blocks that we could infer were space management blocks.
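The figures just quoted reconcile exactly, as a couple of lines of arithmetic confirm:

```python
# Figures from the leaf_blocks statistic and the dbms_space checks above.
leaf_blocks_stat = 1555    # leaf_blocks gathered with fix control 5099019 enabled
index_structure = 1523     # blocks dbms_space reported inside the index structure
space_management = 32      # blocks inferred to be space management blocks

# The new statistic is simply the total block count below the highwater mark:
assert index_structure + space_management == leaf_blocks_stat
```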

Of the 1,523 blocks in the index structure 157 were reported as “Full” while 1364 were reported as “Freespace 2 (25% – 50% free)” which, for indexes, possibly ought to mean  “available for (re-)use” although this didn’t quite seem to be the case in some tests I did a few years ago.

Conclusion

Although Oracle has supplied a fix to a problem that I first highlighted in Cost Based Oracle – Fundamentals, I can understand why it’s not enabled by default, and I don’t think I’d want to take advantage of it in a production system given that it’s a session setting at stats gathering time. Given the number of times it’s likely to matter I’d probably add hints to the SQL to stop the optimizer from using the index incorrectly, or call dbms_stats.set_index_stats() as part of the stats gathering routine.

 

October 1, 2020

Index FFS Cost

Filed under: Bugs,CBO,Execution plans,Indexing,Oracle — Jonathan Lewis @ 11:46 am BST Oct 1,2020

There are a number of unexpected issues with the optimizer’s treatment of the index fast full scan, the access path where Oracle ignores the structure of the B-tree and uses multiblock reads to do a brute-force segment scan as if the index were a “skinny table” with a few blocks of irrelevant garbage (i.e. the branch blocks) that could be ignored.

A recent comment made a few days ago on a blog note about the optimizer’s “index-join” access path reminded me that I had a few notes to finish and publish that might help some people address performance issues relating to a couple of little-known fixes for optimizer oddities in this area. Today’s “fix” is one that appeared in 10g (or maybe very late in 9i) but never became the default optimizer behaviour (possibly because it’s one of those “not a bug” types of bug with no “always correct” strategy).

The problem revolves around the optimizer_index_cost_adj parameter (which is another reason why the fix might have been kept out of the standard code base). For index-only execution plans the parameter will apply to the index range scan, full scan, or skip scan, but not to the index fast full scan. Here’s a model to demonstrate this:


rem
rem     Script:         oica_iffs.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2012
rem
rem     Last tested
rem             19.3.0.0
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        lpad(trunc(dbms_random.value(0,10000)),10)      v1,
        rpad('x',20)                                    padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid wordpress format issue
;

create index t1_i1 on t1(v1);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                cascade          => true,
                method_opt       => 'for all columns size 1'
        );
end;
/

select blocks from user_tables where table_name = 'T1';
select leaf_blocks from user_indexes where index_name = 'T1_I1';

I’ve created a fairly narrow table with 1M rows, and created an index on that table on the v1 column. In my test there were 5,219 blocks in the table and 3,068 blocks in the index.

I’m now going to run a query to count the rows in the table where v1 is not null and report its execution plan. The first run will be unhinted, the second run will include a hint that forces an index full scan, and the third run will include a hint to force an index fast full scan.


prompt  ==============
prompt  Unhinted query
prompt  ==============

select
        count(*)
from
        t1
where
        v1 is not null
;

prompt  ==========
prompt  Index hint
prompt  ==========

select
        /*+ index(t1) */
        count(*)
from
        t1
where
        v1 is not null
;

prompt  ==============
prompt  index_ffs hint
prompt  ==============

select
        /*+ index_ffs(t1) */
        count(*)
from
        t1
where
        v1 is not null
;

Here are the three execution plans. As you might have predicted the default plan (for my Oracle 19.3) is the index fast full scan:


==============
Unhinted query
==============
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   450  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|   450  (15)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1" IS NOT NULL)

==========
Index hint
==========
--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |    11 |  3159   (3)| 00:00:01 |
|   1 |  SORT AGGREGATE  |       |     1 |    11 |            |          |
|*  2 |   INDEX FULL SCAN| T1_I1 |  1000K|    10M|  3159   (3)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1" IS NOT NULL)

==============
index_ffs hint
==============
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   450  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|   450  (15)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1" IS NOT NULL)

Since the size of the index is less than the size of the table, the cost of the index fast full scan will be lower than the cost of a tablescan; and since an index fast full scan uses multiblock reads while the index full scan is assumed to use single-block reads, the cost of the index fast full scan will also be lower than the cost of the index full scan.
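As a rough sanity check the figures in the plans are consistent with that argument. This is illustrative arithmetic only – Oracle's real formula includes CPU costing and multiblock-read adjustments that aren't reproduced here:

```python
# Figures taken from the statistics and plans above.
index_leaf_blocks = 3068   # leaf_blocks reported for t1_i1
table_blocks = 5219        # blocks reported for t1

full_scan_cost = 3159      # index full scan: single-block reads, roughly one unit per leaf block
ffs_cost = 450             # index fast full scan: multiblock reads over the same blocks

# The index is smaller than the table, so even a segment scan of the index
# beats scanning the table...
assert index_leaf_blocks < table_blocks

# ...and the multiblock reads give the fast full scan roughly a 7x advantage
# over the single-block-read full scan.
advantage = full_scan_cost / ffs_cost
print(f"fast full scan ~{advantage:.1f}x cheaper than the full scan")
```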

Now we set the optimizer_index_cost_adj to 1 (as happens in a number of “legacy” commercial products) and repeat the experiment. Here are the three plans (reduced to just the body of the plan):

==================
OICA = 1, Unhinted
==================
--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |    11 |    32   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE  |       |     1 |    11 |            |          |
|*  2 |   INDEX FULL SCAN| T1_I1 |  1000K|    10M|    32   (4)| 00:00:01 |
--------------------------------------------------------------------------

====================
OICA = 1, index hint
====================
--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |    11 |    32   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE  |       |     1 |    11 |            |          |
|*  2 |   INDEX FULL SCAN| T1_I1 |  1000K|    10M|    32   (4)| 00:00:01 |
--------------------------------------------------------------------------

========================
OICA = 1, index_ffs hint
========================
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |   450  (15)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|   450  (15)| 00:00:01 |
-------------------------------------------------------------------------------

The default plan is now the index full scan, not the index fast full scan; and when we look at the two hinted plans we can see why. The full scan cost has been reduced to 1% of the original cost (as dictated by the setting for optimizer_index_cost_adj); the fast full scan cost has not been adjusted.
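The arithmetic behind the switch is simple. This sketch uses the costs from the earlier plans; the exact rounding Oracle applies isn't reproduced:

```python
# Costs from the earlier (unadjusted) plans.
full_scan_cost = 3159
ffs_cost = 450

oica = 1  # optimizer_index_cost_adj = 1 means "scale to 1% of the original cost"

adjusted_full_scan = full_scan_cost * oica / 100   # ~31.6, reported as 32 in the plan
adjusted_ffs = ffs_cost                            # untouched - the adjustment isn't applied

# The discounted full scan now undercuts the unadjusted fast full scan,
# which is why the default plan switches to INDEX FULL SCAN.
assert adjusted_full_scan < adjusted_ffs
```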

So is this a good thing or a bad thing? Different people may answer that question differently. There are clearly likely to be cases where changing the setting for the optimizer_index_cost_adj (which you shouldn’t have been doing anyway for the last 20 years or so) will result in some plans switching from fast full scans to full scans with unfortunate consequences;

On the other hand when you do a fast full scan it may do direct path reads, while the full scan will do cached reads, and the cache may be holding a lot of the index leaf blocks already, so some people might prefer the current behaviour;

On the other, other hand an Exadata system can use storage indexes when doing direct path segment scans (even when the segment is an index) so an Exadata fast full scan might be far more desirable than a partially cached full scan.

So no generic right answer, only specific answers for particular sets of circumstances.

Solutions

This issue was addressed some time in the 10g timeline – my notes say the earliest successful tests I did of the fix were on 10.2.0.4, and that it wasn’t available in 9.2.0.8. There are two possibilities – a fix control, and an event.

  • Fix control: 4483286 – described as “Discount FFS cost using optimizer_index_cost_adj”
  • Event: 38085 at level one. I don’t think this event has ever appeared in the oraus.msg file, but it is referenced by bug 4483286

Of the two, setting the fix control is probably the safer option since the event is (apparently) non-existent and there’s always the possibility that the number will be hi-jacked in a future release to do something completely different. It’s also worth noting that the fix control can be applied to an individual statement through the /*+ opt_param() */ hint (there’s an example of the syntax towards the end of another article of mine).

A little demonstration:


-- alter session set events '38085 trace name context forever, level 1';
alter session set "_fix_control"='4483286:1';
alter session set optimizer_index_cost_adj = 1;

set autotrace traceonly explain

prompt  ==============
prompt  Unhinted query
prompt  ==============

select
        count(*)
from
        t1
where
        v1 is not null
;

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    11 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |    11 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000K|    10M|     4   (0)| 00:00:01 |
-------------------------------------------------------------------------------

With the event or _fix_control set the cost of the index fast full scan drops to 1%, and the path reappears without being hinted.

August 24, 2020

Flashback Bug

Filed under: 18c,Bugs,Oracle,redo,Troubleshooting — Jonathan Lewis @ 9:34 am BST Aug 24,2020

[Fixed by 19.11]

Here’s a problem with the “flashback versions” technology that showed up at the end of last week. There’s a thread about it on the Oracle Developer community forum, and a chain of tweets that was my initial response to a twitter alert about it that Daniel Stein posted.

The problem appears somewhere in the 18c timeline – it doesn’t seem to be present in 12.2.0.1 – so if you’re running any versions 18+ here’s a modified version of the test case supplied by Daniel Stein to demonstrate the issue.

rem
rem     Script:         flashback_bug.sql
rem     Author:         Jonathan Lewis / Daniel Stein
rem     Dated:          Aug 2020
rem
rem     Last tested 
rem             19.3.0.0        Wrong result
rem             12.2.0.1        Correct result
rem

create table t1 (n1 number(4)) 
segment creation immediate
;

prompt  =============================================================
prompt  Sleeping 10 seconds after create table to avoid subsequent
prompt  ORA-01466: unable to read data - table definition has changed
prompt  =============================================================

-- execute dbms_session.sleep(10)
execute dbms_lock.sleep(10)

prompt  =========
prompt  Start SCN
prompt  =========

column current_scn new_value scn_vorher2 
select to_char(current_scn,'9999999999999999') current_scn from V$DATABASE;

insert into t1(n1) values (1);
insert into t1(n1) values (2);
insert into t1(n1) values (3);
insert into t1(n1) values (4);
insert into t1(n1) values (5);
insert into t1(n1) values (6);
insert into t1(n1) values (7);
insert into t1(n1) values (8);
insert into t1(n1) values (9);
insert into t1(n1) values (10);
insert into t1(n1) values (11);
insert into t1(n1) values (12);

delete from t1 where n1 in (2, 5, 8, 11);

commit;

prompt  =======
prompt  End SCN
prompt  =======

column current_scn new_value scn_nachher
select to_char(current_scn,'9999999999999999') current_scn from V$DATABASE;

prompt  =============
prompt  Version Query
prompt  =============

column VERSIONS_STARTSCN        format 9999999999999999 heading "Start SCN"
column VERSIONS_ENDSCN          format 9999999999999999 heading "End SCN"
column VERSIONS_OPERATION       format A9               heading "Operation"
column SCN_RANGE                format A35              heading "SCN Range"

select 
        n1, rowid, 
        versions_operation, versions_startscn, versions_endscn,
        &scn_vorher2||' and '||&scn_nachher scn_range
from
        t1 versions between scn &scn_vorher2 and &scn_nachher
order by 
        rowid, versions_startscn, versions_operation, versions_endscn
;

prompt  ==================
prompt  Current Table data
prompt  ==================

select n1,rowid from t1;


alter system dump redo scn min &scn_vorher2 scn max &scn_nachher;

I’ve created a table then inserted a few rows and deleted some of them in the same transaction. I’ve captured the database SCN at the start and end of the transaction and then tried to run a “versions between” query across that range of SCNs. Here are the results from the “versions between” query and the final query of the current contents of the table on a test on a database running 19.3:

        N1 ROWID              Operation         Start SCN           End SCN SCN Range
---------- ------------------ --------- ----------------- ----------------- -----------------------------------
         1 AAAXFOAATAAAACDAAA I            12670408139684                   12670408139679 and 12670408139687
         3 AAAXFOAATAAAACDAAC I            12670408139684                   12670408139679 and 12670408139687
         4 AAAXFOAATAAAACDAAD I            12670408139684                   12670408139679 and 12670408139687
         5 AAAXFOAATAAAACDAAE I            12670408139684                   12670408139679 and 12670408139687
         6 AAAXFOAATAAAACDAAF I            12670408139684                   12670408139679 and 12670408139687
         7 AAAXFOAATAAAACDAAG I            12670408139684                   12670408139679 and 12670408139687
         8 AAAXFOAATAAAACDAAH I            12670408139684                   12670408139679 and 12670408139687
         9 AAAXFOAATAAAACDAAI I            12670408139684                   12670408139679 and 12670408139687
        10 AAAXFOAATAAAACDAAJ I            12670408139684                   12670408139679 and 12670408139687
        11 AAAXFOAATAAAACDAAK I            12670408139684                   12670408139679 and 12670408139687

10 rows selected.

        N1 ROWID
---------- ------------------
         1 AAAXFOAATAAAACDAAA
         3 AAAXFOAATAAAACDAAC
         4 AAAXFOAATAAAACDAAD
         6 AAAXFOAATAAAACDAAF
         7 AAAXFOAATAAAACDAAG
         9 AAAXFOAATAAAACDAAI
        10 AAAXFOAATAAAACDAAJ
        12 AAAXFOAATAAAACDAAL

8 rows selected.

There’s a trap in this demonstration. If you follow the link to the forum thread you’ll find that Daniel Stein says there’s a problem with Oracle 19.6 and upwards (but not 12.2) – then various other people tested on their versions and found that 19.7, 18.5, and 19.3 were broken. That last one is what made me run the test on the 19.3 I had at hand – and I got the right result, unlike the results above.

If you look at the output from this test run you’ll see that the versioning query says that in the SCN range I’ve inserted 10 rows, and that rows 2 and 12 have never been inserted. Given the SQL I’d run I was expecting to see 16 rows in the output, 12 of them saying I’d inserted values 1 through 12, and then 4 saying I’d deleted 2, 5, 8 and 11 – but (when it works correctly) Oracle manages to combine the insert and delete in the same transaction to cancel the entries. So the versioning query ought to be showing exactly the 8 rows that appear in the current data query.
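The expected cancellation can be modelled in a few lines of set arithmetic. This is a model of the documented behaviour, not of Oracle's undo/redo processing:

```python
# 12 values inserted and 4 of them deleted within the SAME transaction:
inserted = set(range(1, 13))
deleted_in_same_txn = {2, 5, 8, 11}

# Flashback versioning should cancel each insert/delete pair, leaving only
# the 8 rows that were committed - exactly the current data query's output.
surviving = sorted(inserted - deleted_in_same_txn)
assert surviving == [1, 3, 4, 6, 7, 9, 10, 12]
```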

So I had two puzzles:

  • What’s gone wrong for everyone else ?
  • Why hasn’t it gone wrong for me ?

To answer the second question first: I took a look at my instance parameters and found that I had set statistics_level to all. Clearly this ought to have nothing to do with an error in flashback version queries, but I changed it to typical anyway and re-ran the test: and got the (now expected) wrong results. If you read the twitter thread you’ll find that Oracle support had had an SR opened on this problem for a few days but had been unable to reproduce the problem – perhaps they, too, had a system with statistics_level set to all.

Getting back to the first question: what’s gone wrong. I couldn’t think of a good reason why setting the statistics_level should make a difference, so I took the view that the versioning query is running through the redo log to find all the changes that took place in the range, so maybe the answer will appear in the redo. That’s why the last line in the demo above is a command to dump the redo generated between the two SCNs.

Short-circuiting the details of what I looked at, I’m just going to show you the layer 11 (table) redo change vectors for my DML, which I extracted from the trace file with a simple grep for “OP:11”.

First the set when statistics_level = all:


CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1b SEQ:2 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1b SEQ:3 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:1 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:2 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:3 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:4 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:5 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:6 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:7 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:8 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:9 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:10 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #3 CON_ID:3 TYP:2 CLS:1 AFN:10 DBA:0x01009c57 OBJ:720 SCN:0x00000b860f2d3cb8 SEQ:1 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:11 OP:11.3 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1d SEQ:1 OP:11.3 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1d SEQ:2 OP:11.3 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1d SEQ:3 OP:11.3 ENC:0 RBL:0 FLG:0x0000

What you can see are 13 op codes “11.2” – which is “insert row piece”. Most of these are for OBJ 94544, which is the object_id for my table, and one of them is for OBJ 720, which is a table owned by sys called exp_head$ (which is something to do with “expression tracking”). After the inserts there are 4 op codes “11.3”, which is “delete row piece”. So there’s no obvious error or complication in the redo.

And now when we set statistics_level to typical and re-run the script (which means we get a new OBJ number):

CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8a SEQ:2 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8a SEQ:3 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:1 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:2 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:3 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:4 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:5 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:6 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:7 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:8 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:9 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:10 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #3 CON_ID:3 TYP:2 CLS:1 AFN:10 DBA:0x01009c57 OBJ:720 SCN:0x00000b860f2d3d1d SEQ:1 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:11 OP:11.12 ENC:0 RBL:0 FLG:0x0000

In this case we get 13 “insert row piece” op codes followed by a single “11.12” op code, which is the “delete multiple rows” op code. So there IS a difference in the redo, and something about that difference is probably behind the error coming and going.

Taking a closer look at the 11.12 and one of the 11.3 redo change vectors, this is what we see:


CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:11 OP:11.12 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x02  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: C  uba: 0x04400a24.0672.3c
KDO Op code: QMD row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x04c00083  hdba: 0x04c00082
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 lock: 1 nrow: 4
slot[0]: 1
slot[1]: 4
slot[2]: 7
slot[3]: 10


CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1d SEQ:3 OP:11.3 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x02  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: C  uba: 0x04400467.0595.2c
KDO Op code: DRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x04c00083  hdba: 0x04c00082
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 slot: 10(0xa)

The 11.12 op code shows us a list of 4 entries in the block’s “row directory” for “table 0”. The 11.3 op code shows us just one – it’s the last of the four deletes so it’s reporting slot 10 in the row directory; the previous three 11.3 vectors reported slots 1, 4, and 7 respectively.

Obviously we can’t draw any firm conclusions about what’s going wrong, we simply have a few observations that might give us some ideas of the type of error; and after a few more experiments I decided that the code handling flashback versioning was mis-interpreting the “delete multiple rows” record. It consistently seemed to delete the first row identified by the slot[0] entry, then jump straight to the last slot and add one to the row directory index it found there before attempting to apply the vector.

Footnote

I’ve used dbms_lock.sleep() to pause for 10 seconds after creating the table. The sleep() procedure was copied into the dbms_session package in recent versions of Oracle so that end-user code could access it without having to receive privileges on the rather more dangerous dbms_lock package.

I’ve referenced the statistics_level parameter as the one affecting the generation of the “delete multiple row” redo vector. In fact you can get the same effect by setting the hidden parameter “_rowsource_execution_statistics” to true. Setting statistics_level to all has the effect of enabling rowsource execution statistics so this isn’t surprising.

I pointed out that the flashback versioning code seemed to “add 1” to the final slot[n] identifier before producing the flashback output. Think about what that means if you change the demonstration to delete the LAST row in the list of rows inserted. Here’s what running the query did in a test where I changed 11 to 12 in my delete statement:


ERROR at line 2:
ORA-00600: internal error code, arguments: [ktrvtgr_2], [79691907], [0], [11], [], [], [], [], [], [], [], []

Looking at the 2nd and 3rd parameters of the ORA-00600 error:

  • 79691907 is the block number of the block where the table rows were stored.
  • 11 is the correct row directory index for the last row (12) in the table – but my hypothesis is that the flashback code was trying to find (non-existent) directory index 12 because it adds one to the supplied index entry.
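The hypothesised mis-reading of the slot list can be mimicked in a few lines. This is purely a model of the observed symptoms – the function and its logic are my reconstruction, not the kernel code:

```python
def buggy_flashback_slots(slots, max_slot):
    """Model of the hypothesised bug: the flashback code honours slot[0],
    then jumps straight to the last slot and adds one to it."""
    derived = [slots[0], slots[-1] + 1]
    if derived[-1] > max_slot:
        # analogous to the ORA-00600 [ktrvtgr_2] raised when the computed
        # index runs past the block's row directory
        raise IndexError(f"row directory index {derived[-1]} out of range")
    return derived

# Demo transaction: deleting values 2, 5, 8, 11 touches directory slots
# 1, 4, 7, 10; the last used slot in the block is 11 (holding value 12).
print(buggy_flashback_slots([1, 4, 7, 10], max_slot=11))  # [1, 11]: value 12 looks deleted, 11 doesn't

# End the delete list on the LAST row (value 12, slot 11) and the computed
# index becomes 12 - past the directory, matching the ORA-00600 above.
try:
    buggy_flashback_slots([1, 4, 7, 11], max_slot=11)
except IndexError as exc:
    print(exc)
```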

Update

In the interval between completing the final draft yesterday and publishing the note this morning, Oracle support replied to the forum thread with a bug number (28910586 – not yet publicly visible) and a note that a patch could be available on request if an SR were raised giving the Oracle version and platform. It will be interesting to see whether the patch blocks the creation of the 11.12 op codes or whether it corrects the interpretation of 11.12 codes by the flashback versions code.

Update (May 2021)

In 19.11.0.0 with “statistics_level = typical” the redo still shows a single 11.12 op-code, but this is now being applied correctly and the results from the flashback query are correct. In passing it’s worth noting that with “statistics_level = all” we still get four 11.3 records. (Some extra tests suggest that this variation in op codes isn’t connected with flashback per se; it seems to be related to the special case of deleting by in-list via a tablescan, and is most visible if there are no indexes on the table.)

 
