When you specify that a column used in an index should be a “descending” column, Oracle uses the internal, undocumented sys_op_descend() function to derive a value to store for that column in the index leaf block.
For many years I’ve claimed that this function simply takes the one’s-complement of a character value and appends 0xff to the result before storing it. That is nothing like the whole story. In a recent comment to a note on descending indexes and finding max() values I was given another part of the story, in response to my confession that I did not know why a varchar2(128) had turned into a raw(193) when subject to sys_op_descend() – it was the length, not the rawness, that puzzled me.
Here’s a little script to generate some data that helps to explain what’s going on.
rem
rem Script: sys_op_descend.sql
rem Author: Jonathan Lewis
rem Dated: Nov 2023
rem Purpose:
rem
rem Last tested
rem 19.11.0.0
rem
create table t1 (
        v1      varchar2(10),
        nv1     nvarchar2(10)
)
/
insert into t1 values('A','A');
insert into t1 values('AB','AB');
insert into t1 values('ABC','ABC');
commit;
create table t2 (
        v1      varchar2(10),
        nv1     nvarchar2(10)
)
/
insert into t2 values(chr(0), chr(0));
insert into t2 values(chr(1), chr(1));
insert into t2 values(chr(2), chr(2));
insert into t2 values(chr(3), chr(3));
insert into t2 values(chr(4), chr(4));
commit;
There’s nothing particularly special about the two tables and the data I’ve inserted; I’m just trying to generate patterns that make it easy to see what’s going on. I’ll be comparing a varchar2() with an nvarchar2() because my nvarchar2() columns use a multi-byte character set.
Let’s look at the (nearly raw) data from t1 where I’ve stored growing strings of printable characters. Here’s a simple query and its result set:
set linesize 50
break on row
select
        dump(v1,16)                     v1,
        dump(sys_op_descend(v1), 16)    v1_desc,
        dump(nv1,16)                    nv1,
        dump(sys_op_descend(nv1), 16)   nv1_desc
from
        t1
/
V1
--------------------------------------------------
V1_DESC
--------------------------------------------------
NV1
--------------------------------------------------
NV1_DESC
--------------------------------------------------
Typ=1 Len=1: 41
Typ=23 Len=2: be,ff
Typ=1 Len=2: 0,41
Typ=23 Len=4: fe,fb,be,ff
Typ=1 Len=2: 41,42
Typ=23 Len=3: be,bd,ff
Typ=1 Len=4: 0,41,0,42
Typ=23 Len=7: fe,fb,be,fe,fb,bd,ff
Typ=1 Len=3: 41,42,43
Typ=23 Len=4: be,bd,bc,ff
Typ=1 Len=6: 0,41,0,42,0,43
Typ=23 Len=10: fe,fb,be,fe,fb,bd,fe,fb,bc,ff
If you start with the first two columns of the output rows you can see: ‘A’ == 0x41, which becomes 0xbe, 0xff, following the “one’s complement with an appended 0xff” rule. The same pattern is visible for ‘AB’ and ‘ABC’.
When you look at the 3rd and 4th columns of each row (the nvarchar2), ‘A’ is now a 2-byte value (0x00, 0x41) which turns into the four bytes: 0xfe, 0xfb, 0xbe, 0xff. The value 0xbe is recognisable as the one’s-complement of 0x41 that appeared for the varchar2() values – but the 0x00 in the original nvarchar2() value seems to have turned into a two-byte 0xfe, 0xfb rather than the “expected” 0xff.
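The two rules seen so far can be modelled in a few lines of code. This is only a sketch built from the dump output above – not Oracle’s actual algorithm, and (as the next data set will show) it doesn’t yet cover all the special cases for chr(0) and chr(1):

```python
def descend(data: bytes) -> bytes:
    """Model sys_op_descend() for the byte patterns seen so far:
    one's-complement each byte, but encode a 0x00 source byte as
    the two-byte escape 0xfe 0xfb, then append the 0xff terminator.
    (A sketch only -- values containing chr(0)/chr(1) near the end
    of the string follow further rules not captured here.)"""
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += bytes([0xfe, 0xfb])   # escape sequence for the zero byte
        else:
            out.append(0xff - b)         # one's complement of the byte
    out.append(0xff)                     # end-of-string marker
    return bytes(out)

# varchar2 'A' (0x41)            -> be,ff
print(descend(b'A').hex(','))
# nvarchar2 'A' (0x00,0x41)      -> fe,fb,be,ff
print(descend(b'\x00A').hex(','))
```

Running this against the t1 values reproduces every v1_desc and nv1_desc dump in the output above.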
Why doesn’t Oracle use the “right” one’s complement for zero? Maybe because 0xff is the byte that’s supposed to mark the end of the string, and it’s important to avoid the risk of ambiguity. But now we have a new problem: Oracle is using 0xfe as the first of two bytes to represent the “descending” zero, and 0xfe is the one’s-complement of 0x01. So how is Oracle working around the fact that it would be a bad idea to have two possible meanings for the value 0xfe? That’s where the second data set comes in; here’s the same query, with results, run against the t2 table:
select
        dump(v1,16)                     v1,
        dump(sys_op_descend(v1), 16)    v1_desc,
        dump(nv1,16)                    nv1,
        dump(sys_op_descend(nv1), 16)   nv1_desc
from
        t2
/
V1
--------------------------------------------------
V1_DESC
--------------------------------------------------
NV1
--------------------------------------------------
NV1_DESC
--------------------------------------------------
Typ=1 Len=1: 0
Typ=23 Len=3: fe,fe,ff
Typ=1 Len=2: 0,0
Typ=23 Len=3: fe,fd,ff
Typ=1 Len=1: 1
Typ=23 Len=3: fe,fa,ff
Typ=1 Len=2: 0,1
Typ=23 Len=3: fe,fc,ff
Typ=1 Len=1: 2
Typ=23 Len=2: fd,ff
Typ=1 Len=2: 0,2
Typ=23 Len=4: fe,fb,fd,ff
Typ=1 Len=1: 3
Typ=23 Len=2: fc,ff
Typ=1 Len=2: 0,3
Typ=23 Len=4: fe,fb,fc,ff
Typ=1 Len=1: 4
Typ=23 Len=2: fb,ff
Typ=1 Len=2: 0,4
Typ=23 Len=4: fe,fb,fb,ff
Looking at the last three groups of 4 lines we can see the varchar2() column following the “one’s complement, append 0xff” rule and the nvarchar2() following the additional “use 0xfe 0xfb for zero” rule; but for chr(0) and chr(1) the dumps need some further untangling. With the tests so far all we can say with any confidence is that “if you see the 0xfe byte then a different pattern is about to appear briefly”.
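To keep the oddities in one place, here are the t2 dumps reduced to a table of observed input/output pairs. These are observations transcribed from the output above, not a decoded rule; the one consistent feature is that 0xfe leads the encoding whenever a chr(0) or chr(1) is involved:

```python
# Observed sys_op_descend() results from the t2 dumps
# (source bytes -> encoded bytes).  Observations only, not a
# derived algorithm: the single chr(0) and chr(1) values, and the
# two-byte nvarchar2 forms of chr(0)/chr(1), all start with 0xfe,
# consistent with 0xfe acting as an escape byte.
observed = {
    b'\x00':     bytes([0xfe, 0xfe, 0xff]),        # varchar2 chr(0)
    b'\x01':     bytes([0xfe, 0xfa, 0xff]),        # varchar2 chr(1)
    b'\x02':     bytes([0xfd, 0xff]),              # plain complement rule
    b'\x00\x00': bytes([0xfe, 0xfd, 0xff]),        # nvarchar2 chr(0)
    b'\x00\x01': bytes([0xfe, 0xfc, 0xff]),        # nvarchar2 chr(1)
    b'\x00\x02': bytes([0xfe, 0xfb, 0xfd, 0xff]),  # escape + complement
}

# every input containing 0x00 or 0x01 starts its encoding with 0xfe
for src, enc in observed.items():
    if 0 in src or 1 in src:
        assert enc[0] == 0xfe
```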
I don’t really need to do any more experiments to guess why, in my previous note, the descending varchar2(128) was projected as raw(193) (though you might want to play around to see what happens with strings of several consecutive zeros or ones). I suppose the reported size could vary with the character set, but if I have a two-byte fixed-width character set and most of the data corresponds to basic ASCII characters then I’ll have a lot of characters where every other byte is a zero that encodes into two bytes when descending – so it makes sense to project a size derived as: 64 * 1 + 64 * 2 + 1 (0xff) = 193. Q.E.D.
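The arithmetic behind that projection can be written out explicitly – a sketch assuming a two-byte fixed-width character set holding ASCII data:

```python
# Projected maximum size of sys_op_descend() applied to a
# varchar2(128) column in a two-byte fixed-width character set
# holding ASCII data (a sketch of the arithmetic, not Oracle code).
max_bytes  = 128               # varchar2(128) -> 128 bytes
max_chars  = max_bytes // 2    # two bytes per character -> 64 characters
low_bytes  = max_chars * 1     # each non-zero byte -> 1 complemented byte
zero_bytes = max_chars * 2     # each 0x00 byte -> 2-byte escape 0xfe 0xfb
terminator = 1                 # trailing 0xff marker

print(low_bytes + zero_bytes + terminator)   # 193 -> raw(193)
```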
Index sizing
An important side effect of this improved understanding is the realisation of what can happen to the size of an index when declared descending. For a simple demonstration, here’s a table with 4 columns and 50,000 rows copied from all_objects.object_name, using the nvarchar2() type for one of the pairs of copies.
create table t1a (
        nva     nvarchar2(128),
        nvd     nvarchar2(128),
        va      varchar2(128),
        vd      varchar2(128)
)
/
insert into t1a
select  object_name, object_name, object_name, object_name
from    all_objects
where   rownum <= 50000
;
create index t1a_nva on t1a(nva);
create index t1a_nvd on t1a(nvd desc);
create index t1a_va on t1a(va);
create index t1a_vd on t1a(vd desc);
execute dbms_stats.gather_table_stats(user,'t1a')
select  index_name, leaf_blocks
from    user_indexes
where   table_name = 'T1A'
order by
        index_name
/
select  column_name, avg_col_len
from    user_tab_cols
where   table_name = 'T1A'
order by
        column_name
/
INDEX_NAME LEAF_BLOCKS
-------------------- -----------
T1A_NVA 590
T1A_NVD 854
T1A_VA 336
T1A_VD 343
4 rows selected.
COLUMN_NAME AVG_COL_LEN
-------------------- -----------
NVA 74
NVD 74
SYS_NC00005$ 111
SYS_NC00006$ 39
VA 38
VD 38
6 rows selected.
As you can see, the descending varchar2() index (backed by column sys_nc00006$) is only slightly larger than the corresponding ascending index, but the descending nvarchar2() index (backed by column sys_nc00005$) is closer to 50% larger than its ascending counterpart, because half the bytes in each object_name were zeros that have been replaced by the two-byte sequence 0xfe 0xfb. This is much worse than the “one byte per descending column per row” that I’ve been claiming for the last 20 or 30 years.
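As a sanity check on the avg_col_len figures, a rough model predicts the descending nvarchar2() column length from its ascending counterpart. The assumptions (ASCII object names in a two-byte fixed-width national character set, one length byte per column) are mine, not taken from the data dictionary:

```python
def descending_avg_col_len(asc_avg_col_len: float) -> float:
    """Rough estimate of avg_col_len for a descending nvarchar2
    column, given the ascending column's avg_col_len.  Assumes:
    avg_col_len includes 1 length byte; a two-byte fixed-width
    charset holding ASCII, so each character is (0x00, ascii) and
    encodes to 3 descending bytes (0xfe 0xfb escape + complemented
    byte); plus the 0xff terminator.  A sketch, not Oracle's code."""
    data_bytes = asc_avg_col_len - 1     # strip the length byte
    chars = data_bytes / 2               # two bytes per character
    return chars * 3 + 1 + 1             # 3 bytes/char + 0xff + length byte

print(descending_avg_col_len(74))        # ~111.5, close to the reported 111
```

For NVA’s avg_col_len of 74 the model predicts roughly 111.5 bytes, which lines up nicely with the 111 reported for sys_nc00005$.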