Here’s another of my “draft” notes that needs some expansion and, most importantly, proof.
I have a fact table with a status id column that shows a massive skew. But I also have a dimension table that holds the “status code” so (in theory, at least) I have to do a join from the statuses table to the facts table to find rows of a given status. Unfortunately the join hides the skew:
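A toy illustration of the problem, using made-up numbers rather than anything from a real system. The table contents, the status codes and the function names are all hypothetical, and the "estimate" is only the simplest uniform-distribution arithmetic (rows divided by distinct status ids), assumed here as a stand-in for what an optimizer might do when it cannot tie the status code back to a specific status id.

```python
# Hypothetical fact table: one status id per row, heavily skewed.
fact_status_ids = [1] * 9_900 + [2] * 50 + [3] * 30 + [4] * 20

# Hypothetical dimension table: status id -> status code.
statuses = {1: "CLOSED", 2: "OPEN", 3: "PENDING", 4: "ERROR"}


def actual_rows(status_code):
    """Count the fact rows that really join to the given status code."""
    ids = {sid for sid, code in statuses.items() if code == status_code}
    return sum(1 for sid in fact_status_ids if sid in ids)


def uniform_join_estimate():
    """Skew-blind estimate: total fact rows / number of distinct status ids."""
    return len(fact_status_ids) / len(set(fact_status_ids))


# The estimate is the same for every code, however skewed the facts are.
for code in statuses.values():
    print(f"{code:8} actual={actual_rows(code):6} "
          f"estimate={uniform_join_estimate():8.1f}")
```

Every status code gets the same estimate, so the rare statuses are heavily over-estimated and the common one is heavily under-estimated.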
Oracle 12c has increased the maximum length of character-based columns to 32K bytes – don’t get too excited, they’re stored out of line (so similar in cost to LOBs) and need some modification to the parameter file and data dictionary (starting the database in upgrade mode) before you can use them.
Richard Foote has a pair of articles on indexing such columns:
Be cautious about enabling this option and test carefully – there are going to be a number of side effects, and some of them may require a significant investment in time to resolve. The first one that came to my mind was that if you’ve created a function-based index on a pl/sql function that returns a varchar2() type and haven’t explicitly created the index on a substr() of the return value then the data type of the function’s return value will change from the current default of varchar2(4000) to varchar2(32767) – which means the index will become invalid and can’t be rebuilt or recreated.
Obviously you can redefine the index to include an explicit substr() call – but then you have to find all the code that was supposed to use the index and modify it accordingly.
Here’s one of those little details which I would have said just couldn’t be true – except it’s in the manuals, and the manuals happen to be right.
I wrote a note about the 12c “In-Memory” option some time ago on the OTN Database forum and thought I’d posted a link to it from the blog. If I have I can’t find it now so, to avoid losing it, here’s a copy of the comments I made:
It probably won’t surprise many people to hear me say that the decode() function can be a bit of a nuisance; and I’ll bet that quite a lot of people have had trouble occasionally trying to get function-based indexes that use this function to behave properly. So (to put it all together and support the general directives that case is probably a better choice than decode() and that the cast() operator is an important thing to learn) here’s an example of how function-based indexes don’t always allow you to work around bad design/code. (Note: this is a model of a problem I picked up at a client site, stripped to a minimum – you have to pretend that I’m not allowed to fix the problem by changing code).
A recent question on the Oracle-L list server described a problem with data coming in from SQL Server and an oddity with referential integrity failing on Oracle because (for example) a child row was in lower case while the parent was in upper.
This raised a few comments on how you might handle referential integrity while allowing case to differ. No doubt it’s been done before – by Tom Kyte if no-one else – but the first thought that crossed my mind was to use virtual columns:
Here’s a little note that came about after I tweeted an idle thought on Twitter yesterday
- 12c allows you to have multiple indexes on the same columns on a table, although only one of them is allowed to be visible at any one time – you can do the same with any recent version of Oracle “almost”, and without the invisibility requirements. (Thanks to Jason Bucata for suggesting the critical detail on this one.)
- 12c allows you to have “partial” indexing on partitioned tables – you can do the same with earlier versions of Oracle “almost” but only if the indexes are local indexes or globally partitioned.
- 12c doesn’t officially allow you to create an index that is a bitmap in the past and a btree in the present (yet) – although you can almost do this in any recent version of Oracle.
The clustering_factor is one of the most important numbers (if not the most important number) affecting the optimizer’s choice of execution plan – it’s the thing that has the most significant effect on the optimizer’s decision on whether to choose a table scan or an index, and on which index to choose.
One of the sad things about trying to keep on top of Oracle is that there are so many little things that could go wrong and take a long time to identify. In part this is why I try to accumulate test cases for all the oddities and anomalies I come across as I travel around the world – if I’ve spent the time recreating a problem I’ll probably remember it the next time I see the symptoms.
Just one of those little snippets about 12c that might help someone.
Further to an earlier post, online rebuild works in 12c even when the key is “too long”. The internal code has changed completely, and there is no sign of the problematic journal table that caused the problem in earlier versions.
In my last post I made a comment about how the optimizer will use the new format of the index hint to identify an index that is an exact match if it can, and any index that starts with the same columns (in the right order) if it can’t find an exact match. It’s fairly easy to demonstrate the behaviour in 11g by examining the 10053 (CBO) trace file generated by a simple, single table, query – in fact, this is probably a case that Doug Burns might want to cite as an example of how, sometimes, the 10053 is easy to interpret (in little patches):
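The matching rule described above can be sketched as follows. This is only a model of the rule as stated in the text (exact match first, otherwise any index with the hinted columns as its leading columns, in order), not a description of the optimizer's internal code; the function name, index names and column names are all invented for the example.

```python
def match_hinted_indexes(hint_columns, indexes):
    """Return the index names that a column-list index hint would identify.

    hint_columns: column names in the order given in the hint.
    indexes:      dict of index name -> tuple of column names in index order.
    """
    hint = tuple(hint_columns)

    # First choice: an index whose column list matches the hint exactly.
    exact = [name for name, cols in indexes.items() if tuple(cols) == hint]
    if exact:
        return exact

    # Fallback: any index that starts with the hinted columns, in order.
    return [name for name, cols in indexes.items()
            if tuple(cols[:len(hint)]) == hint]


# Hypothetical indexes on a table t1.
t1_indexes = {
    "T1_I1": ("n1", "n2"),
    "T1_I2": ("n1", "n2", "n3"),
    "T1_I3": ("n2", "n1"),
}

print(match_hinted_indexes(["n1", "n2"], t1_indexes))  # exact match only
print(match_hinted_indexes(["n1"], t1_indexes))        # leading-column matches
```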
I’ll probably have to file this one under “Optimizer ignoring hints” – except that it should also go under “bugs”, and that’s one of the get-out clauses I use in my “hints are not hints” argument.
Sometimes an invisible index isn’t completely invisible.
Cost Based Oracle – Fundamentals (November 2005)
But the most interesting function for our purposes is sys_op_countchg(). Judging from its name, this function is probably counting changes, and the first input parameter is the block ID portion (object_id, relative file number, and block number) of the table’s rowid, so the function is clearly matching our notional description of how the clustering_factor is calculated. But what is that 1 we see as the second parameter?
When I first understood how the clustering_factor was defined, I soon realized that its biggest flaw was that Oracle wasn’t remembering recent history as it walked the index; it only remembered the previous table block so that it could check whether the latest row was in the same table block as last time or in a new table block. So when I saw this function, my first guess (or hope) was that the second parameter was a method of telling Oracle to remember a list of previous block visits as it walked the index.
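To make the idea concrete, here is a small model of the calculation: walk the table block ids in index order and count a "change" whenever the block is not among the last few blocks visited. With a history of one block it behaves like the notional clustering_factor described above. The function name, the `history_size` parameter and the rule that a revisited block moves back to the front of the history are all assumptions made for the sketch; it says nothing about how sys_op_countchg() is actually implemented.

```python
from collections import deque


def clustering_factor(block_ids, history_size=1):
    """Count block changes while walking table block ids in index order.

    history_size=1 remembers only the previous table block; larger values
    remember that many recently visited distinct blocks.
    """
    recent = deque(maxlen=history_size)  # oldest entry drops off automatically
    changes = 0
    for block in block_ids:
        if block in recent:
            # Already remembered: no change counted. Moving it to the
            # most-recent end is an assumption of this sketch.
            recent.remove(block)
        else:
            changes += 1
        recent.append(block)
    return changes


# Rows that alternate between two table blocks, then settle in a third.
walk = [1, 2, 1, 2, 1, 2, 3, 3, 3]

print(clustering_factor(walk, history_size=1))
print(clustering_factor(walk, history_size=2))
```

With a history of one, every flip between blocks 1 and 2 counts as a change even though only two blocks are involved; with a history of two, the count drops to the number of distinct blocks actually visited.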
And finally, Oracle Corp. had implemented an official interface to the second parameter of sys_op_countchg() – provided you install the right patch – through a new table (or schema, or database) preference type available to the dbms_stats.set_table_prefs() procedure.
I don’t think this is likely to happen on a production system (until 12c) – but look what you can do if you try hard enough:
  1  select
  2  index_name, column_name from user_ind_columns
  3  where
  4  table_name = 'T1'
  5  order by
  6* index_name , column_position
4 rows selected.
That’s a straight cut-n-paste from an Oracle 22.214.171.124 SQL*Plus session. (You can tell I typed it in real time because I missed the return before the FROM, and couldn’t be bothered to go back and do it again ;) )
This is the text of an article I published in the UKOUG magazine a few years ago, but it caught my eye while I was browsing through my knowledge base recently, and it’s still relevant. I haven’t altered the original apart from adding a couple of very brief comments in brackets [Ed: like this].