Make sure you agree on the meaning of the jargon.
If you had to vote would you say that the expressions “more selective” and “higher selectivity” are different ways of expressing the same idea, or are they exact opposites of each other ? I think I can safely say that I have seen people waste a ludicrous amount of time arguing past each other and confusing each other because they didn’t clarify their terms (and one, or both, parties actually misunderstood the terms anyway).
Selectivity is a value between 0 and 1 that represents the fraction of data that will be selected – the higher the selectivity the more data you select.
If a test is “more selective” then it is a harsher, more stringent, test and returns less data (e.g. Oxford University is more selective than Rutland College of Further Education): more selective means lower selectivity.
If there’s any doubt when you’re in the middle of a discussion – drop the jargon and explain the intention.
If I ask: “When you say ‘more selective’ do you mean ….”
The one answer which is absolutely, definitely, unquestionably the wrong reply is: “No, I mean it’s more selective.”
I’ll be buying the tickets for my flight to Seattle and Kaleidoscope 14 some time tomorrow. The cut-off date on my credit card bill is today, so if I get the tickets tomorrow I won’t have to pay for them until the end of March.
When you know you have to pay it’s worth thinking about when you have to pay. It’s a principle that works in Oracle databases, too.
On the flip-side – sometimes you don’t realise that the clever thing you’ve done now is going to make someone else pay later.
From Mohamed Houri, here’s a French translation of all my “Philosophy” notes to date.
It’s important to revisit the questions you think you’ve answered from time to time. You may find that your previous answer was wrong or incomplete; you may find that looking at your past answers may give you ideas for new questions.
I had this thought while staring out of the window earlier on today. When I’m working at home I spend most of my time in a room that looks onto my back garden – and I have five different bird feeders in the garden and a pair of binoculars by my computer. Today I was watching some (Eurasian) Jays that tend to appear fairly promptly when I put out a handful of peanuts.
There’s clearly some sort of pecking order among these jays (and I think there are two different families), and one of the jays is clearly very aggressive and tends to frighten off the others, but a common behaviour pattern when two are down is that the less aggressive jay hops a few steps away from the more aggressive one and turns its back.
For years I’ve assumed that this is just a typical “underdog” behaviour – i.e. “I’m not a threat, I can’t attack, I’m not even looking at you” – but today it suddenly dawned on me that there was another possibility that simply hadn’t crossed my mind: if you’re a bird and thinking about running away you won’t want to take off towards your opponent, the best direction to point in is the direction that’s going to move you away from trouble as quickly as possible.
My point, of course, is that it’s easy to believe that you understand something simply because you’ve accepted a reasonable explanation – coming back to the issue some time later may allow you to come up with other ideas, whether or not those ideas arise by you deliberately questioning your belief, or by an accident of intuition.
Footnote: If this was an example of Oracle behaviour I’d be doing some serious research on it by now; but my birdwatching is only for casual pleasure, so I’m not going to start trawling the internet for theses on Jay behaviour.
We’ve reached that time of year (Autumn, or Fall if you prefer the American term) when I’m reminded that tending a garden is like tending an Oracle database.
This is a picture of the oak tree on my front lawn, taken about 4 hours ago. Looking at it now it shows hardly any sign of the coming winter and little of the colour that let’s you know it’s preparing to drop its huge volume of leaves, but yesterday morning I spent the best part of an hour raking up leaves that had dropped over the course of the previous week.
Over the next six weeks, I’ll be out be out with my leaf rake every few days to clean up the mess – and I’ll look down at the mess that’s on the ground, then look up at the mess that’s waiting to join it, and then I’ll do just enough work to make the lawn look just good enough to keep my wife happy for a few more days until I get sent out to do it all over again.
You can spot the analogy, of course – it’s important to think about how much effort it’s worth spending to get to an end result which is good enough for long enough. There’s no point in spending a huge amount of effort getting a fantastic result that is going to be obliterated almost immediately by the next problem that gets dumped in your lap. When the tree is nearly bare, I’ll do a thorough job of clearing the leaves, until then, 95% is easy enough, and good enough.
Footnote: avid arboriculturalists might wonder why the tree is lop-sided – being a little light on the side towards the road – it’s the sort of thing that happens when a tree gets hit by a lorry.
A question I asked myself recently was this:
Which is the worst offence when publishing an article about some feature of Oracle:
Saying something does work when it doesn’t
Saying something doesn’t work when it does
Saying something does work when in some cases it doesn’t.
Saying something doesn’t work when in some cases it does.
I don’t think it’s an easy question to answer and, of course, it’s not made any easier when you start to consider the number of cases for which a feature does or doesn’t work (how many cases is “some cases”), and the frequency with which different cases are likely to appear.
You need to understand the application and its data.
A recent request on OTN was for advice on making this piece of SQL run faster:
delete from toc_node_rel rel
where not exists (
from toc_rel_meta meta
where rel.rel_id = meta.rel_id
I couldn’t help laughing when I saw this.
Here’s a wonderful lesson from Cary Millsap – be very careful if you ever want to sell him anything – that reminded me of a Powerpoint slide I had produced for a presentation a few years ago. It took me a little time to track it down but I finally found the slide, reproduced below, in a presentation called: “The Burden of Proof” that I had given for the Ann Arbor Oracle User Group in 2002. (The picture of the Earth is the Apollo 17 image from NASA):
Here’s a quote that says it all:
Dr Joseph Lykken of Fermilab – in response to some (negative) results from the Large Hadron Collider that suggest the simplest form of SuperSymmetry is wrong:
“It [supersymmetry] is a beautiful idea. It explains dark matter, it explains the Higgs boson, it explains some aspects of cosmology; but that doesn’t mean it’s right.”
Mind you, Feynmann got there years ago:
“It doesn’t matter how beautiful your theory is, it doesn’t matter how smart you are. If it doesn’t agree with experiment, it’s wrong.”
If you run a query that is supposed to return one row from a large table, and there’s a suitable index in place you would probably expect the optimizer to identify and use the index. If you change the query to return all the data (without sorting) from the table you would probably expect the optimizer to choose a full tablescan.
This leads to a very simple idea that is often overlooked:
Sometimes it takes just one extra row to switch a plan from an indexed access to a full tablescan.
There has to be a point in our thought experiment where the optimizer changes from the “one row” indexed access to the “all the rows” tablescan.
If you’re lucky and the optimizer’s model is perfect there won’t be any significant difference in performance, of course. But we aren’t often that lucky, which is why people end up asking the question: “How come the plan suddenly went bad, nothing changed … except for a little bit of extra data?” All is takes is one row (that the optimizer knows about) to change from one plan to another – and sometimes the optimizer works out the wrong moment for making the change.
“There is no space problem.”
If you saw this comment in the middle of a thread about some vaguely described Oracle problem, which of the following would you think was the intended meaning:
There is a problem – we have no space.
We do not have a problem with space
Wouldn’t it make life so much easier to choose between:
We are not seeing any Oracle errors.
We are seeing Oracle error: “ORA-01653: unable to extend table X by N in tablespace Z”
(That’s just one of many possible space-related errors, of course.)
Paraphrasing Yogi Berra:
“It ain’t committed until it’s committed.”
If you’re wondering why it’s worth remembering this odd comment – it addresses the (commonly asked) question:
“does the redo log contain uncommitted data as well as committed data?”
The answer is: yes.
When a session is creating redo change vectors it doesn’t know whether it is going to commit or rollback. But a session has to be able to store an arbitrarily large list of change vectors somewhere, and that list has to appear in the redo log (ideally “instantly”) if the session commits – so Oracle avoids delays on commit by putting the change vectors into the redo log as they are created***.
If you view the question from the opposite extreme, the recovery mechanism has to be able to deal with uncommitted data anyway because there are, after all, several scenarios where data that definitely was committed cannot be recovered; for example, recovery until end of log file 9998 because log file 9999 was destroyed and simply doesn’t exist – how can the code handle transactions that were not committed until part way through file 9999 if it only knows how to handle committed transactions ?)
*** Not strictly true from 10g onwards where Oracle introduced a delaying effect aimed at reducing competition for the redo allocation and redo copy latches for “small” transactions.
[The Philosophy Series]
If you see a comment like “X is a bad idea” this does not mean “some mechanism that is vaguely ‘not X’ is a good idea”.
If, for example, I say:
“Histograms will not work well on character strings that are more than 32 bytes long and generally similar in the first 32 bytes”
that is absolutely not the same as saying
“It’s a good idea to create histograms on character strings that are less than 32 bytes long.”
If this were a purely mathematical world we could invoke symbolic logic and point out:
(A => B) <=> (¬B => ¬A)
which means my statement is equivalent to:
if you have a histogram that is working well then the data is not character strings of more than 32 bytes with generally similar values in the first 32 bytes”
Of course, being Oracle, you may find that someone, somewhere, has exactly such a histogram that appears to work brilliantly for them – but that will be because the optimizer has messed up the arithmetic so much that they are getting a great execution plan for completely the wrong reason … so they need to watch out for the next upgrade or patch release in case the optimizer gets enhanced.
[The Philsophy Series]
Here’s a useful description I heard recently from philosopher Daniel Dennett:
The canons of good spin:
- It is not a bare-faced lie
- You have to be able to say it with a straight face
- It has to relieve skepticism without arousing curiosity
- It should seem profound
It seems to describe a lot of the stuff that our industry publishes on the internet.
[The Philsophy Series]