When I found that set of slides, I also found the paper copies of another set of slides I had used at another UKOUG SIG event in 1995 where I had been explaining the mechanisms used by the cost based optimizer to decide whether or not to use an index. I thought it might be quite revealing to reproduce those slides to show how much (or how little) things have changed since then.
The original presentation had been done on an overhead projector – the type that uses transparent plastic sheets with ink printed onto them, and a hot lamp – and I didn’t have the original (Word 2.0) file that I had used to create the slides, so I’ve scanned the paper copies and have uploaded them as small jpg files.
Slide 3: describes a scenario where all 5,000 rows will be in a compact section of the table because the nature of the query matches the pattern of the data arrival, and points out that the number of disc read requests will be small (compared to the tablescan) if you use the index.
Slide 4: describes another viable scenario for the same data set where the nature of the query is completely opposed to the pattern of the data arrival, and points out that the number of disc read requests will be large (compared to the tablescan) if you use the index.
The conclusion of the presentation – five slides in total (at least, that was all I’d kept, and the fifth was the conclusion from the above) – was that you couldn’t identify the better execution plan by considering just the volume of data … the data distribution mattered as well.
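If you want to see the effect that those two slides describe in a modern database, here’s a hypothetical little script (not from the original presentation) that creates two copies of the same data – one stored in arrival order, one deliberately scattered – and then reports the clustering_factor that the optimizer uses to estimate the number of read requests an index range scan would need:

rem
rem  Hypothetical demo: same data, two physical orderings
rem

create table t_clustered
as
select
        rownum                  id,
        trunc(rownum/100)       grp,            -- 100 consecutive rows per value
        rpad('x',100)           padding
from    dual
connect by level <= 50000
;

create table t_scattered
as
select  *
from    t_clustered
order by dbms_random.value                      -- scatter the same rows
;

create index tc_i1 on t_clustered(grp);
create index ts_i1 on t_scattered(grp);

begin
        dbms_stats.gather_table_stats(user, 'T_CLUSTERED');
        dbms_stats.gather_table_stats(user, 'T_SCATTERED');
end;
/

select  index_name, num_rows, clustering_factor
from    user_indexes
where   index_name in ('TC_I1','TS_I1')
;

For the clustered copy the clustering_factor should be close to the number of blocks in the table, so an index range scan for one value of grp needs only a handful of read requests; for the scattered copy it will be close to num_rows, which is exactly the trap that slide 4 describes.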
I suspect that this set of slides was based on Oracle 7.3 since 4KB isn’t all that common as a database block size nowadays, but things haven’t really changed much. We’ve got CPU costing (also known as System Statistics) to add timing information based on the size of read requests, and to allow some time for CPU activity, but apart from that the optimizer is still interested in the same two things: how much data, and how randomly it is scattered.
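For anyone who wants to see the timing side of that statement, the figures that CPU costing works with are visible (assuming you can read the sys-owned table) in sys.aux_stats$:

rem  The system statistics behind CPU costing

select  pname, pval1
from    sys.aux_stats$
where   sname = 'SYSSTATS_MAIN'
order by pname
;

You should see the “noworkload” values (CPUSPEEDNW, IOSEEKTIM, IOTFRSPEED) and, if workload statistics have been gathered, figures such as SREADTIM, MREADTIM, MBRC and CPUSPEED – the single and multiblock read times and typical multiblock read count that turn the “how much, how scattered” estimates into a time-based cost.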