Oracle Scratchpad

September 17, 2010

SQL Server 5

Filed under: SQL Server — Jonathan Lewis @ 6:24 pm GMT Sep 17,2010

SimpleTalk have just published another of my SQL Server articles – which looks at the way that SQL Server stores data in “clustered indexes”, and the variation that appears depending on the pattern of data.


  1. This question has been bothering me for quite sometime. I couldn’t understand why there would be such a fundamental difference between the two platforms when they both use b-trees and they both use heaps.

    Have you seen this paper?

    I would say that the SQL Server community standard for a rigorous approach to experimenting on an RDBMS is quite different from that of the bulk of the Oracle community; informally lead by you, Tom, Tanel, Richard Foote, Cary Milsap, et al.

    Could you please comment on this paper? It seems that they’ve drawn one very narrow case and have assumed sweeping implications from it.

    My critiques would include:

    They’ve measured impact in terms of wall-time, instead of resources.
    They’ve used a PK that is a considerable portion of the total row size- not atypical but hardly the nominal case.

    That violates Microsoft’s own recommendation that the cluster key be an increasing number.

    They don’t consider the impact to secondary indexes. I’d say to be fair you’d have to show at least 2 second indexes to have a representative case.

    I could go on

    Maybe you could use this as the basis for your future research.

    NOTE [by jpl]: I recreated this comment from an email from Mark Brady, following an accident with his original comments that were originally marked as spam by Akismet, then accidentally deleted by me when I was checking and clearing the spam queue.

    Comment by Mark Brady — September 23, 2010 @ 8:11 am GMT Sep 23,2010 | Reply

  2. Mark,

    I’ve started reading the paper, and one of the early comments it makes is:

    “The behavior of a table without any indexes, or with two or more indexes is outside the scope of this paper.”

    So they do declare that their example is a special case . However, when they get to the closing “Recommendations” paragraph, they don’t, in mu opinion, put enough emphasis on the fact that their test is based on a model that probably doesn’t match the important bits of most real systems.

    They the use a test case for which it’s fairly obvious that a clustered index (or index-organized table) is probably going to be an obvious choice. It’s the classic “stock market quotes” table – which I wrote about in Practical Oracle 8i – the data arrives by day for all possible stocks from all possible markets and then you want to query it by product over a date range. The key is a large fraction of the row, and the order in which the data arrives is completely different from the order in which you want to query it.

    A point I made in my article was that the growth over time and pattern of leaf page splits of a clustered index seems to be affected by the level of repetition of the leading edge of the key – and they don’t make any comment about the number of distinct org_key and prod_key values.

    (We can infer from a later comment that each (org_key, prod_key) caters for 76 rows.)

    For their insert test they don’t tell us how the insert is ordered (but see delete test below) – is it completely randomised data with randomised time-key values, or do we get some sort of “time-key increasing” order to it, or is is even order by org_key, prod_key with constant time-key (as it would, possibly be for the stock-quotes problem – and in a real stock-quotes problem I’d hope to see array inserts anyway). The fact that they do a leaf block split (on the clustered index) every 89 rows when the precreated index held 89 rows per leaf page also makes me wonder whether they were inserting data above the high value all the time.

    Also, we note that the row inserted is only “key plus update stamp” and leaves the non-key values null (which is very helpful if you want to demonstrate the benefits of a clustered index and minimise the number of leaf page splits).

    The update test updates a single column without changing the row length. We probably don’t expect to see any row updates in the stock-quotes table, but in most systems, updating a row often means extending it’s length … and SQL Servers handling of free space varies enormously between heap tables and clustered indexes, and also between single row inserts and array inserts.

    The range based select: (col1 in (x,y,x) and col2 = constant) – but we get told that x, y, x and consecutive (which rather helps the clustered index).

    The delete test describes how they first insert, then delete millions of rows. The insert portion of the test shows that they are inserting the data in groups (one row at a time) of 1,000 rows that are ordered to match the index definition – a fairly uncommon pattern, but one that is friendly to both the heap table and the clustered index, though more favourable to the clustered index given the particular row definition they’ve chosen.

    The delete command used was: DELETE TOP (2000000) FROM Tab1;
    I’d have to check, but this probably deleted different rows from the two tables – the text says it deletes “random” rows – I SUSPECT that the clustered index deletes rows in leaf-page order, and the heap table deletes rows in the order of entry. This could make a big difference to the number of empty (hence reusable) pages produced by the delete – largely because the B-tree on the heap table might produce NO empty pages.

    Conclusion – the test case is sufficiently special that I wouldn’t trust it as a basis for any general conclusion that clustered indexes are more appropriate than heap tables.

    Comment by Jonathan Lewis — September 25, 2010 @ 10:22 am GMT Sep 25,2010 | Reply

  3. Thanks, I’m delighted that you spent the time on this review. I came to pretty much the same conclusions. A very specific case used to prove that every table in your database should be an IOT. I don’t understand how such an unscientific paper makes its way into the MSDN. Where is the SQL Server JL!?

    One MS recommended best practice for choosing a clustered index is to pick a constantly increasing value. This would avoid page split since each new row would be at the ‘end’ of the index. But wouldn’t that basically put an end to concurrent inserts? As you point out in your criticism of the Insert test, an ordered set will perform differently/better than random values. But if I have two sessions insert rows which will update different leaf blocks, can that happen simultaneously? But if the two update the same leaf block, that access will be serialized, correct?

    Comment by Mark Brady — February 2, 2011 @ 4:13 pm GMT Feb 2,2011 | Reply

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Powered by