Oracle Scratchpad

July 30, 2009

Philosophy – 3

Filed under: Philosophy — Jonathan Lewis @ 7:37 pm BST Jul 30,2009

The performance of a query should be related to the size of the data set you’re interested in, not to the size of the database.

If this is not the case then you’ve made a mistake in the physical implemenation. Typically this will be poor indexing, but it could be due to errors in other structural choices such as partitioning, clustering, IOTs, or materialized views.

The primary symptom of this error is the complaint: “as time went by the database got slower”.

[The Philosophy Series]

7 Comments »

  1. Jonathan:

    >>”The performance of a query should be related to the size of the data set you’re
    interested in, not to the size of the database.”

    Just a thought: and what about the distribution/demographics?

    Cheers.

    Carlos.

    Comment by carlosal — July 31, 2009 @ 7:07 am BST Jul 31,2009 | Reply

    • Carlos,
      Working out how to handle the “distribution/demographics” is a very important part of the physical implementation. If your performance slows down with the size of the database it’s probably that aspect of the physical implementation that you haven’t catered for properly.

      Comment by Jonathan Lewis — August 3, 2009 @ 9:15 pm BST Aug 3,2009 | Reply

  2. I would add “parallel query” and “compression” to the list of structural choices.

    If your query really needs to access lots of data and that data gets more with the time, then compressing the data and reading it parallel might be your way to scale.

    Comment by Todor Botev — August 3, 2009 @ 11:58 am BST Aug 3,2009 | Reply

    • If the volume of data that you need to access increases with time, then you’re talking about a different problem – and things like paralel query and compression are only temporary solutions (or temporary adjustments to performance). In that circumstance you could equally well say that buying a bigger machine is an option to add to the list of structural choices. (And I mean that quite seriously – I have been involved with a couple systems where the design included specification for the rate at which new hardware would have to be added each year to keep critical response times flat.)

      A different view point, though, for many systems where the “interesting” data volumes are growing is – what about materialized views, or simple summary tables, or even summary rows ?

      Many of the performance problems I see start on day one, because no-one says: “how will this design respond to this query in five years time”. Yet there are many ways in which simple structural feature can ensure that the work the database has to do in five years time is no more than the work it has to do after the first few weeks.

      Comment by Jonathan Lewis — August 3, 2009 @ 9:28 pm BST Aug 3,2009 | Reply

      • I agree with you.

        But how would you “structurally” approach the following: “give me the summary amount, aggregated by month, of all transactions with amount bigger than X” where X can be supplied ad hoc.

        – you need to access every single transaction to check the amount – hence a summary table would not help

        – you could think of range partitioning by amount – but if X is small enough you still would need to access most of the data

        Comment by Todor Botev — August 4, 2009 @ 8:43 pm BST Aug 4,2009 | Reply

        • Todor,

          Remember that my comment was that your performance should be related to the size of the data you’re interested in. In your example your problem arises when you’re interested in (almost) all of the data.

          Your idea of range partitioning by amount would be a relevant suggestion if someone made this a requirement of their system.

          One refinement might be summary tables on (amount, month, sum(amount by month)) range-list partitioned by month; another approach would obviously be to explain the workload associated with the requirement as stated and propose an alternative that used a small set of “amount ranges” rather than one row per amount per month.

          Comment by Jonathan Lewis — August 6, 2009 @ 5:41 pm BST Aug 6,2009

    • Todor,

      An additional thought to clarify the distinction I wanted to make.

      Parallel Query and Compression allow you to handle the same data volume and data distribtion faster – but the correct structural approach allows you to get the results by handling a smaller volume of data and/or a better distribution of data.

      Comment by Jonathan Lewis — August 7, 2009 @ 7:21 am BST Aug 7,2009 | Reply


RSS feed for comments on this post.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,873 other followers