Clarity, Clarity, Clarity

November 2, 2006

Clarity, Clarity, Clarity

Filed under: Oracle,Performance,Tuning — Jonathan Lewis @ 5:58 pm GMT Nov 2,2006

This piece of SQL was recently quoted on the newsgroup comp.databases.oracle.server with the explanation that it had been downloaded from Metalink some time ago. The purpose of the SQL is to give some indication of how much current undo each session is holding in which undo segment and which objects the session is changing. But how long does it take you to read and understand (a) what it does and (b) whether or not it’s right ?

select
        substr(a.os_user_name,1,8)      "OS User",
        substr(a.oracle_username,1,8)   "DB User",
        substr(b.owner,1,8)             "Schema",
        substr(b.object_name,1,20)      "Object Name",
        substr(b.object_type,1,10)      "Type",
        substr(c.segment_name,1,5)      "RBS",
        substr(d.used_urec,1,12)        "# of Records"
from
        v$locked_object      a,
        dba_objects          b,
        dba_rollback_segs    c,
        v$transaction        d,
        v$session            e
where
        a.object_id = b.object_id
and     a.xidusn    = c.segment_id
and     a.xidusn    = d.xidusn
and     a.xidslot   = d.xidslot
and     d.addr      = e.taddr
;

The problem I have with this is that I don’t see any clue about the author’s intentions; and the aliasing in the from clause, combined with the predicate arrangement doesn’t make it easy to work out what the joins are doing. In part, of course, this is because the author is probably following a convention that I didn’t invent. But there are two details which I think stand out unarguably as ‘poor taste’.

First – aliases like ‘a’, ‘b’, … do not help. I always advise a documented aliasing strategy of three-letter aliases (or four letters for systems with lots of tables) that give some sort of clue about the underlying table name. When a table is used multiple times in a single query, then its alias should have a number appended to it to make each occurrence unique across the query.

Second – when I write predicates like “where colX = 99” the search is against colX, the value 99 is a known input. The same logic should apply to “abc.colX = pqr.colY”. To me this predicate suggests that the query has already reached table (alias) pqr at this point so that colY is a known value, and abc.colX is the thing I am searching.

Third – (and I believe the original author has followed this convention, which is why I cite only two details as poor taste above) I list the table names in the from clause in the order that I think the (cost-based) optimizer ought to visit them. The same ordering pattern appears in the where clause – specifically on the left-hand side of the list of predicates.

So, applying the first two rules to the original query, you get:

select
        substr(lkd.os_user_name,1,8)     "OS User",
        substr(lkd.oracle_username,1,8)  "DB User",
        substr(obj.owner,1,8)            "Schema",
        substr(obj.object_name,1,20      "Object Name",
        substr(obj.object_type,1,10)     "Type",
        substr(rbs.segment_name,1,5)     "RBS",
        substr(trn.used_urec,1,12)       "# of Records"
from
        v$locked_object       lkd,
        dba_objects           obj,
        dba_rollback_segs     rbs,
        v$transaction         trn,
        v$session             ses
where
        obj.object_id  = lkd.object_id
and     rbs.segment_id = lkd.xidusn
and     trn.xidusn     = lkd.xidusn
and     trn.xidslot    = lkd.xidslot
and     trn.xidsqn     = lkd.xidsqn      -- added for completeness
and     ses.taddr      = trn.addr
;

Of course it’s a matter of taste whether or not you feel this makes it a little easier to understand the query. I found that it made a difference the moment I had done a few global find and replaces, and that the predicate swap was just a little extra icing on the cake.

In passing, I make no claims for the efficiency (or correctness, or good sense) of this particular piece of SQL, that wasn’t the point of this posting but, if I thought the query was going to be useful on a regular basis, I would consider adjusting the order of execution (hence the ordering of the from and where clauses) and possibly introduce an in-line view to isolate the processing of the three dynamic performance views before doing the join to real tables.

I might also point out that v$locked_object is a simple join of x$ktadm, x$ksuse, and x$ktcxb, and the last two are just v$session and v$transaction anyway so I’d use the x$ objects to make the two v$ objects redundant; and then I might use undo$ instead of dba_rollback_segs, and I might think about obj$/user$ instead of dba_objects.

Comments (22)

22 Comments »

Interesting post Jonathan.
How do you find the readability of ANSI SQL compared to the traditional Oracle method? I swicthed over to using ANSI a little while ago and for me it has made a difference to readability of my SQL statements.

Comment by Chris — November 2, 2006 @ 9:15 pm GMT Nov 2,2006 | Reply
Not sure I like the idea of using x$ tables if there is an alternative in v$/dba*/sys.*. Why? Please read on.

Sure: for you and other folks deeply into that, it makes total sense. But for something that is going to be downloaded from Metalick for general consumption, this might be slightly less “future-proof”. X$* definitions and purposes can and *will* change between versions of Oracle. Whereas the others tend to be more static, being part of general doco and generally available information.

All this to say that using x$*-based SQL for day-to-day work is fine of course, but *someone* must sit down and work out and test if it all still works that way in the next release. Not everyone can afford the time to do this and most in fact don’t. As such I tend to stay away from x$ if I can get the info from somewhere else more release-stable/better documented.

Comment by Noons — November 2, 2006 @ 10:38 pm GMT Nov 2,2006 | Reply
Oh, found it was Note:1039126.6, there are the clues of the author’s intention. I named it numtransobjects.sql, which I think was included in the older format of the note. Of course, there is the clue “# of Records” :-)

I found others when I started using this in the 8i timeframe for an RBO system, for some reason this one has managed to be carried along with migrations. Probably because of it’s simplicity and tells me the gross information I needed – generally, I’d be doing some big load on what is normally an OLTP system and wondering where in a sequence of sometimes long-lasting steps a 4GL script is, and of course, how much undo. I hope it should be obvious to multiply the number of records [sic] by the average row size to get some gross idea of how much space is being used. Or should that be block size, since undo records are before images of blocks? Could VK’s large undo usage be due to inserts scattered randomly?

I would be interested in a script that more accurately answers VK’s original question.

(Could you put the timezone on the post times in your blog?)

Comment by joel garry — November 2, 2006 @ 11:24 pm GMT Nov 2,2006 | Reply
I am so glad someone else feels the same way I do about writing queries forwards. If the syntax allowed “AND NULL IS deptno” I’m sure somebody somewhere would think it was a good idea. Whenever I see “…AND x.blah = y.blah (+)” I know I’m going to have to rearrange the entire query into a human-readable order before I have any chance of understanding it.

Comment by William Robertson — November 3, 2006 @ 12:44 am GMT Nov 3,2006 | Reply
Yes, readability is a big issue with SQL, and I’m a fan of in-line commentary as well, possibly by using the same sort of technique that you sometimes see in books where lines of interest have a “footnote” number against them which is followed by in a comment block at the end of the query.

The only place where my own method disagrees is in the order of tables in the joins … my preference is that if I expect A to be accessed first, then B, then C etc. then I’ll list the joins as …

Where A.col1 = B.col1 And
B.col2 = C.col2 And
C.col3 = D.col3

… as I think it flows more naturally to the mind than …

Where B.col1 = A.col1 And
C.col2 = B.col2 And
D.col3 = C.col3

Still, they’re both fine methods and personal preference accounts for a lot. I like the “And” at the end of the line also, though it makes commenting out for testing a little more awkward. I just like the blockey look it gives to the code.

Comment by David Aldridge — November 3, 2006 @ 3:36 am GMT Nov 3,2006 | Reply
> Where A.col1 = B.col1 And
Well there we disagree I’m afraid. a.col1 is not the subject of any predicate. If we want all rows from A, then starting with “where a.something = …” suggests we only want rows where a.something matches some previously established value. If it were a literal value you wouldn’t write “where ‘banana’ = a.something” now would you? In the case of NULL the syntax doesn’t even allow it. So do we write literals one way round and joins another?

Also AND begins a predicate, it doesn’t end it. Putting it on the end of the previous one is just perverse. And don’t get me started on mixed-case keywords.

At least nobody’s suggested right-aligning keywords yet. What twisted psycho came up with that?

Comment by William Robertson — November 3, 2006 @ 7:03 am GMT Nov 3,2006 | Reply
Chris,
I know I ought to get familiar with ANSI SQL, and in principle I like the idea of the high visibility of the join predicates compared to the filter predicates. However I don’t see much of it, and don’t want to leave it behind me at sites which don’t use it, so I still haven’t become casually fluent, and I haven’t worked out a good way of presenting it so that complex SQL looks tidy.

Noons,
You’ll note that I preceded those comments with “if I were going to use it regularly” – and then I suggested some ‘standard’ tuning before diving into the x$ and data dictionary. I agree with your philosophy in general (although I believe Dave Ensor pointed out once that when he was working at BMC, the X$ were more stable than the V$).
However, since I always pass every piece of “tricksy” SQL through Explain Plan before I run it, I could see that this happened to be a nice simple example of ‘bad use of views’, so I thought I’d mention it.

Joel,
My choice of the word “intentions” was poor. I was thinking of “intended execution plan”, not “purpose of query”.
Regarding the sizing – I’ll post a note on undo (rollback) space usage, discussing the original news item in a little while. As for timezone – I operate from GMT but I don’t know how to get it into the timestamp, I do Oracle, not blog.

David,
Your twisted, evil, preference (please note – tongue in cheek) is 50% of the reason I wrote this note. The subject of the predicate should be on the left – no question. It’s not just aesthetics, as William points out, it’s logically correct. I also happen to dislike AND at the end, and commas at the beginning of lines, but at least I can tell myself that that’s just a sign of poor taste.

Comment by Jonathan Lewis — November 3, 2006 @ 8:13 am GMT Nov 3,2006 | Reply
I like the ANSI join syntax a lot, although for the life of me I cannot understand how the LEFT and RIGHT outer join keywords were arrived at, and why it could not simply have been OUTER JOIN. (Possibly the committee couldn’t agree, and they just thought to hell with it we’ll allow both, in order to move on to analytics and the MODEL clause before teatime.) Of course RIGHT OUTER JOIN is just LEFT OUTER JOIN written backwards, which brings us back to twisted evil ;)

Comment by William Robertson — November 3, 2006 @ 9:07 am GMT Nov 3,2006 | Reply
Strong feelings there about the order of columns in the predicate :D

I think we agree that it generally makes no difference to Oracle which way round it goes, nor does Oracle care where ANDs (and commas) go. (I agree about the commas — I don’t like that either, but it’s just a preference and doesn’t affect Oracle one jot).

If it does make a difference to Oracle then that would be a different matter, but I think that the logic you are talking about is human usability logic, not optimiser logic, and that means to me that it’s in the eye of the beholder. I like the flow …
A->B, B->C, C ->D, D->E
… more than …
B
From fact_table fct,
predicated_dim_1 pd1,
predicated_dim_2 pd2,
predicated_dim_3 pd3,
nonpredicated_dim_1 nd1,
nonpredicated_dim_2 nd2
Where fct.col1 = pd1.col1
and fct.col2 = pd2.col2
and fct.col3 = pd3.col3
and nd1.col4 = fct.col4
and nd2.col5 = fct.col5
…

whereas the David method would throw it’s hands in the air and just write …

From fact_table fct,
predicated_dim_1 pd1,
predicated_dim_2 pd2,
predicated_dim_3 pd3,
nonpredicated_dim_1 nd1,
nonpredicated_dim_2 nd2
Where fct.col1 = pd1.col1
and fct.col2 = pd2.col2
and fct.col3 = pd3.col3
and fct.col4 = nd1.col4
and fct.col5 = nd2.col5
…

Tricky situation, I would suggest. Feel free to disagree, but my poor taste and me are just victims of my own experience!

Mixed-case keywords — they drive a lot of people crazy funnily enough, but I like ’em.

Here’s a list of my other sins …

* I sometimes write “DBMS_MView.Refresh” … note the uppercase M and V together there. I just like the way it looks, nothing more.

* Yes, I put AND at the end of the line — it looks a lot more readable in fixed pitch fonts, as do all the examples above if my pre tags didn’t work.

* I tend to put OR’s at the front though, as they’re often in parentheses and indented.

* I like to align =’s signs in lists of joins

* When writing a BETWEEN predicate I often place the AND on the next line before the second argument instead of at the end of the first argument.

* I often align aliases and commas in select lists.

Finally I should add that I don’t have a strong preference for which end of a boiled egg I open first, though I’m more littlendian than bigendian in that respect ;)

Comment by David Aldridge — November 3, 2006 @ 6:42 pm GMT Nov 3,2006 | Reply
> I like the flow …
> A->B, B->C, C ->D, D->E

I’m afraid you’ve rather lost me. What I need for readability is for queries to be written in a consistent direction, and that direction should be forwards.

If the intention is to return all rows from A with no filtering, and you begin the query with “where a.col1 = something” just because you rather like the way it looks, then that is completely misleading and confusing.

Comment by William Robertson — November 4, 2006 @ 7:35 am GMT Nov 4,2006 | Reply
It is consistent William, it’s just in a different direction — one that I find clear and pleasing. It’s different to that which you use, and it is different to the way it would be expressed in a 3GL, but SQL isn’t a 3GL and a join is not logically the same as a predicate or a 3GL assignment statement.
>> If the intention is to return all rows from A with no filtering, and you begin the query with “where a.col1 = something” just because you rather like the way it looks, then that is completely misleading and confusing.

If B is a child of A and you just want to join them both with no filtering on either then by that argument expressing the join either as A.col1 = B.col1 or as B.col1 = A.col1 is misleading — no filtering is necessarily implied by a join at all. The fact is that in a relational database a join and a predicate are logically different things, and the misleading and confusing part of this is Oracle’s pre-ANSI compatible join syntax that makes them look similar despite their logical differences. I’ve never had any problem in looking at joins and predicates and “feeling” the difference between them. In the star schema environments that I work in there is rarely a “forwards” direction to respect anyway.

Comment by David Aldridge — November 4, 2006 @ 4:50 pm GMT Nov 4,2006 | Reply
David, I’m off my usual territory here but I think that theoretically all joins are cartesian, and the thing you are calling “the join” is then just a filter predicate that compares two columns in the same relation. (Although, of course, actual implementations behave completely differently and Oracle talks about filter predicates and access predicates). Ultimately there are only predicates, and the ON/WHERE difference is an SQL thing not a relational thing.
As far as your ‘unfiltered’ join is concerned, I would argue that the original programmer should assist their successor by indicating the expected join order in the from clause, at which point they implicitly have a correct ordering for the predicates; hence:
from ____A, B, C, D where ____A.col = {const} -- optionally and B.colx = A.colx and B.colp = {const} and C.coly = B.coly and D.colz = C.colz
When implemented correctly, the left-hand sides of ALL predicates in the where clause will visibly be sorted in order of the tables in the from clause, and ALL predicates will notionally be operated by Oracle in the order they appear. (transformations, transitive closure precedence, and CPU costing excepted).

P.S. I edited your previous comment back into one piece – I hope I did it correctly.

Comment by Jonathan Lewis — November 4, 2006 @ 6:55 pm GMT Nov 4,2006 | Reply
>> … I edited your previous comment back into one piece

And not for the first time I think. Thanks … I’m going to try to have to remember not to write less-than signs.

You could well be right about the joins/predicate thing Jonathan, and maybe we encroach on differences between relational and SQL databases here. I’m pretty sure that in the pure relational sense there is a big difference, with selections, theta-joins, division. antijoins and whatnot, but the lines get blurred when one starts to consider the implementation in the RDBMS — maybe more particularly in a “SQL DBMS”. Maybe my differentiation between joins and predicates in SQL is just a relational pretension, but I’m not at all sure that I can give it up. When visualising queries I always find myself linking up tables with joins and then asa second step imposing the predicates on the result — if it is in one’s nature to imagine predicates on the leading table before joining up to other tables then I can see how “B=A, C=B” makes more sense. Maybe I should just start using the ANSI join syntax, but I’m not sure that I can take that step yet either :(

I like the idea of improving maintainability by documenting expected join order but it depends on an established convention that says “all developers will order joins in this manner” and once you have established a convention then the style of the convention matters less than that everyone stick to it. Compliance to that convention would be very tricky to check by eye, and if one was going to be super-rigourous in this respect then I’d be wanting some form of abbreviated explain plan appended as a comment. Or maybe that’s going too far … the more important part of the convention is that it is respected by all.

Let me just add on a note of self-interest that if someone is paying me cash-money to write code then I’ll adopt almost any convention they like, even to the extent of not using mixed-case. But I won’t do code in all upper-case — I’ll see myself and family destitute first :D

Comment by David Aldridge — November 5, 2006 @ 3:47 pm GMT Nov 5,2006 | Reply
I don’t see how this depends on developers following conventions. I didn’t learn to write my join predicates the way I do by following a convention, I learned it all in the School of Hard Knocks.

The issue doesn’t go away with ANSI join syntax either. There is nothing to stop you writing

FROM a
JOIN b ON a.col = b.col

which to me, though not to everyone else apparently, is inconsistent with

WHERE b.deptno IS NULL

in that both specify something about a column of table B, but that column appears on arbitrarily different sides of the expression.

For ANSI outer joins we could even write

FROM b
RIGHT OUTER JOIN a ON a.col = b.col

which only makes the slightest sense if you read the query backwards.

Comment by William Robertson — November 6, 2006 @ 1:16 pm GMT Nov 6,2006 | Reply
>> Third I list the table names in the from clause in the order that I think the (cost-based) optimizer ought to visit them. The same ordering pattern appears in the where clause.
select from v$locked_object lkd, dba_objects obj, dba_rollback_segs rbs, v$transaction trn, v$session ses where obj.object_id = lkd.object_id and rbs.segment_id = lkd.xidusn and trn.xidusn = lkd.xidusn and trn.xidslot = lkd.xidslot and trn.xidsqn = lkd.xidsqn -- added for completeness and ses.taddr = trn.addr <<

So what is the order ???
“lkd” down to -> “ses”
or the reverse ?
Thanks

Comment by klabu — November 8, 2006 @ 3:55 am GMT Nov 8,2006 | Reply
Klabu, “order of the from clause” … top to bottom, i.e. “lkd down to “ses”. And to reduce the risk of any other ambiguity, this is reading down the left hand side of the predicates.
Of course if I had to write SQL for the Rule Based Optimiser I would put the tables in the from clause in the reverse order because the RBO works from the bottom up (all other things being equal). This means that for RBO, my table ordering in the from and my predicate ordering in the where used (at the global level) to run in opposite directions to each other.
(Within any one table, the same bottom-up to top-down rule switch still applies for filter predicates though, consider :
where colx = 99 and coyY = 'A'
RBO operates these from the bottom up, CBO operates them from the top down.)

Comment by Jonathan Lewis — November 8, 2006 @ 7:56 am GMT Nov 8,2006 | Reply
I also happen to dislike [snip] commas at the beginning of lines, but at least I can tell myself that that’s just a sign of poor taste.
Fair comment.
I’m guilty as charged then … and I’ll stay that way. Mind you, there is a reason for that poor taste. Just curious … what editor do you prefer when writing code?

Comment by Gabe — November 10, 2006 @ 12:08 am GMT Nov 10,2006 | Reply
Gabe, I have heard arguments for the “commas at beginning” – the worst one was that it’s easier for code-generators. There’s also the one about being able to delete lines more easily (which also applies to “and at the end”). Since I have to read code, I’ll always argue for styles that make it easier for readers, rather than writers.
On Unix I use vi, on Windows I use notepad, on VMS I use ed, on mainframes I use whatever they’ve got.

Comment by Jonathan Lewis — November 10, 2006 @ 7:25 am GMT Nov 10,2006 | Reply
Ok, so we agree then: it is an issue of taste … as in, appearances. We just reached different conclusions. In fact, to me, it isn’t as much commas at the beginning vs. commas at the end as the commas being equally indented or better said, being in a predictable, consistent place. It is for reasons of readability that I prefer it.

I don’t want to fill your blog with some ugly, nested SQL; I’m sure you’ve seen enough of that. But here is something small:

select rownum id,
decode(mod(rownum,10),0,null,
decode(rownum-1,0,null,rownum-1)) pid,
trunc(rownum/10) typ
from (select level from dual connect by level bad taste, but as long as I see the pattern, I’m good.

Cheers.

PS. A preview button will be nice here … will my sql come out LTRIMed or not?

Comment by Gabe — November 10, 2006 @ 4:04 pm GMT Nov 10,2006 | Reply
Oh well. Here goes the clarity part.
I did hit the back button and checked that my entire comment was there, formatted and all … but it came out the way it did. So, that’s that then. Sorry about the mess.

Comment by Gabe — November 10, 2006 @ 4:11 pm GMT Nov 10,2006 | Reply
Gabe, there are some instructions about allowing code to survive the comments editor in the page “How to use”.
If you want to email me your source, I’ll reformat it and publish in your original note. In passing one of my “rules” on decode() is that if it’s not a one-liner, then the closing bracket should either be under the opening ‘d’, or under the first letter of the predicate that it completes.

Comment by Jonathan Lewis — November 10, 2006 @ 4:30 pm GMT Nov 10,2006 | Reply
[…] There’s a new kid on the blog — Jonathan Lewis, and his blog is the Oracle Scratchpad. Here’s a post of his on the virtues of clarity. And clarity and clarity. […]

Pingback by Log Buffer #17: a Carnival of the Vanities for DBAs — February 4, 2013 @ 4:15 pm GMT Feb 4,2013 | Reply