Oracle Scratchpad

October 26, 2012

Running Total

Filed under: Oracle,SQL Server — Jonathan Lewis @ 7:02 pm BST Oct 26,2012

Here’s a question on OTN from a  SQL Server user that should prompt a few interesting ideas. Re-arranged and paraphrased it goes something like this:

In SQL Server I can write the following code:

DECLARE @Counter INT
SELECT @Counter = 0

UPDATE TempDB.dbo.TransactionDetail
SET @Counter = AccountRunningCount = @Counter + 1
FROM TempDB.dbo.TransactionDetail WITH (TABLOCKX)

What I want to do is more like this:

DECLARE @Total INT = 0
UPDATE StringOutput
set @Total = SumOfLength = @Total + ColLength
OPTION (MAXDOP 1)

How do I do something similar in Oracle ?


I suspect that there may be an error in the second piece of code if it is to run on SQL Server since it doesn’t have a “FROM StringOutput” to match the “FROM TempDB.dbo.TransactionDetail” that appears in the first piece – but perhaps it’s implied by the reference to StringOutput in the UPDATE clause.

To an Oracle user, the set clause is particularly strange – but this is a feature of how T-SQL (roughly the equivalent of  PL/SQL) can work for SQL Server.  This set clause allows each row to modify, and be modified by, a variable. So, for each row, this double assignment first sets the column sumoflength to the value of the variable total plus the value of the column collength,   then it sets the variable total to this new value of sumoflength. The net effect is that the column sumoflength acquires a running total of collength as you read through the table. (I am basing my explanation on “reasonable guesswork” combined with the title of the thread which was:  “Update Failed for Sum of previous row and current row”.

So how do you do this in Oracle ? The simple answer is “you shouldn’t, but if you do the strategy has to be completely different because the tools are different”.

The reason for the “shouldn’t” is that there is no implicit order in a select statement unless it includes an “order by” clause – so any thoughts of “previous” or “next” for data in a table are erroneous. Assume then that we define an order through a unique, non-null column – we would then look at the SQL Server code and wonder whether its internal implementation was effectively a row by row mechanism, or whether there was some cunning array based processing taking place (perhaps like the trick that PL/SQL uses to turn an implicit cursor for loop into array fetches of 100 rows at a time).

My first thoughts about the task (assuming we actually do want to do it) point me to analytic functions as a way of producing the data – but I fear I may have to use PL/SQL because of the strong flavour of row-based activity that the code suggests. Let’s start with the first stab at an analtyic approach:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	rownum			n1,
	cast(null as number(9))	running_tot
from
	generator	v1
where
	rownum <= 1e3
;

alter table t1 add constraint t1_pk primary key(id);

-- collect stats

This gives us a primary key for ordering, a column to generate a running total for, and a column to store the running total – currently null. Let’s write a query that shows us the data we ultimately want to store in the table:

select
	id,
	n1,
--	running_tot,
	sum(n1) over (order by id)	running_tot
from
	t1
;

        ID         N1 RUNNING_TOT
---------- ---------- -----------
         1          1           1
         2          2           3
         3          3           6
         4          4          10
         5          5          15
         6          6          21
         7          7          28
         8          8          36
         9          9          45
        10         10          55

Since we can calculate the right information in a select statement it would be nice if we could turn that statement into an updatable view; unfortunatley it doesn’t work.

update
	(
	select
		id,
		n1,
		running_tot,
		sum(n1) over (order by id)	running_tot_calc
	from
		t1
	)
set
	running_tot = running_tot_calc
;

        (
        *
ERROR at line 2:
ORA-01732: data manipulation operation not legal on this view

That doesn’t seem entirely reasonable, but we can take the code and wrap it in pl/sql as a cursor for loop and (left as an exercise to the reader) we could even make the code more complex to do array updates rather than using the simplest code to do row by row updates:

declare
	cursor c1 is
		select
			id,
			n1,
			running_tot,
			sum(n1) over (order by id)	running_tot_calc
		from
			t1
		for update of
			running_tot
	;
begin
	for r1 in c1 loop
		update t1 set running_tot = r1.running_tot_calc where current of c1;
	end loop;
	commit;
end;
/

For the small data set I had generated, this runs pretty quickly, and gets the right answer. I find it slightly surprising, though, that I’m allowed to use the “select for update”/”current of” approach when a straight update isn’t legal.

Of course we might consider going for the full, non-procedural, array update – and luckily we’ve got a primary key that will help us to generate an updatable join view, or a simple, update-only, merge.

merge
	into t1	od
using	(
	select
		id,
		n1,
		running_tot,
		sum(n1) over (order by id)	running_tot_calc
	from
		t1
	)	nd
on
	(nd.id = od.id)
when matched then
	update set od.running_tot = nd.running_tot_calc
;

I won’t show you the updatable join view version – because it didn’t work – I got the “non-key-preserved” error ORA-01779. I think the optimizer is having some trouble (or being a little too clever) with the analytic sum(). Again, with the small data set, this update was pretty quick. For larger data sets, the choice of general strategy is left to the reader.

I’d still be interested to know, though, whether the SQL Server code was internally array based or operates row by row.

Update 9th Aug 2013

The update of the analytic view is still illegal in 12c, and the update to a join view including a version of the analytic view still produces Oracle error ORA-01779.  I didn’t show it at the time, but the code I was experimenting with was as follows:

update
	(
	select
		t1_new.id_new,
		t1_new.running_tot_calc,
		t1_old.id,
		t1_old.running_tot
	from
		(
		select
			t1.id		id_new,
			sum(t1.n1) over (order by t1.id)	running_tot_calc
		from
			t1
		)	t1_new,
		(
		select
			t1.id,
			t1.running_tot
		from
			t1
		)	t1_old
	where
		t1_new.id_new = t1_old.id
	)
set
	running_tot = running_tot_calc
;

Since id is declared as a primary key, and generating an analytic sum ordered by id can’t introduce multiple copies of an id value it seems a little surprising that Oracle complains about failure of key preservation (ORA-01779) in this case.

9 Comments »

  1. Your comment about the “shouldn’t” and table order might not apply to SQL Server because the default table structure over there is their version of an IOT. At least that is what I remember from the All Things Oracle webinar you did.So wouldn’t that enable them to do the previous/next type logic without worrying about the order so much?

    Comment by Tony — October 26, 2012 @ 10:29 pm BST Oct 26,2012 | Reply

    • Tony,

      I believe your suggestion about why the “next/previous” problem wouldn’t occur to many SQL Server programmers is correct. That doesn’t mean they shouldn’t worry about it, though, it just means they’re slightly less likely to think of it as a problem.

      Comment by Jonathan Lewis — December 13, 2012 @ 2:19 pm BST Dec 13,2012 | Reply

  2. I think “select for update” is prefer since
    1.”merge” result could be inconsistent(for example with insert and update in another session)
    2. “select for update” optimized and stabilized with “buffer sort” in 11.2.
    3. we can optimize case with “for update” cursor with bulk operations(fetch limit+for all)

    And would be great if “window sort” executed after locking in next oracle versions like subqueries and functions now, which is not used for sorting into “order by” clause.
    Also i want to ask: has the “for update” mechanism for array locking like “array update”(redo array) ?

    Comment by Sayan Malakshinov — October 26, 2012 @ 11:45 pm BST Oct 26,2012 | Reply

    • Sayan,

      Very useful points – especially the reminder about write consistency.

      In this case, of course, we’re going to be updating every row in the table, so I should have suggested “lock table in exclusive mode” as a starting step – which would be much more efficient than select for update.

      Comment by Jonathan Lewis — December 13, 2012 @ 2:21 pm BST Dec 13,2012 | Reply

  3. I think this will work:

    update t1
     set running_tot = (
                        select sum( t1_1.n1) from t1 t1_1
                         where t1_1.id &lt;= t1.id 
                       );
    

    Not the fastest query in the world, but should work.

    Comment by Lev Erusalimskiy — October 29, 2012 @ 12:03 am BST Oct 29,2012 | Reply

    • Besides the fact that your query is much heavier (too many current gets), is much more important that it is dangerous because of possible inconsistencies result.
      Just simple example with your query:

      --creating test table with 10 rows, id and n1 = 1..10:
      create table t1 as 
      select 
         level id
        ,level n1
        ,cast(null as int) running_tot 
      from dual 
      connect by level<=10;
      
      Table created.
      
      -----------------------
      -- in one session we updating n1 to 100 where id=5:
      update t1 set n1=100 where id=5;
      
      1 row updated.
      -----------------------
                  -- in another we updating running_total with your query:
                  update t1
                  set running_tot = (
                                    select sum( t1_1.n1) 
                                    from t1 t1_1
                                    where t1_1.id <= t1.id
                  );
                  -- wait for unlocking..
      -----------------------
      -- now committing in first session:
      commit;
      
      Commit complete.
      -----------------------
                  -- second session unlocked:
                  10 rows updated.
                  -- let's see what we got:
                  select * from t1;
                  
                          ID         N1 RUNNING_TOT
                  ---------- ---------- -----------
                           1          1           1
                           2          2           3
                           3          3           6
                           4          4          10
                           5        100          15
                           6          6          21
                           7          7          28
                           8          8          36
                           9          9          45
                          10         10          55
      
                  10 rows selected.
      

      Comment by Sayan Malakshinov — October 29, 2012 @ 6:58 pm BST Oct 29,2012 | Reply

      • I agree that this query is not the fastest, but I don’t think it is more dangerous than any other update statement
        - it sees committed data as of the begging of the query.

        Buy the way, how did you add this beautiful formatting to your post?

        Comment by Lev Erusalimskiy — October 30, 2012 @ 3:05 am BST Oct 30,2012 | Reply

  4. […] Analytic view update – still not legal / handled […]

    Pingback by 12c Join Views | Oracle Scratchpad — August 9, 2013 @ 6:37 pm BST Aug 9,2013 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,508 other followers