Oracle Scratchpad

September 9, 2013

Parallel Execution

Filed under: Oracle,Parallel Execution,Performance — Jonathan Lewis @ 6:51 am BST Sep 9,2013

While checking out potential scalability threats recently on a client system, I was directed to a time-critical task that was currently executing the same PL/SQL procedure 16 times (with different parameters) between 6:00 and 7:00 am; as the system went through its next phase of expansion the number of executions of this procedure was likely to grow. An interesting detail, though, was that nothing else was going on while the task was running, so the machine (which had 6 cores) was running at 16% CPU – in effect the 16 calls ran one after the other on a single busy core.

An obvious strategy for handling the required growth target was to make sure that four (possibly five) copies of the procedure were allowed to run concurrently. Fortunately the different executions were completely independent of each other and didn’t interfere with each other’s data, so the solution simply required a mechanism to control the degree of parallelism. Conveniently 11gR2 gave us one.

If you look at the package dbms_parallel_execute, you’ll find that it allows you to define your own method of parallelising a task – this is how the PL/SQL Packages and Types Reference describes it:

This package lets you incrementally update table data in parallel, in two high-level steps:

  • Group sets of rows in the table into smaller-sized chunks.
  • Run a user-specified statement on these chunks in parallel, and commit when finished processing each chunk.

Notice how the emphasis seems to be on doing parallel table updates: the worked example, and all the examples I’ve found so far on the Internet, show you how to take a table, break it into multiple chunks using rowid ranges, numeric column ranges, or user-defined methods, and update the table in parallel. (Tim Hall, as he so often does, has a good demonstration). But that’s only a starting point – if you can update a table, you could choose to update the table with a value generated by a call to a PL/SQL function – so, in the hour I had left before heading out to the airport, I came up with a simple demonstration of a strategy that the client could use to parallelise his task:

  • convert the procedure into a function that returns some sort of status code
  • populate a table with a list of the sets of parameters used to call this function
  • use dbms_parallel_execute to do a parallel update on the table by calling the function for each row, passing in the parameters for that row, and updating the row with the return status
  • to control the granularity of execution, number each row in the table and use the create_chunks_by_number_col function option

Here’s a minimal piece of code to demonstrate the principle. T1 is my control table, F1 is the function I want to call with some input parameters, and T2 is just an object I’ve created so that I’ve got something to mess about with in function F1 (the function inserts 100 rows into the table and returns the number of rows inserted).


create table t1
as
select
	rownum		id,
	rownum		parameter_1,
	0		status
from
	all_objects
where
	rownum <= 10
;

create table t2(
	n1		number(2,0),
	content		number(10,0),
	sessionid	number(10,0)
);

create or replace function f1(i_in number)
return	number
as
begin

	insert	into t2
	select	i_in, rownum, sys_context('userenv','sessionid')
	from	all_objects
	where	rownum <= 100
	;
	return sql%rowcount;
end;
/
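
rem
rem	A quick sanity check of the function in isolation (my own
rem	addition to the script): call it from an anonymous block
rem	and roll back. Note that "select f1(0) from dual" would
rem	fail with ORA-14551 because of the DML inside the function;
rem	called from an update, though, the insert into t2 is legal.
rem

variable rc number
execute :rc := f1(0)
print rc

rollback;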
rem
rem	Create a "place holder" for the task.
rem

execute dbms_parallel_execute.create_task(task_name => 'Parallel update')

rem
rem	Set up the chunking of the table I want to update in parallel.
rem	I'm using a number column, populated by sequence (rownum) in my
rem	case, and setting a chunk size of one, so each row becomes a
rem	separate chunk.
rem
rem	The owner, table, and column must match the case shown in
rem	the data dictionary or you get Oracle errors 942 or 904
rem

begin
	dbms_parallel_execute.create_chunks_by_number_col(
		task_name    => 'Parallel update',
		table_owner  => 'TEST_USER',
		table_name   => 'T1',
		table_column => 'ID',
		chunk_size   => 1
	);
end;
/

rem
rem	The call will have created a reference list of chunks.
rem	The start_id and end_id give the low and high values
rem	for the rows in my table that will be used for each chunk.
rem	You should see 10 rows here, matching my T1 table, each
rem	with start_id equal to end_id -- i.e. one row of T1 per chunk.
rem

select
	chunk_id, status, start_id, end_id, job_name
from
	user_parallel_execute_chunks
where
	task_name = 'Parallel update'
order by
	chunk_id
;

rem
rem	Now we call the package to update my table in parallel.
rem	The update calls my function, passing in the parameter
rem	value(s) stored in the row of the table, and setting
rem	the status column to the return value of the function.
rem
rem	I've set the level of parallelism to 4 - so no
rem	more than 4 copies of the function will run at once.
rem

declare
	l_sql_stmt varchar2(32767);
begin
	l_sql_stmt := '
		update t1 set status = f1(parameter_1)
		where id between :start_id and :end_id
	';

	dbms_parallel_execute.run_task(
		task_name      => 'Parallel update',
		sql_stmt       => l_sql_stmt,
		language_flag  => dbms_sql.native,
		parallel_level => 4
	);

end;
/

commit;

execute dbms_parallel_execute.drop_task(task_name => 'Parallel update')
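
As a quick check that the work really was shared out: every row of t1 should end up with status = 100 (the function’s return value), and the rows of t2 should show up to 4 distinct values of sessionid, one for each job that the package created.

select	id, parameter_1, status
from	t1
order by
	id
;

select	sessionid, count(*)
from	t2
group by
	sessionid
order by
	sessionid
;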

The package operates by creating jobs and dispatching them, so the schema that runs this code has to have the privilege “create job”.
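
If the schema doesn’t already have it, a suitably privileged user can supply the grant; in my case the demo schema was test_user:

grant create job to test_user;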

The run_task procedure dispatches the jobs, and then uses the dbms_lock.sleep() procedure to poll for completion every 3 seconds – so this strategy is not really appropriate if you’re trying to deal with very small, fast jobs.
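
If you want to watch what’s happening from another session while run_task() is polling, the task and chunk views show the state changing; the chunk status values should move from UNASSIGNED through ASSIGNED to PROCESSED:

select	task_name, chunk_type, status
from	user_parallel_execute_tasks
;

select	status, count(*)
from	user_parallel_execute_chunks
where	task_name = 'Parallel update'
group by
	status
;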

There are procedures to deal with chunks failing – see the manuals or Tim Hall’s article for examples – but for simplicity and clarity I’ve assumed that nothing will go wrong.
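
For reference, this is roughly the shape the retry code would take – a sketch (which I haven’t pushed through a real failure) based on the documented task_status() and resume_task() calls and the error columns of the chunk view; the retry cap of 2 is an arbitrary choice:

select	chunk_id, status, error_code, error_message
from	user_parallel_execute_chunks
where	task_name = 'Parallel update'
and	status = 'PROCESSED_WITH_ERROR'
;

declare
	l_attempts	pls_integer := 0;	-- arbitrary retry cap
begin
	--
	--	Re-run the failed chunks (at most twice) until the
	--	task reports itself finished.
	--
	while	l_attempts < 2
	and	dbms_parallel_execute.task_status('Parallel update') !=
		dbms_parallel_execute.finished
	loop
		l_attempts := l_attempts + 1;
		dbms_parallel_execute.resume_task('Parallel update');
	end loop;
end;
/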

6 Comments

  1. Jonathan,

    Very interesting, thank you.

    Because each “chunk” is treated as a separate transaction – committed as soon as its job is complete – we’d lose the ability to roll back the entire update if an issue were to occur. This would also mean other users could potentially see a “partially updated” table. I can definitely see the value for large transactions in minimizing UNDO usage and speeding up the DML, but this is just something we’d want to keep in mind when using this technique.

    Thanks,
    Scott

    Comment by Scott — September 9, 2013 @ 3:06 pm BST

    • Scott,

      Thanks for raising that issue. It’s a point that I should have made in the posting.
      You need to be aware that this will turn a single transaction into one transaction per chunk – and that’s usually a significant change.

      Comment by Jonathan Lewis — September 9, 2013 @ 9:30 pm BST

  2. Hi Jonathan. Thanks for the nice post regarding DBMS_PARALLEL_EXECUTE. Using this feature I ran into “ORA-01000: maximum open cursors exceeded” when calling the same parallel task sequentially multiple times.
    In my application I needed to process 100-500 bulk payments, each consisting of 1-100 payments. My processing model reads and parses the bulk payments in a loop, and those 1-100 payments are processed using the chunks_by_number_col parallel execution method (e.g. 5 slaves by 10 payments).

    With the database parameter open_cursors limited to 300, the 301st, 302nd, … parallel execution crashes with ORA-01000.
    Each complete bulk payment processing is one complete parallel execution cycle, i.e.
    .create_task
    .create_chunks_by_number_col
    .run_task
    .drop_task,

    but even after a successful .drop_task there is an unclosed cursor remaining. This cursor seems to be an implicit one originating from the DBMS_PARALLEL_EXECUTE.run_task call, e.g. “select min(“IDNO”), max(“IDNO”) from “SCOTT”.”TEST_TABLE””.

    Could you please suggest an idea, or propose a solution, for how to close such an implicit cursor?

    Comment by Ivan — August 1, 2014 @ 2:27 pm BST

    • Ivan,
      This sounds a bit buggy to me – like a cursor leak, maybe; but perhaps it’s expected behaviour for some reason. There used to be a parameter “_close_cached_open_cursors” which forces cached pl/sql cursors to close on commit. I think it’s probably deprecated by now, but you might want to see what effect it has as a temporary measure. Which version of Oracle are you using?

      Comment by Jonathan Lewis — August 5, 2014 @ 12:45 am BST

      • Hi Jonathan, I am currently running 11g (release 11.2.0.2.0). Sure, I’d like to try the parameter you proposed. Thanks very much.

        Comment by Ivan — August 5, 2014 @ 8:34 am BST

  3. Jonathan, I was trying to find the number of rows updated by each parallel task, but I couldn’t find it anywhere in the documentation. Do you know whether there is any way to find that? This question applies to the option where the chunks are created by user-defined SQL.

    Comment by Guest — August 4, 2014 @ 9:09 pm BST

