While investigating potential scalability threats on a client system recently, I was directed to a time-critical task that was executing the same PL/SQL procedure 16 times (with different parameters) between 6:00 and 7:00 am; as the system went through its next phase of expansion the number of executions of this procedure was likely to grow. An interesting detail, though, was that nothing else was going on while the task was running, and the calls ran serially, so the machine (which had 6 cores) was running at about 16% CPU – effectively one busy core out of six.
An obvious strategy for meeting the growth target was to make sure that four (possibly five) copies of the procedure were allowed to run concurrently. Fortunately the different executions were completely independent of each other and didn’t interfere with each other’s data, so the solution simply required a mechanism to control the parallelism. Conveniently 11gR2 gave us one.
If you look at the package dbms_parallel_execute, you’ll find that it’s described as a mechanism that allows you to define your own method of parallelising a task – this is how the PL/SQL Packages and Types manual describes it:
This package lets you incrementally update table data in parallel, in two high-level steps:
- Group sets of rows in the table into smaller-sized chunks.
- Run a user-specified statement on these chunks in parallel, and commit when finished processing each chunk.
Notice how the emphasis seems to be on doing parallel table updates: the worked example, and all the examples I’ve found so far on the Internet, show you how to take a table, break it into multiple chunks using rowid ranges, numeric column ranges, or user-defined methods, and update the table in parallel. (Tim Hall, as he so often does, has a good demonstration). But that’s only a starting point – if you can update a table, you could choose to update the table with a value generated by a call to a PL/SQL function – so, in the hour I had left before heading out to the airport, I came up with a simple demonstration of a strategy that the client could use to parallelise his task:
- convert the procedure into a function that returns some sort of status code
- populate a table with a list of the sets of parameters used to call this function
- use dbms_parallel_execute to do a parallel update on the table by calling the function for each row, passing in the parameters for that row, and updating the row with the return status
- to control the granularity of execution, number each row in the table and use the create_chunks_by_number_col function option
Here’s a minimum piece of code to demonstrate the principle. T1 is my control table, F1 is the function I want to call with some input parameters, T2 is just an object that I’ve created so that I’ve got something to mess about with in function F1 (the function inserts 100 rows into the table and returns the number of rows inserted).
create table t1
as
select
        rownum  id,
        rownum  parameter_1,
        0       status
from
        all_objects
where
        rownum <= 10
;

create table t2(
        n1              number(2,0),
        content         number(10,0),
        sessionid       number(10,0)
);

create or replace function f1(i_in number)
return number
as
begin
        insert into t2
        select
                i_in, rownum, sys_context('userenv','sessionid')
        from
                all_objects
        where
                rownum <= 100
        ;
        return sql%rowcount;
end;
/

rem
rem     Create a "place holder" for the task.
rem

execute dbms_parallel_execute.create_task(task_name => 'Parallel update')

rem
rem     Set up the chunking of the table I want to update in parallel.
rem     I'm using a number column, populated by sequence (rownum) in my
rem     case, and setting a chunk size of one, so each row becomes a
rem     separate chunk.
rem
rem     The owner, table, and column must match the case shown in
rem     the data dictionary or you get Oracle errors 942 or 904
rem

begin
        dbms_parallel_execute.create_chunks_by_number_col(
                task_name    => 'Parallel update',
                table_owner  => 'TEST_USER',
                table_name   => 'T1',
                table_column => 'ID',
                chunk_size   => 1
        );
end;
/

rem
rem     The call will have created a reference list of chunks.
rem     The start_id, end_id will give the low and high values
rem     for the rows in my table that will be used for each chunk.
rem     You should see 10 rows here, matching my T1 table, with
rem     the same start_id and end_id -- i.e. one T1 row per chunk.
rem

select
        chunk_id, status, start_id, end_id, job_name
from
        user_parallel_execute_chunks
where
        task_name = 'Parallel update'
order by
        chunk_id
;

rem
rem     Now we call the package to update my table in parallel.
rem     The update calls my function, passing in the parameter
rem     value(s) stored in the row of the table, and setting
rem     the status column to the return value of the function.
rem
rem     I've set the level of parallelism to be 4 - so no
rem     more than 4 copies of the function will run at once.
rem

declare
        l_sql_stmt varchar2(32767);
begin
        l_sql_stmt := '
                update  t1
                set     status = f1(parameter_1)
                where   id between :start_id and :end_id
        ';

        dbms_parallel_execute.run_task(
                task_name      => 'Parallel update',
                sql_stmt       => l_sql_stmt,
                language_flag  => dbms_sql.native,
                parallel_level => 4
        );
end;
/

commit;

execute dbms_parallel_execute.drop_task(task_name => 'Parallel update')
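If you want to confirm that the work really was spread across multiple sessions, a simple check (which you can run before calling drop_task) is to count the t2 rows per sessionid. Each chunk runs as a job in its own session, so with parallel_level set to 4 the 1,000 rows (10 calls of 100 rows each) should appear under at most four distinct sessionid values:

```sql
rem
rem     Each job runs in its own session, so the number of
rem     distinct sessionid values tells you how many jobs
rem     actually shared the work.
rem

select
        sessionid, count(*)
from
        t2
group by
        sessionid
order by
        sessionid
;
```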
The package operates by creating jobs and dispatching them, so the schema that runs this code has to have the privilege “create job”.
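If the schema doesn’t already have that privilege, the fix is a one-line grant from a suitably privileged account (test_user here is just the schema name I used in the demo):

```sql
grant create job to test_user;
```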
The run_task procedure dispatches the jobs, and then uses the dbms_lock.sleep() procedure to poll for completion every 3 seconds – so this strategy is not really appropriate if you’re trying to deal with very small, fast jobs.
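Once run_task() returns you can ask how the task finished by calling the package’s task_status() function and comparing the result with the package’s own status constants – a minimal sketch (using the task name from the demo above):

```sql
declare
        l_status        number;
begin
        l_status := dbms_parallel_execute.task_status('Parallel update');

        if l_status = dbms_parallel_execute.finished then
                dbms_output.put_line('All chunks processed successfully');
        else
                -- e.g. finished_with_error if some chunks failed
                dbms_output.put_line('Task status code: ' || l_status);
        end if;
end;
/
```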
There are procedures to deal with chunks failing – see the manuals or Tim Hall’s article for examples – but for simplicity and clarity I’ve assumed that nothing will go wrong.
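For completeness, the outline of that recovery is: look for chunks left in the PROCESSED_WITH_ERROR state, then call resume_task(), which re-runs only the chunks that didn’t complete successfully. A sketch, again using the task name from the demo:

```sql
rem
rem     Report any chunks that failed, with the Oracle error
rem     that each one raised.
rem

select
        chunk_id, start_id, end_id, error_code, error_message
from
        user_parallel_execute_chunks
where
        task_name = 'Parallel update'
and     status    = 'PROCESSED_WITH_ERROR'
;

rem
rem     Retry just the failed chunks.
rem

execute dbms_parallel_execute.resume_task(task_name => 'Parallel update')
```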