Oracle Scratchpad

November 19, 2008

Lateral LOBs

Filed under: Infrastructure,lateral view,LOBs,Oracle — Jonathan Lewis @ 10:20 pm BST Nov 19,2008

The following request appeared on the comp.databases.oracle.server newsgroup a few days ago:

Is it possible to create a view that will split single CLOB column into multiple chunks of VARCHAR2 something like this:

TABLE:
---------------------------
ID              INTEGER
DATA            CLOB

VIEW:
--------------------------------------------------
ID              INTEGER
CHUNK_ID        INTEGER
DATA            VARCHAR(1000 char)

I gave a brief response outlining an idea that I thought might work then, having a bit of spare time on a train journey a couple of days later, I decided to check whether the idea could be made to work. This is what I came up with on a database running 10.2.0.3.

I’m going to use a pipelined function to break a CLOB into a set of varchar2() which will be returned on demand; then I’m going to create a view that joins the table holding the CLOB to a lateral view of the pipelined function. (There’s a note here from the Oracle optimizer group that contains a definition of lateral views)

drop function page_clob;
drop type clob_page;
drop type clob_line;

create or replace type clob_line as object (
	line_id	number,
	content	varchar2(60)
)
/

create or replace type clob_page as table of clob_line
/

create or replace function page_clob(
	i_id	number,
	i_chunk	number default 60
)
return clob_page
pipelined
-- deterministic  -- included incorrectly, see comment #3
as

	m_c1		clob;
	m_length	number(12);
	m_start_point	number(12) := 1;

begin
	select
		c1, length(c1)
	into
		m_c1, m_length
	from
		t1
	where
		id = i_id
	;

	if (m_c1 is null or m_length = 0) then
		pipe row(clob_line(1,to_char(null)));
	else
		for i in 1..ceil(m_length/i_chunk) loop
			pipe row (
				clob_line(
					i,
					substr(m_c1, m_start_point, i_chunk)
				)
			);
			m_start_point := m_start_point + i_chunk;

		end loop;
	end if;
	return;
end;
/

The pipelined function needs an array type defined for its return type, and “pipes” a single row of the underlying scalar type. All I’ve done in the function (and it may not be the most efficient piece of pl/sql – it’s the pipelined function and lateral view that are the significant ideas) is to step my way through the CLOB in chunks, piping a “line number” and the chunk. I’ve put in a default chunk size of 60 bytes – but you can change this in the function definition, or in the calls you make to the function.

drop table t1 purge;
create table t1 (id number, c1 clob);

insert into t1 values(1, rpad('x',173,'a') || 'x');
insert into t1 values(2, rpad('x',179,'b') || 'x');
insert into t1 values(3, empty_clob());
insert into t1 values(4, null);

create or replace view v1
as
select
	/*+ cardinality(p1 10) */
	t1.id,
	p1.line_id,
	p1.content
from
	t1,
	table(page_clob(t1.id)) p1
;

I’ve then created a table to hold just a CLOB and an id column, and created a view that joins the table to a call to the pipelined function. The call to the pipelined function is the “lateral” bit. I’ve cast the function into a table with the table() operator, and passed the id from table t1 as the input parameter to the function call.

You’ll notice that I’ve used the /*+ cardinality */ hint  in the view definition to tell the optimizer that (in my case) the typical LOB will turn into 10 rows of varchar2(). Without this hint, Oracle would use one of its magic numbers to “estimate” the number of rows returned by each call to the table() operator. The number used by default is roughly the same as the number of bytes in the default block size – which may not be very suitable.

You’ll also notice that my sample dataset caters for checking a couple of boundary conditions – what to do with a null or an empty LOB, plus a check that I do get the entire LOB (even in the special case that the LOB is an exact multiple of the size of the chunk I have chosen).

Here’s an example of some code that queries the view – followed by the set of results:

column id format 99
column line_id format 999999
column content format a60

select * from v1 where id = 1;
select * from v1 where id = 2;
select * from v1 where id = 3;
select * from v1 where id = 4;

break on id skip 1

set autotrace on explain
select * from v1 where id <= 3;
set autotrace off

And the results:

 ID LINE_ID CONTENT
--- ------- ------------------------------------------------------------
  1       1 xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
          2 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
          3 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaax

3 rows selected.

 ID LINE_ID CONTENT
--- ------- ------------------------------------------------------------
  2       1 xbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
          2 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
          3 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbx

3 rows selected.

 ID LINE_ID CONTENT
--- ------- ------------------------------------------------------------
  3       1

1 row selected.

 ID LINE_ID CONTENT
--- ------- ------------------------------------------------------------
  4       1

1 row selected.

 ID LINE_ID CONTENT
--- ------- ------------------------------------------------------------
  1       1 xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
          2 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
          3 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaax

  2       1 xbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
          2 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
          3 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbx

  3       1

7 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3082259176

------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |           |    30 |   450 |    86   (2)| 00:00:02 |
|   1 |  NESTED LOOPS                      |           |    30 |   450 |    86   (2)| 00:00:02 |
|*  2 |   TABLE ACCESS FULL                | T1        |     3 |    39 |     2   (0)| 00:00:01 |
|   3 |   COLLECTION ITERATOR PICKLER FETCH| PAGE_CLOB |       |       |            |          |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("T1"."ID"<=3)

Note
-----
   - dynamic sampling used for this statement

Note, in particular, that we can query the view just as if it were based on simple tables, and the pipelined function behaves properly underneath the view; and that the execution plan when we predict three rows from the base table shows 30 rows (3 * cardinality(10)) from the view.

Of course, any time you create a view, you need to specify how that view should be used – which means listing the “acceptable” entry point. In this case, the view should only be used when you can include a predicate that filters or joins on the id column. It wouldn’t be sensible to write a query that referenced only the line_id or content columns – the query would work, but it wouldn’t be efficient.

11 Comments »

  1. Jonathan,

    Maybe it is worth stressing the fact that the function is deterministic (and declared as such) as well.

    Also, given

    content varchar2(60)

    it would be difficult to use chunk sizes greater than 60 by “chang[ing] this in the function definition, or in the calls you make to the function.” I know this is rather obvious, but…
    On the other hand, this length defines the default format of the v1.content column in SQL*Plus – something to keep in mind when one feels compelled to set it to 4000.
    It doesn’t seem to affect CBO’s estimations, though (13 bytes/row for t1 vs. 15 bytes/row for v1), so on balance it might make sense to bump it to at least the 1000 originally requested and just get in the habit of using the COL[UMN] command in SQL*Plus :-)

    Cheers!

    Flado

    Comment by Flado — November 21, 2008 @ 10:29 am BST Nov 21,2008 | Reply

  2. Jonathan and Flado,

    Declaring the function to be deterministic is wrong! It selects the clob from a database table, so it cannot be guaranteed to return the same result the next time it is called (with the same parameters)…

    Regards,
    Michael

    Comment by Michael Garfield Sørensen — November 21, 2008 @ 12:17 pm BST Nov 21,2008 | Reply

  3. Oops… True, that.

    Well then, declare the function to take the CLOB as an IN parameter instead of the ID. Would be deterministic and save a “context switch”, too. Although one could question the benefit of it being deterministic in this case (don’t know if SQL can cache function results for functions with LOB parameters, or for pipelined functions, for that matter)

    Flado

    Comment by Flado — November 21, 2008 @ 1:19 pm BST Nov 21,2008 | Reply

  4. Flado,
    The benefit (to me) of declaring the function with a return type of (effectively) varchar2(60) was that I could do a highly visible demonstration of the principle of testing a critical boundary condition. If you wanted to implement this on a production system there are various reasons why a much larger return would probably be sensible.

    A function which passes the CLOB as its input could well be more efficient than any other option – it’s something I would want to test before implementing a live solution.

    I will be posting a couple of other notes about the oddities (and overheads) of manipulating LOBs at some future date.

    Michael,
    Good point about the deterministic declaration. I typed that in on auto-pilot without thinking about it – you’re right, it shouldn’t be there and I’ve taken it out.

    Comment by Jonathan Lewis — November 26, 2008 @ 12:04 am BST Nov 26,2008 | Reply

  5. Jonathan,
    In your book Cost-Based Oracle Fundamentals, p. 219, you have also “misused” deterministic in function get_dept_avg.

    I’ve witnessed things such as intermittent ORA-600 errors and even plain wrong (and not just stale) results when declaring functions that aren’t really deterministic as deterministic. Think about it… If a function is deterministic Oracle may or may not call the function a second/third/… time (it may need to call it a second time, if the result is not longer cached, but I believe it may choose to call it again for a number of reasons). When you lie to Oracle “all bets are off” – be careful!

    Comment by Michael Garfield Sørensen — January 7, 2009 @ 9:38 pm BST Jan 7,2009 | Reply

  6. Michael,

    You’re quite right.

    I was using the function to demonstrate a point – but I should have said something to make it clear that it was a deliberate abuse of the “deterministic” key word.

    (In Practical 8i, I made the point very clearly by creating a function that returned the result of a call to the dbms_random package.)

    Comment by Jonathan Lewis — January 12, 2009 @ 3:20 pm BST Jan 12,2009 | Reply

  7. [...] is the result set from a call to dbms_xplan.display_cursor, and the join is a variation of the lateral join mechanism which takes as its inputs some values from a earlier object in the from [...]

    Pingback by Dependent Plans « Oracle Scratchpad — May 5, 2009 @ 6:10 pm BST May 5,2009 | Reply

  8. Very nice! I found my way to this blog entry via your “Dependent Plans” discussion, and I was impressed by your work there.

    Your reference to the cardinality hint was almost in passing, but I found it most interesting and useful in my situation. My developers have recently gotten into the habit of writing queries like this inside of PL/SQL programs:


    select ....
    from mytab
    where job_code in (select code from table(cast(myvar as mytype))) and ....

    where myvar is an Oracle variable of a user-defined table type. It bothered me greatly to see full table scans on the very large mytab table when I knew that the myvar variable contained just a handful of values. But I knew of no way to persuade the CBO to do the preferred index access instead of an FTS. The cardinality hint is just perfect in our case, and now I’ll be recommending something of this form:


    select ....
    from mytab
    where job_code in (select /*+ cardinality (pv 10) */ code 
                       from table(cast(myvar as mytype)) pv) and ....

    I’ve tested this quite a bit and I’m very pleased with the results. Thank you for the suggestion!

    I do, however, have a followup question. In most cases, I can reasonably predict the number of rows in the table variables that I’ll discuss with my developers. Are you aware of any other mechanisms that I could employ to dynamically inform the CBO of the sizes of the table variables so that it could decide the best execution path?

    This may be a non-concern because there are obvious drawbacks to putting more than a few values into any table variable. I’m thinking PGA here, of course.

    Thanks for your time.

    Comment by Tom Gaines — May 7, 2009 @ 10:43 pm BST May 7,2009 | Reply

    • Tom,
      Might be replying very late on this blog.
      You can opt for Extensible Optimizer which has been explained in-depth by Joze Senegacnik, please search in google or refer book Expert Oracle Practices

      -Yasser

      Comment by Yasser — December 23, 2009 @ 4:35 pm BST Dec 23,2009 | Reply

  9. [...] meccanismo del “lateral join” e mette un link a un suo vecchio post intitolato “Lateral LOBs“, che parla in realtà di lateral views . In questo post viene descritta [...]

    Pingback by SQL JOIN – aggiornamento « Oracle and other — May 8, 2009 @ 2:49 pm BST May 8,2009 | Reply

  10. [...] look at the 10053 trace file for the ANSI example you'll find that Oracle has rewritten it with a LATERAL subquery before optimising it. So, presumably, we can appply the same transformation [...]

    Pingback by ANSI Outer « Oracle Scratchpad — January 31, 2011 @ 7:04 pm BST Jan 31,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,514 other followers