Oracle Scratchpad

June 11, 2018


Filed under: Infrastructure,Oracle,Performance — Jonathan Lewis @ 8:31 am BST Jun 11,2018

In a recent ODC thread someone had a piece of SQL that was calling dbms_random.string(‘U’,20) to generate random values for a table of 100,000,000 rows. The thread was about how to handle the ORA-30009 error (“not enough memory for operation”) that is almost inevitable when you use the “select from dual connect by level <= n” strategy for generating very large numbers of rows, but this example of calling dbms_random.string() so frequently prompted me to point out an important CPU saving, and then to publicise through this blog a little-known fact about the dbms_random.string() function.

If you generate a random string of length 6 using only upper-case letters there are 308,915,776 different combinations (26^6); so if you’re after “nearly unique” values for 100 million rows then a six-character string is probably good enough – it might give you a small percentage of values which appear in a handful of rows, but most of the values are likely to be unique or appear in just two rows. If you want to get closer to uniqueness then 7 characters will do it, and 8 will make it almost certain that you get a unique value in every row.
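The arithmetic is easy to check from SQL*Plus (column aliases are mine):

```sql
select  power(26,6)  six_chars,          -- 308,915,776
        power(26,7)  seven_chars,        -- roughly 8 billion
        power(26,8)  eight_chars         -- roughly 209 billion
from    dual;
```

With 26^7 at roughly 8 billion and 26^8 at roughly 209 billion, the chance of a collision among 100 million rows shrinks rapidly as you add the seventh and eighth characters.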

So if you want “nearly unique”, “random”, 20-character strings it’s probably sufficient to generate random strings of 6 to 8 characters and then rpad() them up to 20 characters with spaces – the saving in CPU will be significant; roughly a factor of 3 (which is going to matter when you’re trying to generate 100 million rows). As a little demo I supplied the OP with a script to create a table of just one million random strings – first of 20 random characters, then of 6 random characters with 14 spaces appended. The run time (mostly CPU) dropped from 1 minute 55 seconds to 41 seconds.
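The comparison can be sketched as follows (table names are illustrative, not the exact script I supplied):

```sql
create table t_20 nologging as
select  dbms_random.string('U', 20)             v20
from    dual
connect by level <= 1e6;

create table t_6 nologging as
select  rpad(dbms_random.string('U', 6), 20)    v20
from    dual
connect by level <= 1e6;
```

With “set timing on” in SQL*Plus the second create should run in roughly one third of the time of the first.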

Why is there such a difference? Because to generate a random string of 6 characters Oracle generates a random string of one character six times in a row and concatenates the results. The difference between 6 calls and 20 calls per row gives you that factor of roughly 3. For a quick demo, try running the following anonymous PL/SQL block:

rem     Script:         random_speed.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2010
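(The block itself was lost in formatting; the following is a reconstructed sketch – variable names are mine. It seeds the generator, generates a 6-character string, then re-seeds and concatenates six single-character strings: if string() really works one character at a time the two outputs will match.)

```sql
set serveroutput on

declare
        v6      varchar2(6);
        v1      varchar2(6);
begin
        dbms_random.seed(0);
        v6 := dbms_random.string('U', 6);

        dbms_random.seed(0);
        for i in 1..6 loop
                v1 := v1 || dbms_random.string('U', 1);
        end loop;

        dbms_output.put_line('Six at once: ' || v6);
        dbms_output.put_line('One by one : ' || v1);
end;
/
```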



Here are the results I got from instances of several different versions (including LiveSQL):


I haven’t shown the tests for all the possible dbms_random.string() options but, unsurprisingly, changing the test to use the ‘L’ (lower case alpha) option produces the same effect (and the same 6 letters changed to lower case). The same effect, with different characters, also appeared using the ‘A’ (mixed case alpha), ‘X’ (uppercase alphanumeric) and ‘P’ (all printable characters) options.

I haven’t considered the effect of using a multi-byte character set – maybe Oracle calls its random number generator once per byte rather than once per character. The investigation is left as an exercise for the interested reader.


When generating a very large number of random strings, keep the “operational” part of the string as short as you can and leave the rest to be rpad()‘ed.

Update 24th Feb 2020

Until yesterday I hadn’t considered the possibility that I could just examine the dbms_random package – it’s not wrapped, and it’s generated by the script $ORACLE_HOME/rdbms/admin/dbmsrand.sql; so you just have to read the (very short) source to see how it generates strings (and to answer the question about multibyte character sets).




  1. I wondered if using Linux utilities to generate the random strings might be quicker.

    Based on the first attempt, no, it is not quicker:


    JKSTILL@p1 > l
      1  declare
      2     s varchar2(12);
      3  begin
      4     for i in 1..1e6
      5     loop
      6             s := dbms_random.string('U',12);
      7     end loop;
      8* end;
    JKSTILL@p1 > /

    PL/SQL procedure successfully completed.

    Elapsed: 00:00:09.83

    I have seen that run in 7 seconds as well when the server is less busy.

    From the OS generating similar data:

     time dd if=/dev/urandom bs=2048  | tr -cd '[:upper:]' | dd iflag=fullblock bs=12 count=1000000 | perl -e 'my $FH=*STDIN; while ( my $l = read $FH, $d, 12 ){ print "$d\n" }' > x
    1000000+0 records in
    1000000+0 records out
    12000000 bytes (12 MB) copied, 8.80371 s, 1.4 MB/s
    real    0m8.808s
    user    0m0.845s
    sys     0m9.979s

    There may be better ways to do this. I used Perl because the sed regex was becoming incomprehensible.

    The amount of time added by Perl is negligible; this is all /dev/urandom.

    So the Oracle developers seem to have done quite a good job of optimizing the dbms_random.string() function, even if it is somewhat slow.

    Of course, by using /dev/urandom, a significant portion of the output is filtered out by the tr command.

    If there were a method to get only a certain range of data from something like /dev/urandom, this might be much faster.

    Oracle may be using /dev/urandom as well, though I did not try to discover if that is the case.

    Comment by jkstill — June 11, 2018 @ 5:30 pm BST Jun 11,2018 | Reply

    • I’m just wondering what might happen if this was done as an external table with a pre-processor script that output /dev/urandom directly? I don’t have a system that I can test it on at the moment unfortunately.

      Comment by John — June 14, 2018 @ 2:27 pm BST Jun 14,2018 | Reply

  2. Hi, Jonathan

    Thanks for this blog post addressing my thread on the ODC database forum.

    I’ve modified my code according to your nice suggestion – using rpad() to fill the remaining 14 characters: “rpad(dbms_random.string(‘x’, 6), 20)”. Please take a look at the code as follows,

    SQL> drop table t1 purge;
    Table dropped.
    SQL> set timing on
    SQL> create table t1
      2  segment creation immediate
      3  nologging
      4  as
      5  with x as ((SELECT level r FROM dual CONNECT BY level <= 1e4))  -->comment to fix WordPress format issue
      6  select r as id
      7            , to_char(sysdate + r/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
      8            , trunc(dbms_random.value(0, 100)) as random_id
      9            , rpad(dbms_random.string('x', 6), 20) random_string
     10  from  (SELECT /*+ no_merge */
     11             rownum r
     12             FROM  x d1,
     13                          x d2
     14             WHERE  rownum <= 1e8   --> comment to fix WordPress format issue
     15  );
    Table T1 created.
    Elapsed: 01:09:30.462
    SQL> set timing off
    SQL> select sum(bytes)/1024/1024 size_mb from user_segments
      2  where segment_name = 'T1';

    I also feel this speed is very slow. Do you have better ideas for improving my code? I am not sure whether using a parallel hint would be more efficient.

    Best Regards
    Quanwen Zhao

    Comment by Quanwen Zhao — June 12, 2018 @ 8:27 am BST Jun 12,2018 | Reply

    • Quanwen Zhao,

      The first thing to do is check the execution plan then investigate WHERE the time is spent.

      Running your code for 1e5 rows takes my machine 6 seconds, which suggests my hardware is similar to yours (6 seconds * 1e3 = 1 hour 40 minutes) – so the rates are in the same ballpark. Before running 1e8 rows I would experiment with 1e5 or 1e6 a few times.

      Creating one column at a time:

      id: 0.15 seconds
      inc_datetime: 0.24 seconds
      random_id: 1.57 seconds
      random_string: 4.47 seconds

      random_string without padding: 4.21 seconds — I used cast(dbms_random.string(‘x’, 6) as varchar2(6))

      random_string reduced to 4 random characters with no padding: 3.07 seconds.

      Almost all the time goes on calls to the dbms_random package – so there’s no way to reduce the time and still get the same data – unless you run the data creation in parallel.

      There is a problem with parallelism though: including rownum in the query serializes the data generation (see the two posts I’ve linked).

      In this case you can avoid rownum (in the main query – it’s safe in the “with subquery”) by replacing your reference to rownum with an expression like “(d1.r-1) * 1e4 + d2.r” (and getting rid of the rownum <= 1e8 predicate). If you do this you should find the query goes parallel. To make effective use of parallelism, though, you will have to add a padding column to the driving "with subquery" so that the materialized subquery takes a larger number of blocks.


      create table t1
      as
      with x as (
              select  /*+ materialize cardinality(1000) */
                      level rn0, rpad('x', 300) padding
              from    dual
              connect by
                      level <= 1e3  --> comment to fix WordPress format issue
      ),
      y as (
              select  /*+ parallel(3) cardinality(1e6) */
                      (d1.rn0 - 1) * 1000 + d2.rn0    rn1
              from    x d1, x d2
      )
      select
              rn1                                     as id,
              to_char(
                      sysdate + rn1 / 86400,
                      'yyyy-mm-dd hh24:mi:ss'
              )                                       as inc_datetime,
              trunc(dbms_random.value(0, 100))        as random_id,
              rpad(dbms_random.string('x', 6), 20)    as random_string
      from    y
      ;

      Running serially this completed in 61 seconds, while running parallel 3 it completed in 26 seconds. (The poor scaling is probably because it's running in a VM with a deliberate choke on CPU consumption.)

      Note that I've created 1e6 rows by using the same CTE with 1e3 rows twice.

      Comment by Jonathan Lewis — June 14, 2018 @ 1:04 pm BST Jun 14,2018 | Reply

      • Jonathan,

        Thank you very much. -:)

        I’m going to read the two posts you provided and continue doing some testing according to your good suggestion!

        Comment by Quanwen Zhao — June 16, 2018 @ 9:37 am BST Jun 16,2018 | Reply

  3. […] 14, 2018 : according to Jonathan gave me a comment on his blog post “dbms_random”, I erased all this prompt “SQL> “ in […]

    Pingback by SET SQLFORMAT on SQLcl 18.1.1 | Focusing and Concentrating Oracle Recipes, Tips and Techniques — June 14, 2018 @ 4:14 am BST Jun 14,2018 | Reply

  4. No matter, Jonathan :-)

    You can fix them up in your spare time. I’ve also found that with statements like “connect by level <= 1e6" the “<" symbol in particular often causes an issue. It’s best to replace “<" with “&lt;”, and “>” with “&gt;”. All of this is what I found in your “Comments” – many thanks.

    BTW, afterwards I added two hints – PARALLEL and NO_MERGE – to my SQL code snippet as well; nevertheless the elapsed time is still 1 hour and 10 minutes. Where is the time being spent? Is “1e8” rows of data simply too much? How can I trace the time cost?

    drop table t1 purge;
    Table T1 dropped.
    Elapsed: 00:00:00.337
    create table t1
    segment creation immediate
    as
    with x as ((SELECT level r FROM dual CONNECT BY level <= 1e4))
    select /*+ parallel */ /*+ no_merge */
           r as id
           , to_char(sysdate + r/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
           , trunc(dbms_random.value(0, 100)) as random_id
           , rpad(dbms_random.string('x', 6), 20) random_string
    --     , dbms_random.string('x', 20) random_string
    from  (SELECT rownum r
           FROM  x d1,
                 x d2
           WHERE  rownum <= 1e8
    );
    Table T1 created.
    Elapsed: 01:10:23.118

    I’d really appreciate any help.

    Best Regards
    Quanwen Zhao

    Comment by Quanwen Zhao — June 14, 2018 @ 8:00 am BST Jun 14,2018 | Reply

Comments and related questions are welcome.