I don’t have any better knowledge than you do on this aspect of sampling. Your notes prompted me to set up a table with 1,000 partitions and try a few quick tests using the dynamic_sampling hint running through the levels 1 to 9 (which – in the hint form – start Oracle with 32 blocks and then keep doubling up).
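To make the doubling concrete, here is a small sketch of the block counts the hint levels imply, assuming the pattern described above (32 blocks at level 1, doubling at each level up to 9); this mapping is my reading of the behaviour, not a documented formula:

```python
# Sketch, assuming the pattern above: in hint form, level 1 samples
# 32 blocks and each subsequent level (up to 9) doubles the count.
def hint_sample_blocks(level):
    if not 1 <= level <= 9:
        raise ValueError("this sketch covers hint levels 1 to 9 only")
    return 32 * 2 ** (level - 1)

for level in range(1, 10):
    print(level, hint_sample_blocks(level))
```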

The results were interesting – ranging from large percentages of a selection of partitions down to very small percentages of the whole table. And some of the results certainly looked consistent with your suggestions.

It could be useful to work out more detail, but I think that I’d probably want to force larger sample sizes anyway when looking at large partitioned tables, so working at the bottom end of the sample size might be counterproductive.

I forgot to mention that the Oracle version where I ran my testing is 10.2.0.5, using ASSM.

Thanks,

Davide

I came across an issue when using dynamic_sampling on partitioned tables where the optimizer's selectivity estimate turned out to be very poor.

After a detailed analysis I came up with a theory, which I explain below.

If you have a chance to read it, you might be able to confirm or correct this theory.

Regards,

Davide

DYNAMIC_SAMPLING BEHAVIOUR ON PARTITIONED TABLES

In the case of a partitioned table the dynamic_sampling engine behaves in a way, described below, that can be harmful in specific cases.

The dynamic_sampling level – set at query level with the dynamic_sampling hint, or at database level with the optimizer_dynamic_sampling parameter – indicates, among other things, how many blocks need to be sampled.

Let’s call this value N.

Tracing the 10053 event, I have observed that the number of blocks actually sampled for a non-partitioned table is N-1, so I am assuming that one of the N blocks is the segment header, which probably has to be read in order to choose the sampled blocks at random.

In the case of a partitioned table I have observed the following behavior of the optimizer.

Assuming that “p” is the number of partitions hit by the query and “N” is the number of blocks to be sampled as indicated by the dynamic_sampling level, the optimizer will calculate the actual number of blocks to be sampled (“n”) as follows:

if p < N –> n = N – p

if p >= N –> n = N/2
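As a plain-arithmetic sketch of this proposed formula (not Oracle code – it simply encodes the two cases above):

```python
# Sketch of the proposed formula: N is the block count implied by the
# dynamic_sampling level, p is the number of partitions hit by the query.
def actual_sample_blocks(N, p):
    if p < N:
        return N - p   # one segment header read per partition hit
    return N // 2      # fallback when N - p would be <= 0

# Values matching the 10053 trace excerpts further below (N = 32):
print(actual_sample_blocks(32, 1))   # 31
print(actual_sample_blocks(32, 31))  # 1
print(actual_sample_blocks(32, 45))  # 16
```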

I have explained this with the following theory:

when Oracle uses the SAMPLE clause in a select statement (as it does in the sample query generated by the optimizer), the segment header – which, I assume, has to be read in order to choose the sampled blocks at random – is not counted towards the required percentage.

In the case of a partitioned table the number of segment headers read is equal to the number of partitions hit by the query.

When using dynamic_sampling the optimizer doesn’t want to read any blocks beyond what the dynamic_sampling level indicates, so it calculates the percentage for the SAMPLE clause from the number of blocks actually sampled – that is, the number of blocks indicated by the dynamic_sampling level (N) minus the number of segment headers to be read.

This way it makes sure that the sample query will not read more than N blocks (segment headers included).

In the previous formulae, if p >= N the calculation N – p would give a number less than or equal to 0, leading to an inappropriate value for the SAMPLE clause. That’s why the optimizer defaults to a number of actually sampled blocks equal to N/2 (and indeed the sample query generated is a union of different SELECT statements hitting different partitions, for a total of N/2 blocks).
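Putting the theory together, a hedged sketch of how the SAMPLE percentage might be derived – note that total_blocks (the block count of the segments being sampled) is a hypothetical input, and the exact figure Oracle divides by is my assumption:

```python
# Sketch of the theory: the optimizer budgets N blocks in total,
# "spends" p of them on segment-header reads, and derives the SAMPLE
# percentage from the remaining budget. total_blocks is hypothetical.
def sample_percentage(N, p, total_blocks):
    n = N - p if p < N else N // 2
    return 100.0 * n / total_blocks

# e.g. N = 32, 2 partitions hit, 3000 blocks -> sample 30/3000 = 1%
print(sample_percentage(32, 2, 3000))  # 1.0
```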

Below are some excerpts from the various 10053 trace files.

In all cases the database parameter optimizer_dynamic_sampling was set to 4, which indicates that 32 blocks are to be sampled:

1. Query hitting 1 partition:

total partitions : 1

partitions for sampling : 1

…

max. sample block cnt. : 32

sample block cnt. : 31

2. Query hitting 2 partitions:

total partitions : 1932

partitions for sampling : 2

…

max. sample block cnt. : 32

sample block cnt. : 30

3. Query hitting 31 partitions:

total partitions : 1932

partitions for sampling : 31

…

max. sample block cnt. : 32

sample block cnt. : 1

4. Query hitting 32 partitions:

total partitions : 1932

partitions for sampling : 32

partitions actually sampled from : 16

partitioning pct. : 1.655458

…

max. sample block cnt. : 32

sample block cnt. : 16

5. Query hitting 45 partitions:

total partitions : 1932

partitions for sampling : 45

partitions actually sampled from : 16

partitioning pct. : 2.329193

…

max. sample block cnt. : 32

sample block cnt. : 16

In the above excerpts “max. sample block cnt.” is the number of blocks to be sampled as set by the dynamic_sampling level (our “N” variable), “partitions for sampling” is the number of partitions hit by the query (our “p” variable) and “sample block cnt.” is the number of blocks actually sampled (our “n” variable).
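As a quick sanity check, the proposed formula can be cross-checked against the five excerpts above (N = 32 throughout):

```python
# Cross-check the proposed formula against the five trace excerpts:
# n = N - p when p < N, otherwise n = N/2 (N = 32 in every test).
N = 32
observed = {1: 31, 2: 30, 31: 1, 32: 16, 45: 16}  # p -> sample block cnt.

for p, n in observed.items():
    predicted = N - p if p < N else N // 2
    assert predicted == n, (p, predicted, n)
print("formula matches all five excerpts")
```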

You will see that when p is very close to N (e.g. excerpt number 3) very few blocks are actually used in the sample query, and this can unfortunately lead to an inaccurate estimate by the optimizer.
