Using parallelism with large numbers of partitions

The information in this section also applies when partitioning is configured for manageability, and when partitions are created on physical or logical devices that exhibit little or no parallelism.

For the purpose of this discussion, you have decided to partition a table using range partitioning that represents each week of a year. The issue here is that the query optimizer does not know how the underlying disk system will respond to a 52-way parallel scan. The optimizer must determine the best way to scan the table. If there are enough worker processes configured, the optimizer uses 52 threads to scan the table, which may well cause serious performance issues and be even slower than a serial scan.

To prevent this, first find out exactly how much parallelism is supported. If you know the devices that are used for this table, you can use the following command on a UNIX system, where the underlying device is called /dev/xx:

time dd if=/dev/xx of=/dev/null bs=2k skip=8 count = 102400 &

Assume that time records as x.

Now run two of the same commands concurrently:

time dd if=/dev/xx of=/dev/null bs=2k skip=8 count = 102400 &
time dd if=/dev/xx of=/dev/null bs=2k skip=8 count = 102400 &

This time, assume that time is y. In a linear scale-up, x is the same as y, which is probably impossible to achieve. The following identity may suffice:

x <= y <= (N*x)/k

Where N is the number of simultaneous sessions started and k is a constant that identifies an acceptable improvement level. A good approximation of k might be 1.4, which says that parallel scan is allowed as long as it delivers 40% better metrics than a serial scan.

Table 5-2: Parallel scan metrics

Number of threads

Performance metrics

Acceptable for k=1.4

1

200s

2

245s

245 <= (200*2)/1.4; i.e. 245<=285.71

4

560s

560 <= (200*4)/1.4; i.e. 560<=571.42

5

725s

725 <= (200*5)/1.4; i.e. 725<=714.28

Table 5-2 shows that the disk subsystem did not perform well after four concurrent accesses; the performance numbers went below the acceptable limit established by k. In general, read enough data blocks to allow for any skewed readings.

Having established that 4 threads is optimal, provide this hint by binding it to the object using sp_chgattribute in this way:

sp_chgattribute <tablename>, “plldegree”, 4

This tells the query optimizer to use a maximum of four threads. It may choose fewer than four threads if it does not find enough resources. The same mechanism can be applied to an index. For example, if an index called auth_ind exists on authors and you want to use two threads to access it, use:

sp_chgattribute “authors.auth_ind”, “plldegree”, 4

You must run sp_chgatttribute from the current database.