This set of steps may be useful when:
The table data occupies more than 40 to 45% of the segment.
The table data is not in clustered key order, and you need to create a clustered index.
You do not get satisfactory results trying to load a representative sample of the data, as explained in “Clustered indexes on large tables”.
This set of steps successfully distributes the data in almost all cases, but requires careful attention:
Find the minimum value for the key column for the clustered index:
select min(order_id) from orders
If the clustered index exists, drop it. Drop any nonclustered indexes.
See “Using drop index and create clustered index” or “Using constraints and alter table”.
Execute the command:
set sort_resources on
This command disables create index commands. Subsequent create index commands print information about how the sort will be performed, but do not create the index.
Issue the command to create the clustered index, and record the partition numbers and values in the output. This example shows the values for a table on four partitions:
create clustered index order_cix on orders(order_id)
The Create Index is done using Parallel Sort Sort buffer size: 1500 Parallel degree: 25 Number of output devices: 3 Number of producer threads: 4 Number of consumer threads: 4 The distribution map contains 3 element(s) for 4 partitions. Partition Element: 1 450977 Partition Element: 2 903269 Partition Element: 3 1356032 Number of sampled records: 2449
These values, together with the minimum value from step 1, are the key values that the sort uses as diameters when assigning rows to each partition.
Bulk copy the data out, using character mode.
Unpartition the table.
Truncate the table.
Repartition the table.
In the resulting output data file, locate the minimum key value and each of the key values identified in step 4. Copy these values out to another file, and delete them from the output file.
Copy into the table, using parallel bulk copy to place them on the correct segment. For the values shown above, the file might contain:
1 Jones ... 450977 Smith ... 903269 Harris ... 1356032 Wilder ...
The bcp commands look like this:
bcp testdb..orders:1 in keyrows -F1 -L1 bcp testdb..orders:2 in keyrows -F2 -L2 bcp testdb..orders:3 in keyrows -F3 -L3 bcp testdb..orders:4 in keyrows -F4 -L4
At the end of this operation, you will have one row on the first page of each partition – the same row that creating the index would have allocated to that position.
Turn set sort_resources off, and create the clustered index on the segment, using the with sorted_data option.
Do not include any clauses that force the index creation to copy the data rows.
Use bulk copy to copy the data into the table.
Use a single, nonparallel session. You cannot specify a partition for bulk copy when the table has a clustered index, and running multiple sessions runs the risk of deadlocking.
The clustered index forces the pages to the correct partition.
Use sp_helpartition to check the balance of data pages on the partitions and sp_helpsegment to balance of pages on the segments.
Create any nonclustered indexes.
Dump the database.
While this method can successfully make use of nearly all of the pages in a partition, it has some disadvantages:
The entire table must be copied by a single, slow bulk copy
The clustered index is likely to lead to page splitting on the data pages if the table uses allpages locking, so more space might be required.