Since number of sort buffers controls the amount of data that can be read and sorted in one batch, configuring more sort buffers increases the batch size, reduces the number of merge runs needed, and makes the sort run faster. Changing number of sort buffers is dynamic, so you do not have to restart the server.
Some general guidelines for configuring sort buffers are as follows:
The sort manager chooses serial sorts when the number of pages in a table is less than 8 times the value of number of sort buffers. In most cases, the default value (500) works well for select queries and small indexes. At this setting, the sort manager chooses serial sorting for all create index and worktable sorts of 4000 pages or less, and parallel sorts for larger result sets, saving worker processes for query processing and larger sorts. It allows multiple sort processes to use up to 500 sort buffers simultaneously.
A temporary worktable would need to be very large before you would need to set the value higher to reduce the number of merge runs for a sort. See “Sizing the tempdb” on page 385 in the Performance and Tuning: Basics for more information.
If you are creating indexes on large tables while other users are active, configure the number of sort buffers so that you do not disrupt other activity that needs to use the data cache.
If you are re-creating indexes during scheduled maintenance periods when few users are active on the system, you may want to configure a high value for sort buffers. To speed your index maintenance, you may want to benchmark performance of high sort buffer values, large I/O, and cache bindings to optimize your index activity.
The reduction in merge runs is a logarithmic function. Increasing the value of number of sort buffers from 500 to 600 has very little effect on the number of merge runs. Increasing the size to a much larger value, such as 5000, can greatly speed the sort by reducing the number of merge runs and the amount of I/O needed.
If number of sort buffers is set to less than the square root of the worktable size, sort performance is degraded. Since worktables include only columns specified in the select list plus columns needed for later joins, worktable size for merge joins is usually considerably smaller than the original table size.
Configure enough sort buffers
The sort buffers decides how many pages of data you can sort in each run. That is the basis for the logrithmic function on calculating the number of runs needed to finish the sort.
For example, if you have 500 buffers, then the number of runs is calculated with "log (number of pages in table) with 500 as the log base".
Also note that the number of sort buffers is shared by threads in the parallel sort, if you do not have enough sort buffers, the parallel sort may not work as fast as it should.
When enough sort buffers are configured, fewer intermediate steps and merge runs need to take place during a sort, and physical I/O is required. When number of sort buffers is equal to or greater than the number of pages in the table, the sort can be performed completely in cache, with no physical I/O for the intermediate steps: the only I/O required is the I/O to read and write the data and index pages.
Configure large buffers pools in a named cache, bound the cache to the table, so you can have large I/O.