Before you begin tuning for performance, there are several considerations.
What kinds of transactions are being replicated? Do these transactions affect the same rows and tables? Are these transactions liable to conflict if applied in parallel? Is the transaction profile constant, or does it change, perhaps with the time of day or month. A clear understanding of your transaction profile helps you select those parameters and settings that will be most useful.
Most primary databases have been tuned to minimize contentions through the use of clustered indexes, partitioning, row-level locking, and so on. Make sure that your replicate database has been tuned similarly.
Tuning your parallel DSI environment is an iterative process. You will need to set parameters, run a test, measure performance, compare against previous measurements, and repeat until you have maximized your results.
Set the dsi_serialization_method parameter to no_wait to enable maximum parallelism. Then attempt to reduce contentions by testing other parameters. Because the wait_for_commit (the default) setting supplies minimal parallelism and therefore minimal benefit, only reset dsi_serialization_method to wait_for_commit after all attempts to reduce contention using the no_wait setting have failed to increase performance.
The dsi_num_threads parameter defines the total number of DSI executor threads; the dsi_num_large_xact_threads parameter defines the total number of DSI executor threads reserved for large transactions. Thus, the total number of DSI executor threads (dsi_num_threads) equals the number of DSI threads reserved for large transactions plus the number of threads available for small transactions.
Increase dsi_num_threads if contention does not increase, or
Decrease dsi_num_threads if contention does not decrease.