Performance Tuning Using Partitioning

Example demonstrating how to determine the source of performance issues in an ESP project and how to increase throughput using automatic partitioning. Automatic partitioning creates parallel instances of an element and splits the input data across these instances. Partitioning project elements this way can increase overall project throughput because the workload is spread across the parallel instances.

See Automatic Partitioning in the Programmers Guide for additional information.

While partitioning can increase project throughput, it also introduces additional elements into the data flow pipeline, which can increase project latency. In production environments, however, this overhead should be marginal.

The performance statistics presented in this example are for demonstration purposes only and vary from system to system depending on hardware configuration. The PerformanceTuningUsingPartitioning project consists of four elements: an input window (Feed) and three output windows (AvgLong, AvgShort, and AvgCompare). The Feed input window receives data from the XML Input adapter, Adapter1.
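For reference, here is a rough CCL sketch that is consistent with this description. It is not the shipped code (the project in $ESP_HOME/examples/ccl is authoritative); the schema, retention periods, and join condition below are assumptions, and only the element names and the grouping on the Symbol column come from this example's description:
  // Illustrative sketch only; see $ESP_HOME/examples/ccl for the actual project.
  CREATE INPUT WINDOW Feed
      SCHEMA (Symbol string, Price money(2), TradeTime msdate)
      PRIMARY KEY (Symbol, TradeTime);
  // Adapter1 (the XML Input adapter) is attached to Feed in the shipped project;
  // the ATTACH ADAPTER statement and its properties are omitted here.

  // Longer-horizon average price per symbol (retention period is an assumption).
  CREATE OUTPUT WINDOW AvgLong
      PRIMARY KEY DEDUCED
  AS SELECT Feed.Symbol, avg(Feed.Price) AS AvgPrice
     FROM Feed KEEP 10 MINUTES
     GROUP BY Feed.Symbol;

  // Shorter-horizon average price per symbol.
  CREATE OUTPUT WINDOW AvgShort
      PRIMARY KEY DEDUCED
  AS SELECT Feed.Symbol, avg(Feed.Price) AS AvgPrice
     FROM Feed KEEP 1 MINUTE
     GROUP BY Feed.Symbol;

  // Join comparing the short and long averages for each symbol.
  CREATE OUTPUT WINDOW AvgCompare
      PRIMARY KEY DEDUCED
  AS SELECT s.Symbol, s.AvgPrice AS ShortAvg, l.AvgPrice AS LongAvg
     FROM AvgShort s INNER JOIN AvgLong l ON s.Symbol = l.Symbol;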

To improve overall project throughput, follow this general procedure:
  • Identify the overall project performance and bottlenecks.
  • Identify your available system resources.
  • Partition the bottleneck elements.
  • Repeat the steps above until all elements in the project share the load evenly.
  1. Identify the source of the performance issues in the PerformanceTuningUsingPartitioning project by first running it without any partitioning enabled. Use the esp_monitor command, or alternatively, the Performance Monitor in the ESP Studio:

    For additional details on the esp_monitor command, see esp_monitor in the Utilities Guide. For additional details on the Performance Monitor, see the Performance Monitor section in the Studio Users Guide.

    • If using the esp_monitor command:
      1. Start Event Stream Processor.
      2. Using the esp_cluster_admin command, add the PerformanceTuningUsingPartitioning project from $ESP_HOME/examples/ccl.
      3. Use the esp_cluster_admin command to start the project.
      4. Use the _ESP_Streams_Monitor stream, which is part of the esp_monitor output, to determine which element has the highest CPU usage. For example:
        <_ESP_Streams_Monitor ESP_OPS="u" stream_name="Feed" target="Feed" cpu_pct="18.720100" trans_per_sec="13824.000000"…   
        <_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgShort" target="AvgShort" cpu_pct="59.280400" trans_per_sec="13814.000000"…   
        <_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong" target="AvgLong" cpu_pct="99.840600" trans_per_sec="13634.000000"…   
        <_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgCompare" target="AvgCompare" cpu_pct="49.920300" trans_per_sec="27437.000000"…
        Actual values vary from system to system depending on your hardware configuration. Looking at the cpu_pct field, you can see that the AvgLong element has the highest CPU consumption (99.840600) and is therefore the bottleneck in this project. The rate at which data flows through the project is determined by the number of transactions processed by the first element, the input window (Feed), and is about 14,000 transactions per second.
    • If using the ESP Studio:
      1. Start Event Stream Processor.
      2. Start the Studio and load the PerformanceTuningUsingPartitioning example from $ESP_HOME/examples/ccl.
      3. Start the PerformanceTuningUsingPartitioning project.
      4. In the SAP Sybase ESP Run-Test perspective, select the Monitor view.
      5. Click Select Running Project.
      6. Click OK.
      7. Select Rows Processed and leave the default coloring. The adapter starts to load data and fill the windows.
      8. Hover your cursor over the elements in the project to determine which element has the highest CPU usage. Actual values vary from system to system depending on your hardware configuration. In this case, AvgLong shows 99.840 percent CPU usage and is therefore the bottleneck in this project.
  2. Stop the PerformanceTuningUsingPartitioning project.
  3. Partition the AvgLong element to improve project throughput. Since AvgLong is a window that computes an aggregation grouped by the Symbol column, apply HASH partitioning on the Symbol column.

    HASH partitioning guarantees that all events for a given symbol are always sent to the same partition, which keeps the computed average values accurate. See PARTITION BY Clause in the Programmers Reference guide, and Creating a Partition in the Studio Users Guide, for details on how to add a partition to a project.
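    In CCL, partitioning is declared on the window definition itself. Here is a minimal sketch for AvgLong, using two initial partitions; the query body is illustrative rather than the shipped code, and the exact placement of the PARTITION clause should be verified against the PARTITION BY Clause topic:
    CREATE OUTPUT WINDOW AvgLong
        PRIMARY KEY DEDUCED
        // HASH on Symbol keeps all events for a given symbol in one partition.
        PARTITION BY Feed HASH (Symbol) PARTITIONS 2
    AS SELECT Feed.Symbol, avg(Feed.Price) AS AvgPrice
       FROM Feed KEEP 10 MINUTES
       GROUP BY Feed.Symbol;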

  4. Save your changes and recompile the project.
  5. Ensure that your system has sufficient resources (CPUs, CPU cores) available to support the number of threads the SAP Sybase ESP Server uses.

    The SAP Sybase ESP Server uses a separate thread for each of the four project elements, as well as for the input adapter, Adapter1. Therefore, the project already uses five threads before any partitioning is introduced. Partitioning AvgLong into two instances adds three elements (the two partitions plus the splitter), and thus three more threads, for a total of eight. Introducing more new threads through partitioning than the available system resources can support may cause project performance to degrade rather than improve.

  6. Start the PerformanceTuningUsingPartitioning project.
    Here is sample output from the _ESP_Streams_Monitor stream:
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="Feed" target="Feed" cpu_pct="21.840200" trans_per_sec="21439.000000"… 
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgShort" target="AvgShort" cpu_pct="87.360600" trans_per_sec="21239.000000"… 
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgCompare" target="AvgCompare" cpu_pct="53.040400" trans_per_sec="42381.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong_Feed" target="AvgLong_Feed" cpu_pct="12.480000" trans_per_sec="21960.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.1" target="AvgLong.1" cpu_pct="93.600700" trans_per_sec="10340.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.2" target="AvgLong.2" cpu_pct="99.840700" trans_per_sec="10828.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong" target="AvgLong" cpu_pct="23.400300" trans_per_sec="21168.000000"…

    The project now has three new elements: AvgLong.1 and AvgLong.2, which are the partitions of the AvgLong element, and AvgLong_Feed, the splitter element that distributes the data stream across the two partitions. The AvgLong element itself is now a union that merges the results of the AvgLong.1 and AvgLong.2 partitions.

    The overall project throughput has increased from about 14,000 transactions per second to about 22,000 (see the trans_per_sec field for the Feed element). The two partitions, AvgLong.1 and AvgLong.2, each receive about half of the transactions seen by the AvgLong_Feed element. However, even after partitioning, the two partitions have the highest CPU utilization within the project, which suggests that the AvgLong element requires additional partitioning.

    The output above also indicates that the AvgShort element is a potential bottleneck, as its CPU usage has increased from about 60 to 90 percent.

  7. Stop the project.
  8. Increase the number of partitions for the AvgLong element from two to four.
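    Continuing the sketch from step 3, only the partition degree in the PARTITION clause changes:
        PARTITION BY Feed HASH (Symbol) PARTITIONS 4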
  9. Save, recompile, and restart the project.
    Here is a sample of the esp_monitor output once the number of partitions is increased to four:
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="Feed" target="Feed" cpu_pct="32.760300" trans_per_sec="28160.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgShort" target="AvgShort" cpu_pct="101.400700" trans_per_sec="27967.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgCompare" target="AvgCompare" cpu_pct="70.200400" trans_per_sec="55875.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong_Feed" target="AvgLong_Feed" cpu_pct="21.840100" trans_per_sec="28160.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.1" target="AvgLong.1" cpu_pct="48.360300" trans_per_sec="7932.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.2" target="AvgLong.2" cpu_pct="62.400400" trans_per_sec="8249.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.3" target="AvgLong.3" cpu_pct="45.240300" trans_per_sec="6267.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.4" target="AvgLong.4" cpu_pct="32.760200" trans_per_sec="5534.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong" target="AvgLong" cpu_pct="18.720100" trans_per_sec="27982.000000"…

    The AvgLong element now has four partitions: AvgLong.1, AvgLong.2, AvgLong.3, and AvgLong.4. The CPU utilization of these partitions is now well under 100 percent. However, the AvgShort element has reached 100 percent CPU utilization, which makes it the next candidate for partitioning. Overall, throughput has increased from about 22,000 to 28,000 transactions per second.

  10. Stop the project.
  11. Partition the AvgShort element using HASH partitioning over the Symbol column.
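    As with AvgLong, this amounts to adding a PARTITION clause to the AvgShort window definition. A minimal sketch of the clause, using two partitions (consistent with the AvgShort.1 and AvgShort.2 instances shown in the output below):
        PARTITION BY Feed HASH (Symbol) PARTITIONS 2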
  12. Save, recompile, and restart the project.
    Here is a sample of the esp_monitor output once the AvgShort element has been partitioned:
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="Feed" target="Feed" cpu_pct="54.600400" trans_per_sec="42099.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgCompare" target="AvgCompare" cpu_pct="99.840700" trans_per_sec="82653.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgShort_Feed" target="AvgShort_Feed" cpu_pct="60.840400" trans_per_sec="41732.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgShort.1" target="AvgShort.1" cpu_pct="76.440600" trans_per_sec="20526.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgShort.2" target="AvgShort.2" cpu_pct="76.440500" trans_per_sec="20414.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgShort" target="AvgShort" cpu_pct="20.280100" trans_per_sec="40543.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong_Feed" target="AvgLong_Feed" cpu_pct="28.080200" trans_per_sec="42108.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.1" target="AvgLong.1" cpu_pct="73.320600" trans_per_sec="11619.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.2" target="AvgLong.2" cpu_pct="81.120500" trans_per_sec="12758.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.3" target="AvgLong.3" cpu_pct="54.600300" trans_per_sec="9952.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.4" target="AvgLong.4" cpu_pct="43.680200" trans_per_sec="8286.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong" target="AvgLong" cpu_pct="18.720100" trans_per_sec="42359.000000"…

    The overall project throughput has increased from about 28,000 transactions per second to 42,000, roughly three times the throughput of the unpartitioned project. The AvgCompare element, which performs a join between the AvgShort and AvgLong results, is now at about 100 percent CPU utilization. Note, however, that the project is already consuming about 700 percent of a CPU in total, meaning that your system requires at least seven cores (hardware threads) to accommodate the current project configuration. In production environments, the number of CPU cores required may be higher, as context switches introduce additional, non-negligible overhead.

  13. Stop the project.
  14. Partition the AvgCompare element by applying HASH partitioning to both of its input windows (AvgShort and AvgLong).
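    Here is a sketch of what the partitioned AvgCompare definition might look like. The multi-input form of the PARTITION clause shown here (a comma-separated list of inputs, each with its own HASH columns) and the choice of Symbol as the hash column are assumptions; check the PARTITION BY Clause topic for the exact grammar. Two partitions match the AvgCompare.1 and AvgCompare.2 instances shown in the output below.
    CREATE OUTPUT WINDOW AvgCompare
        PRIMARY KEY DEDUCED
        // Hash both inputs on Symbol so matching rows land in the same partition.
        PARTITION BY AvgShort HASH (Symbol), AvgLong HASH (Symbol) PARTITIONS 2
    AS SELECT s.Symbol, s.AvgPrice AS ShortAvg, l.AvgPrice AS LongAvg
       FROM AvgShort s INNER JOIN AvgLong l ON s.Symbol = l.Symbol;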
  15. Save, recompile, and restart the project.
    Here is a sample of the esp_monitor output once the AvgCompare element has been partitioned:
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="Feed" target="Feed" cpu_pct="63.960400" trans_per_sec="55063.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgShort_Feed" target="AvgShort_Feed" cpu_pct="67.080400" trans_per_sec="55414.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgShort.1" target="AvgShort.1" cpu_pct="88.920500" trans_per_sec="27194.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgShort.2" target="AvgShort.2" cpu_pct="92.040600" trans_per_sec="28155.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgShort" target="AvgShort" cpu_pct="45.240300" trans_per_sec="55346.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong_Feed" target="AvgLong_Feed" cpu_pct="26.520200" trans_per_sec="54978.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.1" target="AvgLong.1" cpu_pct="53.040300" trans_per_sec="15106.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.2" target="AvgLong.2" cpu_pct="73.320500" trans_per_sec="16461.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.3" target="AvgLong.3" cpu_pct="35.880200" trans_per_sec="11683.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong.4" target="AvgLong.4" cpu_pct="14.040100" trans_per_sec="10816.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgLong" target="AvgLong" cpu_pct="45.240300" trans_per_sec="53577.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgCompare_AvgShort" target="AvgCompare_AvgShort" cpu_pct="40.560300" trans_per_sec="55346.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgCompare_AvgLong" target="AvgCompare_AvgLong" cpu_pct="39.000300" trans_per_sec="53585.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgCompare.1" target="AvgCompare.1" cpu_pct="98.280600" trans_per_sec="54077.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgCompare.2" target="AvgCompare.2" cpu_pct="96.720700" trans_per_sec="55077.000000"…
    <_ESP_Streams_Monitor ESP_OPS="u"  stream_name="AvgCompare" target="AvgCompare" cpu_pct="65.520500" trans_per_sec="109147.000000"…

    The overall project throughput has increased from about 42,000 transactions per second to 55,000, a significant improvement over the nonpartitioned project. The AvgCompare partitions (AvgCompare.1 and AvgCompare.2) and the AvgShort partitions (AvgShort.1 and AvgShort.2) now carry the highest load in the project. You can further tune project performance by continuing to partition elements within the project.

    Note that the maximum achievable throughput of the project is limited by the maximum throughput of its nonpartitionable elements, such as the partitioner itself. See PARTITION BY Clause in the Programmers Reference guide for a list of nonpartitionable elements. Moreover, the benefit gained from partitioning depends heavily on how evenly data is distributed across partitions. With an uneven distribution, the partition that receives the most events becomes the bottleneck for the project.