This example demonstrates how to determine the source of performance issues in an ESP project and how to increase throughput using automatic partitioning. Automatic partitioning creates parallel instances of an element and splits the input data across those instances. Because the workload is divided among the parallel instances, partitioning project elements this way can increase overall project throughput.
See Automatic Partitioning in the Programmers Guide for additional information.
While partitioning can increase project throughput, it also introduces additional elements into the data flow pipeline, which can increase project latency. In production environments, however, this overhead should be marginal.
The performance statistics presented in this example are for demonstration purposes only and will vary from system to system depending on hardware configuration. The PerformanceTuningUsingPartitioning project consists of four elements: an input window (Feed) and three output windows (AvgLong, AvgShort, and AvgCompare). The Feed input window receives data from the XML Input adapter, Adapter1.
For additional details on the esp_monitor command, see esp_monitor in the Utilities Guide. For additional details on the Performance Monitor, see the Performance Monitor section in the Studio Users Guide.
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="Feed" target="Feed" cpu_pct="18.720100" trans_per_sec="13824.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgShort" target="AvgShort" cpu_pct="59.280400" trans_per_sec="13814.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong" target="AvgLong" cpu_pct="99.840600" trans_per_sec="13634.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgCompare" target="AvgCompare" cpu_pct="49.920300" trans_per_sec="27437.000000"…
Actual values may vary from system to system depending on your hardware configuration. The cpu_pct field shows that the AvgLong element has the highest CPU consumption (99.840600) and is therefore the bottleneck in this project. The project throughput is given by the trans_per_sec value of the first element, the input window (Feed), which is about 14,000 transactions per second. AvgCompare reports roughly twice that rate because, as a join, it processes the output of both AvgShort and AvgLong (13,814 + 13,634 ≈ 27,400).
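Because esp_monitor writes one record per element, the bottleneck check above can be scripted. The following is a minimal Python sketch, not an ESP tool: it assumes each record sits on a single line with stream_name, cpu_pct, and trans_per_sec in the order shown, and the helper names parse_monitor and bottleneck are hypothetical. The sample input reuses the four records above with the elided trailing attributes dropped.

```python
import re

# Records copied from the esp_monitor output above (elided attributes omitted).
MONITOR_OUTPUT = """
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="Feed" target="Feed" cpu_pct="18.720100" trans_per_sec="13824.000000"
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgShort" target="AvgShort" cpu_pct="59.280400" trans_per_sec="13814.000000"
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong" target="AvgLong" cpu_pct="99.840600" trans_per_sec="13634.000000"
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgCompare" target="AvgCompare" cpu_pct="49.920300" trans_per_sec="27437.000000"
"""

# Assumption: the three attributes appear in this order on one line per record.
RECORD = re.compile(
    r'stream_name="(?P<name>[^"]+)".*?'
    r'cpu_pct="(?P<cpu>[\d.]+)".*?'
    r'trans_per_sec="(?P<tps>[\d.]+)"'
)


def parse_monitor(text):
    """Return {stream_name: (cpu_pct, trans_per_sec)} for every record found."""
    return {
        m["name"]: (float(m["cpu"]), float(m["tps"]))
        for m in RECORD.finditer(text)
    }


def bottleneck(stats):
    """The element with the highest cpu_pct is the likely bottleneck."""
    return max(stats, key=lambda name: stats[name][0])


stats = parse_monitor(MONITOR_OUTPUT)
print(bottleneck(stats))   # AvgLong
print(stats["Feed"][1])    # 13824.0 (project throughput, from the input window)
```

The same two functions work unchanged on the larger outputs later in this example, since partition, splitter, and union elements are reported as ordinary records.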
HASH partitioning ensures that all events for a given symbol are always sent to the same partition, which ensures an accurate result for the average values. See PARTITION BY Clause in the Programmers Reference guide, and Creating a Partition in the Studio Users Guide for details on how to add a partition to a project.
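To see why HASH partitioning preserves per-symbol averages, the sketch below simulates a splitter, independent partition instances, and a merging union in plain Python. It is an illustration under stated assumptions, not ESP's implementation: zlib.crc32 stands in for whatever hash function the server uses, and the event data and function names are hypothetical. A round-robin split is included for contrast; it sends events for one symbol to different partitions, so each partition reports only a partial average.

```python
import zlib
from collections import defaultdict


def averages(events):
    """Average price per symbol for a list of (symbol, price) events."""
    sums, counts = defaultdict(float), defaultdict(int)
    for symbol, price in events:
        sums[symbol] += price
        counts[symbol] += 1
    return {s: sums[s] / counts[s] for s in sums}


def merge_partitions(buckets):
    """Run averages() in each partition and union the results.

    If a symbol appears in several partitions, the last partition's
    (partial) result overwrites the others.
    """
    merged = {}
    for bucket in buckets:
        merged.update(averages(bucket))
    return merged


def hash_split(events, n):
    # Hypothetical splitter: the same symbol always maps to the same partition.
    buckets = [[] for _ in range(n)]
    for symbol, price in events:
        buckets[zlib.crc32(symbol.encode("utf-8")) % n].append((symbol, price))
    return buckets


def round_robin_split(events, n):
    # Events are dealt out in turn, ignoring the symbol.
    buckets = [[] for _ in range(n)]
    for i, event in enumerate(events):
        buckets[i % n].append(event)
    return buckets


events = [("SAP", 100.0), ("IBM", 50.0), ("SAP", 110.0), ("ORCL", 30.0), ("IBM", 70.0)]
reference = averages(events)  # unpartitioned result: SAP 105, IBM 60, ORCL 30

print(merge_partitions(hash_split(events, 2)) == reference)         # True
print(merge_partitions(round_robin_split(events, 2)) == reference)  # False (IBM -> 50.0)
```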
The SAP Sybase ESP Server uses a separate thread for each of the four elements and one for the input adapter, Adapter1. The project therefore already uses five threads before any partitioning is introduced. Introducing too many new threads through partitioning, relative to the available system resources, may cause project performance to degrade rather than improve.
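The thread arithmetic can be tracked as elements are partitioned. The sketch below assumes, as stated above, one thread per element plus one for Adapter1. It also assumes, based on the element names that appear in the esp_monitor output later in this example, that partitioning an element N ways adds one splitter per input stream plus N partition instances, while the original element name is reused for the union. The function names are hypothetical, and the result counts threads only; it says nothing about how busy each one is.

```python
BASE_ELEMENTS = 4  # Feed, AvgShort, AvgLong, AvgCompare
ADAPTERS = 1       # Adapter1


def added_threads(partitions, inputs=1):
    """Threads added by partitioning one element `partitions` ways.

    Assumption: one splitter thread per input stream plus one thread per
    partition instance; the union keeps the original element's thread.
    """
    return inputs + partitions


def project_threads(partitioned=()):
    """`partitioned` holds one (partitions, inputs) pair per partitioned element."""
    return BASE_ELEMENTS + ADAPTERS + sum(added_threads(n, k) for n, k in partitioned)


def oversubscribed(threads, cores):
    """True when there are more threads than hardware threads to run them."""
    return threads > cores


print(project_threads())                           # 5: no partitioning
print(project_threads([(2, 1)]))                   # 8: AvgLong split two ways
print(project_threads([(4, 1), (2, 1), (2, 2)]))   # 17: AvgLong x4, AvgShort x2, AvgCompare x2
print(oversubscribed(project_threads([(4, 1), (2, 1), (2, 2)]), cores=8))  # True
```

Under these assumptions the counts agree with the number of elements listed in each later esp_monitor output, plus Adapter1.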
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="Feed" target="Feed" cpu_pct="21.840200" trans_per_sec="21439.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgShort" target="AvgShort" cpu_pct="87.360600" trans_per_sec="21239.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgCompare" target="AvgCompare" cpu_pct="53.040400" trans_per_sec="42381.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong_Feed" target="AvgLong_Feed" cpu_pct="12.480000" trans_per_sec="21960.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.1" target="AvgLong.1" cpu_pct="93.600700" trans_per_sec="10340.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.2" target="AvgLong.2" cpu_pct="99.840700" trans_per_sec="10828.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong" target="AvgLong" cpu_pct="23.400300" trans_per_sec="21168.000000"…
The project now has three new elements: AvgLong.1 and AvgLong.2, which are the partitions of the AvgLong element, and AvgLong_Feed, the splitter element that distributes the data stream across the two partitions. The AvgLong element itself is now a union that merges the results of the AvgLong.1 and AvgLong.2 partitions.
The overall project throughput has increased from about 14,000 transactions per second to about 22,000 (see the trans_per_sec field for the Feed element). The two partitions, AvgLong.1 and AvgLong.2, each receive about half of the transactions seen by the AvgLong_Feed element. However, even after partitioning, the two partitions have the highest CPU utilization in the project, which suggests that the AvgLong element requires additional partitioning.
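The split between the two partitions can be checked from the trans_per_sec values above. This Python sketch hard-codes those values and uses a hypothetical partition_shares helper (not an ESP tool) to compute each partition's share of the combined partition traffic.

```python
# trans_per_sec values copied from the esp_monitor output after the first partitioning step
splitter_tps = 21960.0                                        # AvgLong_Feed
partition_tps = {"AvgLong.1": 10340.0, "AvgLong.2": 10828.0}
union_tps = 21168.0                                           # AvgLong (union)


def partition_shares(tps_by_partition):
    """Fraction of the combined partition traffic handled by each partition."""
    total = sum(tps_by_partition.values())
    return {name: tps / total for name, tps in tps_by_partition.items()}


shares = partition_shares(partition_tps)
print({name: round(share, 2) for name, share in shares.items()})
# {'AvgLong.1': 0.49, 'AvgLong.2': 0.51}
print(sum(partition_tps.values()) == union_tps)              # True in this sample
print(round(sum(partition_tps.values()) / splitter_tps, 2))  # 0.96 of the splitter's rate
```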
The output above also suggests that the AvgShort element is a potential bottleneck, as its CPU usage has increased from about 60 to nearly 90 percent.
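Comparing two monitoring snapshots makes this kind of shift easy to spot. The sketch below uses the cpu_pct values from the outputs before and after the first partitioning step; the 80 percent threshold and the 10-point rise are hypothetical tuning choices, not ESP defaults, and the function names are illustrative only.

```python
# cpu_pct values copied from the esp_monitor outputs before and after partitioning AvgLong
before = {"Feed": 18.7201, "AvgShort": 59.2804, "AvgLong": 99.8406, "AvgCompare": 49.9203}
after = {
    "Feed": 21.8402, "AvgShort": 87.3606, "AvgCompare": 53.0404,
    "AvgLong_Feed": 12.48, "AvgLong.1": 93.6007, "AvgLong.2": 99.8407, "AvgLong": 23.4003,
}


def near_saturation(cpu, threshold=80.0):
    """Elements at or above `threshold` percent CPU, busiest first."""
    hot = [name for name, pct in cpu.items() if pct >= threshold]
    return sorted(hot, key=lambda name: cpu[name], reverse=True)


def rising(old, new, min_increase=10.0):
    """Elements present in both snapshots whose CPU rose by at least `min_increase` points."""
    return [name for name in old if name in new and new[name] - old[name] >= min_increase]


print(near_saturation(after))  # ['AvgLong.2', 'AvgLong.1', 'AvgShort']
print(rising(before, after))   # ['AvgShort']
```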
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="Feed" target="Feed" cpu_pct="32.760300" trans_per_sec="28160.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgShort" target="AvgShort" cpu_pct="101.400700" trans_per_sec="27967.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgCompare" target="AvgCompare" cpu_pct="70.200400" trans_per_sec="55875.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong_Feed" target="AvgLong_Feed" cpu_pct="21.840100" trans_per_sec="28160.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.1" target="AvgLong.1" cpu_pct="48.360300" trans_per_sec="7932.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.2" target="AvgLong.2" cpu_pct="62.400400" trans_per_sec="8249.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.3" target="AvgLong.3" cpu_pct="45.240300" trans_per_sec="6267.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.4" target="AvgLong.4" cpu_pct="32.760200" trans_per_sec="5534.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong" target="AvgLong" cpu_pct="18.720100" trans_per_sec="27982.000000"…
The AvgLong element now has four partitions: AvgLong.1, AvgLong.2, AvgLong.3, and AvgLong.4. The CPU utilization of each of these partitions is well under 100 percent. However, the AvgShort element has reached 100 percent CPU utilization, which makes it the next candidate for partitioning. Overall, throughput has increased from about 22,000 to about 28,000 transactions per second.
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="Feed" target="Feed" cpu_pct="54.600400" trans_per_sec="42099.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgCompare" target="AvgCompare" cpu_pct="99.840700" trans_per_sec="82653.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgShort_Feed" target="AvgShort_Feed" cpu_pct="60.840400" trans_per_sec="41732.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgShort.1" target="AvgShort.1" cpu_pct="76.440600" trans_per_sec="20526.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgShort.2" target="AvgShort.2" cpu_pct="76.440500" trans_per_sec="20414.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgShort" target="AvgShort" cpu_pct="20.280100" trans_per_sec="40543.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong_Feed" target="AvgLong_Feed" cpu_pct="28.080200" trans_per_sec="42108.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.1" target="AvgLong.1" cpu_pct="73.320600" trans_per_sec="11619.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.2" target="AvgLong.2" cpu_pct="81.120500" trans_per_sec="12758.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.3" target="AvgLong.3" cpu_pct="54.600300" trans_per_sec="9952.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.4" target="AvgLong.4" cpu_pct="43.680200" trans_per_sec="8286.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong" target="AvgLong" cpu_pct="18.720100" trans_per_sec="42359.000000"…
The overall project throughput has increased from about 28,000 transactions per second to about 42,000, roughly three times the throughput of the original, nonpartitioned project (about 14,000). The AvgCompare element, which performs a join between the AvgShort and AvgLong results, now uses about 100 percent CPU. However, the project as a whole is already using about 700 percent CPU, meaning that your system requires at least seven cores (hardware threads) to accommodate the current project setup. In production environments, the number of required CPU cores may be higher, as context switches introduce additional, nonnegligible overhead.
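The seven-core figure is the sum of the cpu_pct values in the output above, rounded up to whole cores. The short Python check below uses values copied from that output. Adapter1 does not appear in the listing, so its CPU use is not included and the result should be read as a lower bound.

```python
import math

# cpu_pct per element, copied from the esp_monitor output above
cpu_pct = {
    "Feed": 54.6004, "AvgCompare": 99.8407,
    "AvgShort_Feed": 60.8404, "AvgShort.1": 76.4406, "AvgShort.2": 76.4405, "AvgShort": 20.2801,
    "AvgLong_Feed": 28.0802, "AvgLong.1": 73.3206, "AvgLong.2": 81.1205,
    "AvgLong.3": 54.6003, "AvgLong.4": 43.6802, "AvgLong": 18.7201,
}

total_pct = sum(cpu_pct.values())
min_cores = math.ceil(total_pct / 100)  # 100 percent == one fully busy hardware thread

baseline_tps, current_tps = 13824.0, 42099.0  # Feed trans_per_sec: unpartitioned vs. now
print(f"total CPU: {total_pct:.0f}%  ->  at least {min_cores} cores")  # total CPU: 688%  ->  at least 7 cores
print(f"speedup: {current_tps / baseline_tps:.1f}x")                   # speedup: 3.0x
```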
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="Feed" target="Feed" cpu_pct="63.960400" trans_per_sec="55063.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgShort_Feed" target="AvgShort_Feed" cpu_pct="67.080400" trans_per_sec="55414.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgShort.1" target="AvgShort.1" cpu_pct="88.920500" trans_per_sec="27194.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgShort.2" target="AvgShort.2" cpu_pct="92.040600" trans_per_sec="28155.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgShort" target="AvgShort" cpu_pct="45.240300" trans_per_sec="55346.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong_Feed" target="AvgLong_Feed" cpu_pct="26.520200" trans_per_sec="54978.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.1" target="AvgLong.1" cpu_pct="53.040300" trans_per_sec="15106.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.2" target="AvgLong.2" cpu_pct="73.320500" trans_per_sec="16461.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.3" target="AvgLong.3" cpu_pct="35.880200" trans_per_sec="11683.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong.4" target="AvgLong.4" cpu_pct="14.040100" trans_per_sec="10816.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgLong" target="AvgLong" cpu_pct="45.240300" trans_per_sec="53577.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgCompare_AvgShort" target="AvgCompare_AvgShort" cpu_pct="40.560300" trans_per_sec="55346.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgCompare_AvgLong" target="AvgCompare_AvgLong" cpu_pct="39.000300" trans_per_sec="53585.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgCompare.1" target="AvgCompare.1" cpu_pct="98.280600" trans_per_sec="54077.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgCompare.2" target="AvgCompare.2" cpu_pct="96.720700" trans_per_sec="55077.000000"…
<_ESP_Streams_Monitor ESP_OPS="u" stream_name="AvgCompare" target="AvgCompare" cpu_pct="65.520500" trans_per_sec="109147.000000"…
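The overall project throughput has increased from about 42,000 transactions per second to about 55,000, nearly four times the throughput of the nonpartitioned project. AvgCompare is now partitioned into two instances, AvgCompare.1 and AvgCompare.2, with one splitter element for each of its join inputs (AvgCompare_AvgShort and AvgCompare_AvgLong). The AvgCompare partitions and the AvgShort partitions (AvgShort.1 and AvgShort.2) now carry the highest load in the project. You can tune project performance further by continuing to partition elements within the project.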
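The progression across the partitioning steps can be summarized from the Feed element's trans_per_sec in each output. The Python sketch below hard-codes those values; the stage labels are descriptive only, and the per-step gain is simply the ratio between consecutive stages.

```python
# Feed trans_per_sec from each esp_monitor output in this example
stages = [
    ("no partitioning", 13824.0),
    ("AvgLong x2", 21439.0),
    ("AvgLong x4", 28160.0),
    ("AvgLong x4, AvgShort x2", 42099.0),
    ("AvgLong x4, AvgShort x2, AvgCompare x2", 55063.0),
]

baseline = stages[0][1]
speedups = [tps / baseline for _, tps in stages]

previous = baseline
for (label, tps), speedup in zip(stages, speedups):
    print(f"{label:<40} {tps:>8.0f} tps  {speedup:4.2f}x total  {tps / previous:4.2f}x step")
    previous = tps
```

In this sample each partitioning step adds roughly 30 to 55 percent, and the last stage reaches close to four times the original throughput.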
Note that the maximum achievable throughput of the project is limited by the maximum throughput of the nonpartitionable elements, such as the partitioner (splitter) itself. See PARTITION BY Clause in the Programmers Reference guide for a list of nonpartitionable elements. Moreover, the achievable degree of partitioning depends heavily on how data is distributed across partitions. If the distribution is uneven, the partition that receives the most events becomes the bottleneck for the project.
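The effect of uneven distribution can be expressed as a simple bound: if each partition instance can process at most C events per second and the busiest partition receives a fraction f_max of the events, the partitioned element cannot exceed C / f_max events per second, compared with N * C for a perfectly even split across N partitions. The Python sketch below applies this model to the AvgLong partition rates from the last output. The model assumes equal-capacity partitions and a split ratio that stays constant as load grows, and the capacity value of 20,000 events per second is hypothetical.

```python
def effective_parallelism(partition_tps):
    """1 / f_max: how many evenly loaded partitions the observed split is worth."""
    return sum(partition_tps) / max(partition_tps)


def max_throughput(capacity_per_partition, partition_tps):
    """Upper bound on the element's throughput: the busiest partition saturates first."""
    return capacity_per_partition * effective_parallelism(partition_tps)


avglong = [15106.0, 16461.0, 11683.0, 10816.0]  # AvgLong.1 .. AvgLong.4 trans_per_sec
even = [1.0, 1.0, 1.0, 1.0]                     # a perfectly even four-way split

print(round(effective_parallelism(avglong), 2))  # 3.28 of 4 partitions
print(max_throughput(20000.0, even))             # 80000.0 with an even split
print(round(max_throughput(20000.0, avglong)))   # 65690 with the observed split
```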