High Availability

Event Stream Processor supports a set of high availability features that promote failure recovery and data redundancy.

High Availability Features

Clusters
Where set: Cluster configuration

Though a single-node cluster provides project-level failure recovery, it does not protect against server failure; a multinode cluster does. When a server in a multinode cluster fails, the projects running on the failed server restart on other servers if their affinities allow it. (Affinities control which server or servers a project can run on.)

Cold Failover
Where set: Project configuration

In cold failover, an ESP node detects when a project stops running unexpectedly and, if the configuration allows, restarts the project on the same node or a different node.

Active-Active Mode
Where set: Project configuration

When you deploy a project in active-active HA mode, two instances of the same project run in the cluster, preferably on separate machines. One instance is designated the primary and the other the secondary. All connections from outside the cluster (adapters, clients, Studio) are directed to the primary instance. If the primary instance fails, a hot failover occurs and all connections are automatically redirected to the secondary instance.

Data is continuously synchronized between the primary and secondary instances. The primary instance receives each message first, but to maintain redundancy the secondary instance must acknowledge receipt of that message before the primary begins processing it.

Zero Data Loss

The three zero data loss features (guaranteed delivery, consistent recovery, and auto checkpoint) let you protect a project against data loss in the event of a server crash or loss of connection.

  Guaranteed delivery
  Where set:
  • Window properties in Studio
  • Adapter CNXML files
  • Client applications (via SDKs)
  • Binding parameters in CCR files

Guaranteed delivery (GD) uses log stores to ensure that a GD subscriber registered with a GD window receives all the data processed by that window even if the client is not connected when the data is produced. GD is supported only on windows (not on streams or delta streams) and each GD window requires a log store.
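
As a rough illustration, the following CCL sketch shows the log store half of this setup: a log store and an input window assigned to it. The names, schema, and store property values are hypothetical and the exact properties vary by ESP version; enabling guaranteed delivery on the window itself is done in one of the locations listed above, such as the window properties in Studio.

  // Hypothetical CCL sketch: a log store plus a window assigned to it.
  // Property names and values are examples only and vary by ESP version.
  CREATE LOG STORE TradesLogStore
  PROPERTIES
      filename = 'tradeslogstore',  // files created in the project working directory
      sync = true,                  // synchronous writes favor durability over speed
      maxfilesize = 512;            // example size limit

  // Input window whose contents are persisted in the log store. Guaranteed
  // delivery is enabled on this window outside the CCL, for example through
  // the window properties in Studio.
  CREATE INPUT WINDOW Trades
      SCHEMA (TradeId integer, Symbol string, Price float, Volume integer)
      PRIMARY KEY (TradeId)
      STORE TradesLogStore;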

  Consistent recovery
  Where set: Project configuration

The consistent recovery feature can restore all the windows in a project to a consistent state after a server or connection failure, provided the project follows the log store guidelines. When consistent recovery is enabled, the server uses coordinated checkpoints to save data in log stores. If any log store fails to complete a checkpoint, all the log stores for that project roll back to their state as of the previous successful checkpoint. This ensures that even if a server or connection fails, all log stores in a project remain consistent with one another. However, any input data that has not been checkpointed is not recovered upon restart.
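
For example, a project in which every window keeps its data in a log store can be restored consistently. The hypothetical CCL sketch below (names and properties are illustrative; consistent recovery itself is turned on in the project deployment configuration, not in CCL) shows two such windows.

  // Hypothetical CCL sketch: both windows keep their data in log stores, so a
  // coordinated checkpoint covers the whole project. Consistent recovery is
  // enabled separately, in the project (deployment) configuration.
  CREATE LOG STORE OrdersStore PROPERTIES filename = 'ordersstore', sync = true, maxfilesize = 256;
  CREATE LOG STORE FillsStore  PROPERTIES filename = 'fillsstore',  sync = true, maxfilesize = 256;

  CREATE INPUT WINDOW Orders
      SCHEMA (OrderId integer, Symbol string, Qty integer)
      PRIMARY KEY (OrderId)
      STORE OrdersStore;

  CREATE INPUT WINDOW Fills
      SCHEMA (FillId integer, OrderId integer, Qty integer)
      PRIMARY KEY (FillId)
      STORE FillsStore;

  // With consistent recovery enabled, the server checkpoints OrdersStore and
  // FillsStore as a unit; if either checkpoint fails, both roll back to the
  // previous successful checkpoint, keeping the two windows consistent.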

  Auto checkpoint
  Where set: Project configuration

Auto checkpoint lets you control how often log store checkpoints occur across all input streams and windows in a project. More frequent checkpoints mean less data is lost if the server crashes. At the maximum checkpoint frequency of every input transaction (a value of 1), all input data is protected except the data from the final transaction, which might not be checkpointed before a crash. Setting the checkpoint frequency is a trade-off: frequent checkpoints reduce the amount of data at risk but can degrade performance and increase latency, while infrequent checkpoints improve performance but put more data at risk.

Persistent subscribe pattern (PSP)
Where set: Shape context menu in Studio

Persistent subscribe pattern is an early HA feature similar to guaranteed delivery. SAP recommends that you use guaranteed delivery if possible. However, you might prefer PSP if:
  • You need to guarantee delivery of data from a stream, from a delta stream, or from a window assigned to a memory store.
  • You do not want the guaranteed delivery store to be a log store for performance reasons. Using a memory store allows recovery when the client restarts, but not when the project restarts (see the sketch after this list).
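
As a rough sketch of that store trade-off, a window can be assigned to a memory store instead of a log store. The names and property values below are hypothetical, and the PSP plumbing that Studio generates from the shape's context menu is omitted.

  // Hypothetical CCL sketch: a window backed by a memory store rather than a
  // log store. Reads and writes avoid disk I/O, but the window's contents do
  // not survive a project restart.
  CREATE MEMORY STORE FastStore
  PROPERTIES
      indextype = 'tree',   // example property values
      indexsize = 8;

  CREATE INPUT WINDOW Quotes
      SCHEMA (Symbol string, Price float)
      PRIMARY KEY (Symbol)
      STORE FastStore;
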
Related concepts
Cluster Persistence, Caching, and Multicast
Centralized Security
Related tasks
Deploying a Project to a Cluster
Related reference
Active-Active Deployments
File and Directory Infrastructure