Failover Handling

The SDK supports failover, either fully transparent or automatic, in the following situations:

  • Cluster failovers – the URIs used to connect to a back-end component can include a list of cluster manager specifications. The SDK maintains connections to these managers transparently: if any one manager in the cluster goes down, the SDK tries to reconnect to another instance, and it returns an error only if connections to all known instances fail. When working in callback or select access mode, you can configure the SDK with an additional level of tolerance for loss of connectivity. In this case, the SDK does not disconnect an EspServer instance even if all known manager instances are down. Instead, it generates an ESP_SERVER_EVENT_STALE event. If it reconnects within a configurable number of attempts, it generates an ESP_SERVER_EVENT_UPTODATE event. Otherwise, it disconnects and generates an ESP_SERVER_EVENT_DISCONNECTED event.
  • Project failovers – an Event Stream Processor cluster allows a project to be deployed with failover. Depending on the configuration settings, the cluster restarts a project when it detects that the project has exited. Projects that are explicitly closed by the user are not restarted. To support this, you can have EspProject instances monitor the cluster for project restarts and then reconnect. This works only in callback or select access mode. When the SDK detects that the project has gone down, it generates an ESP_PROJECT_EVENT_STALE event. If it is able to reconnect, it generates an ESP_PROJECT_EVENT_UPTODATE event. Otherwise, it generates an ESP_PROJECT_EVENT_DISCONNECTED event.
  • Active-active deployments – you can deploy a project in active-active mode. In this mode, the cluster starts two instances of the project, a primary and a secondary. Any data published to the primary instance is automatically mirrored to the secondary instance. The SDK supports such deployments: if the currently connected instance goes down, EspProject tries to reconnect to the alternate instance. Unlike the failovers above, this happens transparently, so a successful reconnection generates no indication to the user. In addition to EspProject, this mode is supported when publishing and subscribing. If you are subscribed to a project in an active-active deployment and the connected instance goes down, the SDK does not disconnect the subscription. Instead, it generates an ESP_SUBSCRIBER_EVENT_DATA_LOST event and then tries to reconnect to the peer instance. If it is able to reconnect, the SDK resubscribes to the same streams. Subscription clients then receive an ESP_SUBSCRIBER_EVENT_SYNC_START event, followed by the data events, and finally an ESP_SUBSCRIBER_EVENT_SYNC_END event. Clients can use this sequence to keep their view of the data consistent if needed. Reconnection during publishing is also supported, but only when publishing in synchronous mode, because the SDK cannot guarantee data consistency otherwise. Reconnection during publishing happens transparently; no external user events are generated.
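
As an illustration of the subscriber event sequence described above, the following minimal Python sketch shows one way a client might keep its view of the data consistent across a reconnection. It is not SDK code: SubscriberEvent, ClientView, and handle are hypothetical names, and the enum members only mirror the ESP_SUBSCRIBER_EVENT_* names. It also assumes that the data events delivered between SYNC_START and SYNC_END form a complete snapshot that can replace the cached rows; verify that against your deployment before relying on it.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class SubscriberEvent(Enum):
    """Local stand-ins for the SDK subscriber events (hypothetical enum)."""
    DATA = auto()        # an ordinary data event
    DATA_LOST = auto()   # mirrors ESP_SUBSCRIBER_EVENT_DATA_LOST
    SYNC_START = auto()  # mirrors ESP_SUBSCRIBER_EVENT_SYNC_START
    SYNC_END = auto()    # mirrors ESP_SUBSCRIBER_EVENT_SYNC_END


@dataclass
class ClientView:
    """Hypothetical client-side cache of stream rows, keyed by row key."""
    rows: dict = field(default_factory=dict)
    stale: bool = False
    # Rows collected between SYNC_START and SYNC_END; None outside a sync.
    pending: Optional[dict] = None

    def handle(self, event, key=None, value=None):
        if event is SubscriberEvent.DATA_LOST:
            # The connected instance went down: keep the cached rows,
            # but flag them as possibly out of date.
            self.stale = True
        elif event is SubscriberEvent.SYNC_START:
            # Start collecting a fresh snapshot from the peer instance.
            self.pending = {}
        elif event is SubscriberEvent.DATA:
            # During a sync, write to the snapshot; otherwise update in place.
            target = self.rows if self.pending is None else self.pending
            target[key] = value
        elif event is SubscriberEvent.SYNC_END:
            # Assumption: the snapshot is complete, so it replaces the cache.
            if self.pending is not None:
                self.rows = self.pending
                self.pending = None
            self.stale = False


# Simulated event sequence: normal data, instance loss, then resync.
view = ClientView()
view.handle(SubscriberEvent.DATA, "a", 1)
view.handle(SubscriberEvent.DATA, "b", 2)
view.handle(SubscriberEvent.DATA_LOST)
stale_after_loss = view.stale
view.handle(SubscriberEvent.SYNC_START)
view.handle(SubscriberEvent.DATA, "a", 1)
view.handle(SubscriberEvent.DATA, "c", 3)
view.handle(SubscriberEvent.SYNC_END)
print(view.rows, view.stale)
```

Collecting the resynchronized rows in a separate buffer and swapping it in only at SYNC_END means readers never see a half-rebuilt view. A client that cannot assume a full snapshot would merge the rows into the existing cache instead.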