Using coordination modules

By default, OpenSwitch migrates each client connection as soon as it fails: the failed connection is immediately moved to the next available Adaptive Server®, according to the mode of the pool in which the connection resides.

However, you may want to coordinate the switching process to suit particular OpenSwitch operations or business requirements. For example, when an Adaptive Server fails, you may want the client to attempt to reconnect to the failed server rather than migrate immediately. Or, if a single connection fails unexpectedly, you may want to switch all connections to the next available server.

More importantly, you may need to coordinate the switching process with an external high availability (HA) solution such as Sybase® Replication Server®. In this case, failover should not occur until the HA service has completed the necessary steps to bring the backup server online, such as waiting until replication queues are synchronized between servers.
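The idea of holding failover until the HA service is ready can be sketched as a polling loop with a timeout. This is an illustration only and is not part of the OpenSwitch or Replication Server interfaces: the function `wait_until_standby_ready`, the `queue_depth` probe, and the timeout values are hypothetical names assumed for the example.

```python
import time
from typing import Callable


def wait_until_standby_ready(
    queue_depth: Callable[[], int],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once the (hypothetical) replication queue is drained.

    queue_depth is a caller-supplied probe reporting how many transactions
    are still waiting to reach the standby server. Returns False if the
    queue has not drained by the time timeout_s has elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        if queue_depth() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


# Simulated probe: the backlog shrinks by one transaction on every poll.
backlog = [3]


def fake_queue_depth() -> int:
    depth = backlog[0]
    backlog[0] = max(0, depth - 1)
    return depth


# The sleep is stubbed out so the demonstration finishes instantly.
ready = wait_until_standby_ready(fake_queue_depth, timeout_s=5.0, sleep=lambda s: None)
print("standby ready:", ready)
```

A coordination module written along these lines would allow the switch only when the function returns True, and would otherwise leave connections suspended or raise an alert.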

For these situations, OpenSwitch provides a simple application programming interface (API) that allows you to develop an external coordination module (CM). When connected to an OpenSwitch server, a coordination module receives event notifications based on connection state changes.

OpenSwitch provides a sample replication coordination module (RCM), which is a coordination module created using CM APIs. You can use the sample to coordinate failover of a high availability, warm standby system that uses Replication Server. See Chapter 4, “Using the Replication Coordination Module.”

For example, if a user attempts to log in or a connection to a server is lost, the coordination module is notified and tells OpenSwitch what action to take, as illustrated in Figure 1-1.

Figure 1-1: Coordination module example

In this example:

1. Server 1 goes down unexpectedly, for example, because of a power outage or an explicit shutdown.
2. As soon as a connection is lost, the coordination module receives a message identifying the lost connection and the server with which it was communicating. OpenSwitch keeps the lost connection suspended until the coordination module responds with the action to take for it.
3. The coordination module communicates with the high availability solution, in this case Replication Server, to confirm that Server 2 is in a state that all users can rely on, for example, that all transactions have been successfully migrated through the Replication Agent™. At this point, the coordination module can also attempt to recover Server 1 automatically before switching users to Server 2.
4. The coordination module tells OpenSwitch that all connections that were using Server 1 should now switch to the next available server, in this case Server 2.
5. OpenSwitch switches all connections to the next available server, as requested by the coordination module, and issues a deadlock message to connections where necessary.
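The sequence above can be modeled with a small simulation. This is a sketch under stated assumptions, not the CM API: `SwitchSim`, `Connection`, `make_cm`, and the `standby_ready` probe are hypothetical names, and "next available server" is simplified to "the first configured server other than the failed one."

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Connection:
    conn_id: int
    server: str
    state: str = "active"  # "active" or "suspended"


class SwitchSim:
    """Toy stand-in for an OpenSwitch server; not the real product API."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        self.connections: Dict[int, Connection] = {}
        self.cm: Optional[Callable[["SwitchSim", Connection], None]] = None

    def connection_lost(self, conn_id: int) -> None:
        # Suspend the lost connection, then notify the coordination module.
        conn = self.connections[conn_id]
        conn.state = "suspended"
        if self.cm is not None:
            self.cm(self, conn)

    def switch_all(self, failed: str, target: str) -> int:
        # Move every connection that was using the failed server.
        moved = 0
        for conn in self.connections.values():
            if conn.server == failed:
                conn.server = target
                conn.state = "active"
                moved += 1
        return moved


def make_cm(standby_ready: Callable[[], bool]):
    """Build a hypothetical coordination-module handler."""

    def on_lost(switch: SwitchSim, conn: Connection) -> None:
        # Consult the HA solution before responding to the switch.
        if not standby_ready():
            return  # no response yet: the connection stays suspended
        failed = conn.server
        target = next(s for s in switch.servers if s != failed)
        switch.switch_all(failed, target)

    return on_lost


switch = SwitchSim(["Server1", "Server2"])
switch.cm = make_cm(lambda: True)
for i in range(3):
    switch.connections[i] = Connection(i, "Server1")

switch.connection_lost(0)  # Server1 fails; connection 0 notices first
for c in switch.connections.values():
    print(c.conn_id, c.server, c.state)
```

In this model a single lost connection triggers the switch of every connection on the failed server, which mirrors the behavior described in the example rather than the default per-connection migration.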

Because the coordination module can intercept and respond to every connection state change, including client logins, you can also use the CM to override built-in OpenSwitch pooling and routing mechanisms with application- or business-specific logic.
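As an illustration of business-specific routing at login time, the following sketch picks a server from a rule table and falls back to the built-in routing when no rule applies. The rule table `ROUTES`, the function `route_login`, and the application names are hypothetical and assumed for this example only; the actual login notification and its fields are defined by the CM API.

```python
from typing import Dict, Optional, Set

# Hypothetical business rule table: application name -> preferred server.
ROUTES: Dict[str, str] = {"reporting": "Server2", "orders": "Server1"}


def route_login(app_name: str, available: Set[str]) -> Optional[str]:
    """Choose a server for a login, or return None to defer to built-in routing."""
    preferred = ROUTES.get(app_name)
    if preferred in available:
        return preferred
    # No rule for this application, or its preferred server is down.
    return None


print(route_login("reporting", {"Server1", "Server2"}))
print(route_login("reporting", {"Server1"}))
print(route_login("adhoc", {"Server1", "Server2"}))
```

Returning None for the unmatched cases keeps the override narrow: only logins with an explicit rule bypass the pool configuration.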

The coordination module can:

Determine if a failed connection is due to a remote Adaptive Server being unavailable
Determine if the backup Adaptive Server is available
Coordinate itself with third-party high availability tools
Switch all connections in tandem
Mark an Adaptive Server as unavailable in OpenSwitch
Manage multiple instances of OpenSwitch
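Several of these capabilities combine naturally into one decision: check whether the server itself is down, and if so mark it unavailable and switch everything. The sketch below assumes a caller-supplied `probe` function that reports whether a server is reachable; `decide_action`, the action strings, and the `unavailable` set are hypothetical names, not CM API calls.

```python
from typing import Callable, Set


def decide_action(server: str, probe: Callable[[str], bool], unavailable: Set[str]) -> str:
    """Classify a failed connection and choose a response.

    Returns "reconnect" when the server still answers the probe, meaning
    only the single connection failed. Otherwise records the server as
    unavailable and returns "switch_all".
    """
    if probe(server):
        return "reconnect"
    unavailable.add(server)
    return "switch_all"


down: Set[str] = set()
print(decide_action("Server1", lambda s: True, down), sorted(down))
print(decide_action("Server1", lambda s: False, down), sorted(down))
```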

If the OpenSwitch server is configured to use a coordination module and none is available when a connection changes state, the connection is suspended until a coordination module comes online, at which time all pending notifications are delivered.
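This hold-and-deliver behavior resembles a queue that buffers notifications until a handler attaches. The following is a minimal model, not OpenSwitch internals: the class `NotificationQueue` and its methods are hypothetical, and delivery in arrival order is an assumption made for the example.

```python
from collections import deque
from typing import Callable, Deque, Optional


class NotificationQueue:
    """Buffer notifications while no coordination module is attached."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self._handler: Optional[Callable[[str], None]] = None

    def notify(self, event: str) -> bool:
        """Deliver the event now, or queue it. Returns True if delivered."""
        if self._handler is None:
            self._pending.append(event)  # the connection would stay suspended
            return False
        self._handler(event)
        return True

    def attach(self, handler: Callable[[str], None]) -> int:
        """Attach a handler and flush the backlog. Returns the number flushed."""
        self._handler = handler
        delivered = 0
        while self._pending:
            handler(self._pending.popleft())  # oldest first (assumed order)
            delivered += 1
        return delivered


q = NotificationQueue()
q.notify("connection 7 lost")    # queued: no coordination module yet
q.notify("connection 9 login")   # queued
seen = []
flushed = q.attach(seen.append)  # coordination module comes online
print(flushed, seen)
```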