External coordination using coordination modules

By default, OpenSwitch migrates each failed connection individually, as it fails. That is, if a single connection fails, it is immediately migrated to the next available server according to the mode of the pool in which the connection resides. However, you may want to coordinate the switching process around certain operational or business requirements.

For example, rather than immediately migrating a connection to the next available Adaptive Server, you may first want to attempt to reconnect to the failed server to confirm that it has actually failed. Or, you may want to switch all connections that use the server if a single connection fails unexpectedly.

More importantly, you may need to coordinate the switching process with an external high-availability (HA) solution such as Replication Server®. In this case, a failover should not occur until the HA service has completed the necessary steps to bring the backup server online, such as waiting until replication queues are synchronized between servers.

For these situations, OpenSwitch provides a simple application programming interface (API) for developing an external coordination module (CM). When connected to an OpenSwitch, a coordination module receives event notifications based on connection state changes (for example, a user attempts to log in, or a connection is lost to a server), and is expected to respond to OpenSwitch, informing it of any actions to take, as illustrated in Figure 1-7.

Figure 1-7: Coordination module example

In this example:

1. Server 1 goes down unexpectedly; for example, due to a power outage or an explicit shutdown.
2. As soon as the connection is lost, the coordination module receives a message indicating which connection was lost, and with which Adaptive Server that connection was communicating. The lost connection is suspended within OpenSwitch until the coordination module responds with what should happen to the connection.
3. The coordination module now communicates with the high-availability solution, in this case, a replication agent, to ensure that Server 2 is in a state that all users can rely on; for example, by confirming that all transactions have been successfully migrated through the replication agent. The coordination module could, at this point, attempt to automatically recover Server 1 before attempting to switch users to Server 2.
4. The coordination module responds to the OpenSwitch server that all connections that are using Server 1 should now switch to the next available Adaptive Server, in this case, Server 2.
5. All connections are switched, as requested by the coordination module, to the next available server. Connections are issued a “deadlock” message, if necessary.
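
The sequence above can be sketched as a small simulation. This is not the OpenSwitch coordination module API; `ToySwitch`, `ToyCoordinationModule`, `Action`, and every method name below are hypothetical, chosen only to illustrate the suspend, notify, respond, and switch order of events. The high-availability check is reduced to a callable that returns true when the backup server is ready.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    # Hypothetical responses a coordination module might send back.
    SWITCH_ALL = "switch_all"  # move every connection that uses the failed server
    RETRY = "retry"            # keep the connection on the same server


@dataclass
class Connection:
    spid: int
    server: str
    suspended: bool = False


class ToyCoordinationModule:
    def __init__(self, backup_ready):
        # backup_ready stands in for the conversation with the HA solution.
        self.backup_ready = backup_ready

    def on_connection_lost(self, conn):
        # Only allow the switch once the HA service reports the backup as ready.
        return Action.SWITCH_ALL if self.backup_ready() else Action.RETRY


@dataclass
class ToySwitch:
    """Stand-in for OpenSwitch: an ordered server list plus its connections."""
    servers: list
    connections: list = field(default_factory=list)

    def next_server(self, failed):
        # Next server in the list after the failed one, wrapping around.
        i = self.servers.index(failed)
        return self.servers[(i + 1) % len(self.servers)]

    def connection_lost(self, conn, module):
        conn.suspended = True                     # suspend until the module answers
        failed = conn.server
        action = module.on_connection_lost(conn)  # notify and wait for the response
        if action is Action.SWITCH_ALL:
            target = self.next_server(failed)
            for c in self.connections:
                if c.server == failed:            # switch everything on the failed server
                    c.server = target
                    c.suspended = False
        else:
            conn.suspended = False                # resume on the same server


switch = ToySwitch(servers=["Server1", "Server2"])
switch.connections = [
    Connection(1, "Server1"),
    Connection(2, "Server1"),
    Connection(3, "Server2"),
]
module = ToyCoordinationModule(backup_ready=lambda: True)
switch.connection_lost(switch.connections[0], module)
```

In this toy model, losing one connection on Server1 moves both Server1 connections to Server2 once the HA check passes. A real coordination module would also need to handle the case in which no backup server is available.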

Because the coordination module can intercept and respond to every connection state change, including client login, you can also use it to override any of the built-in OpenSwitch pooling and routing mechanisms with application- or business-specific logic.
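
As a rough illustration of overriding routing at login, the sketch below layers a business rule on top of a simplified default. The function names, the `report_` user-name prefix, and the representation of pools as lists of server names are all assumptions made for this example; they do not come from OpenSwitch.

```python
def default_route(user, pools):
    """Simplified stand-in for built-in routing: first server of the first pool."""
    return pools[0][0]


def reporting_policy(user, pools):
    """Hypothetical business rule: send reporting users to the second pool."""
    if user.startswith("report_") and len(pools) > 1:
        return pools[1][0]
    return None  # no opinion; let the default routing decide


def route_login(user, pools, policy=None):
    # Ask the policy first; fall back to the default when it declines.
    choice = policy(user, pools) if policy else None
    return choice if choice is not None else default_route(user, pools)


pools = [["Server1", "Server2"], ["ReportServer"]]
oltp_target = route_login("alice", pools, reporting_policy)
report_target = route_login("report_bob", pools, reporting_policy)
```

Returning `None` to mean "no override" keeps the default behavior intact for every login that the custom rule does not care about.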

If the OpenSwitch is configured to use a coordination module and one is not available when a connection changes state, the connection suspends until a coordination module comes online, at which time all pending notifications are delivered.
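
The buffering behavior described here can be pictured with a minimal sketch. The `NotificationBuffer` class and its methods are hypothetical, and delivery in arrival order is an assumption; the text above states only that all pending notifications are delivered.

```python
from collections import deque


class NotificationBuffer:
    """Holds notifications while no coordination module is attached."""

    def __init__(self):
        self.pending = deque()
        self.module = None  # callable that receives each event, once attached

    def notify(self, event):
        if self.module is None:
            self.pending.append(event)  # the affected connection stays suspended
        else:
            self.module(event)

    def attach(self, module):
        # A module has come online: flush the backlog in arrival order (assumed).
        self.module = module
        while self.pending:
            module(self.pending.popleft())


received = []
buffer = NotificationBuffer()
buffer.notify("login: alice")
buffer.notify("lost: spid 7 on Server1")
buffer.attach(received.append)
buffer.notify("login: bob")
```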

For information about developing and using OpenSwitch coordination modules, see the OpenSwitch Coordination Module Reference Manual.