Failover processing

When notified of a failed connection, the RCM performs the following tasks:

  1. Before starting the failover process, the RCM pings the active Adaptive Server. If the ping succeeds, the Adaptive Server is not down, so the RCM issues a kill command to end the current connection. The end user must manually reconnect.

  2. The RCM changes the status of the active Adaptive Server in the OpenSwitch log to LOCKED. This stops new users from connecting to the active Adaptive Server.

  3. The RCM issues a stop command to suspend all current connections to the active Adaptive Server.

  4. The RCM does not fail over immediately but waits to see if the system recovers. The Adaptive Server might automatically recover, or the network might stabilize. The RCM pings the active Adaptive Server at a configurable interval. If the RCM successfully pings the server, it unlocks the server, restarts the connections, and allows users to connect.

  5. When the RCM determines that a failover is necessary, it performs one of the following actions, depending on the RS_FAILOVER_MODE setting:

    • If RS_FAILOVER_MODE is set to SWITCH, the RCM connects to the Replication Server and issues the switch active command for each logical connection defined by the LOGICAL_CONN configuration parameter.

    • If RS_FAILOVER_MODE is set to QUIESCE, the RCM connects to Replication Server and issues the suspend log transfer from all and admin quiesce_force_rsi commands.

    • If RS_FAILOVER_MODE is set to NONE, the RCM does not connect to Replication Server, but locks out user connections to the Adaptive Server.

  6. When RS_FAILOVER_MODE is set to SWITCH or QUIESCE, the RCM monitors Replication Server to determine when the commands have completed, because both the switch active command and the quiesce commands are asynchronous. The RCM issues a monitoring command at a configurable interval until a configurable time limit is reached. When that limit is reached or Replication Server finishes the failover process, whichever occurs first, the RCM switches the users to the standby Adaptive Server.

    Note: The monitoring commands the RCM issues are different for switch active and quiesce modes. In switch active mode, the RCM issues the admin logical_status command. In quiesce mode, the RCM issues the admin health command.

  7. If RS_FAILOVER_MODE is set to SWITCH, the RCM starts the Replication Agent on the standby Adaptive Server for each database defined by the DATABASES configuration parameter.

    Note: With this step, the RCM completes the reversal of replication flow in the environment.

  8. The RCM disconnects decision-support system (DSS) users from the standby Adaptive Server. Typically, DSS users can be off-loaded to the standby Adaptive Server to execute read-only queries. You may decide to disconnect these users if a failover from the active to the standby Adaptive Server occurs. If you set the DISCONNECT_STBY_USERS configuration parameter, the RCM disconnects all users from the standby Adaptive Server before switching the users from the active Adaptive Server. The DSS users must wait to be reconnected when the active Adaptive Server is back online.

  9. OpenSwitch switches end users from the active to the standby Adaptive Server. The RCM sets the server status to DOWN, switches the server connections from the active Adaptive Server to the standby Adaptive Server, and restarts all existing connections that were suspended at the active Adaptive Server.
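
The decision flow in steps 1 through 6 can be sketched in Python. This is an illustrative model only, not RCM code. The function names, the action labels, and the interval and timeout arguments are hypothetical, and the time limit on the recovery wait in step 4 is an assumption, since the procedure above does not say how the RCM decides that a failover is necessary. The sketch also assumes that users are switched to the standby server in every mode, and it omits steps 7 and 8. The only Replication Server command names used are the ones listed in step 5, shown as labels without their real arguments.

```python
import time
from typing import Callable, List


def failover_commands(mode: str, logical_conns: List[str]) -> List[str]:
    """Step 5: list the Replication Server commands for an RS_FAILOVER_MODE value."""
    if mode == "SWITCH":
        # One switch active per logical connection named in LOGICAL_CONN.
        # These are labels only; real command arguments are not reproduced.
        return [f"switch active ({conn})" for conn in logical_conns]
    if mode == "QUIESCE":
        return ["suspend log transfer from all", "admin quiesce_force_rsi"]
    if mode == "NONE":
        return []  # Replication Server is not contacted
    raise ValueError(f"unknown RS_FAILOVER_MODE: {mode!r}")


def wait_for(check: Callable[[], bool], interval: float, timeout: float,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() every `interval` seconds until it succeeds or `timeout` elapses."""
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


def handle_failed_connection(ping: Callable[[], bool],
                             rs_done: Callable[[], bool],
                             mode: str,
                             logical_conns: List[str],
                             interval: float = 5.0,   # hypothetical default
                             timeout: float = 60.0,   # hypothetical default
                             sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Return the ordered actions the procedure describes for one failed connection."""
    # Step 1: the server answers, so only the failed connection is ended.
    if ping():
        return ["kill connection"]
    # Steps 2 and 3: lock the server and suspend current connections.
    actions = ["lock active server", "stop connections"]
    # Step 4: wait for recovery before failing over (time limit is assumed).
    if wait_for(ping, interval, timeout, sleep):
        return actions + ["unlock active server", "restart connections"]
    # Step 5: issue the mode-specific Replication Server commands.
    actions += failover_commands(mode, logical_conns)
    # Step 6: poll until Replication Server finishes or the time limit is hit;
    # users are switched in either case.
    if mode != "NONE":
        wait_for(rs_done, interval, timeout, sleep)
    actions.append("switch users to standby")
    return actions
```

Passing a no-op `sleep` (for example `lambda seconds: None`) lets the flow be exercised without real delays. The `ping` and `rs_done` callables stand in for the RCM ping and for the monitoring command described in step 6.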