When an Adaptive Server fails, the OpenSwitch configuration tries to reconnect to it several times before initiating failover. It performs the following checks, in order, to confirm that a failover is really necessary:
1. First, the OpenSwitch server that detected the failure tries to connect to the primary Adaptive Server using the CMON user name and password. If the connection succeeds, the failure is treated as an isolated or client-specific incident, and no server-wide failover is performed. The threads that encountered the failure are terminated, and all future incoming clients are directed to the same primary Adaptive Server.
2. If step 1 fails, but the host of the first OpenSwitch can still communicate with the host of the primary Adaptive Server, the primary Adaptive Server is assumed to have stopped responding, and the user-specified behavior for the SVR_FAIL_ACTION parameter in the configuration file is performed.
3. If step 1 fails because the host of the first OpenSwitch cannot communicate with the host of the primary Adaptive Server, the first OpenSwitch checks with the companion OpenSwitch to see whether the latter also has trouble communicating with the primary Adaptive Server host.
   - If step 3 succeeds, and the companion OpenSwitch host has no problem communicating with the primary Adaptive Server host, the local network of the first OpenSwitch is suspected, and the user-specified behavior for the NET_FAIL_ACTION parameter in the configuration file is performed on the first OpenSwitch, which allows its clients to fail over to its companion OpenSwitch. The clients must reconnect, and are directed to the companion OpenSwitch by the Client-Library failover feature.
   - However, if the companion OpenSwitch also cannot communicate with the primary Adaptive Server host, a failure at the primary Adaptive Server site or network is assumed, and the user-specified behavior for the SVR_FAIL_ACTION parameter in the configuration file is performed on the first OpenSwitch.
4. If step 3 fails because the first OpenSwitch host cannot communicate with the companion OpenSwitch host, the first OpenSwitch attempts to ping the secondary Adaptive Server host to determine whether its own host has dropped off the network entirely. If communication is also broken between the first OpenSwitch and the secondary Adaptive Server host, the first OpenSwitch assumes that it is experiencing a local network failure, and the user-specified behavior for the NET_FAIL_ACTION parameter in the configuration file is performed.
   However, if communication still exists with the secondary Adaptive Server host, the first OpenSwitch performs the user-specified behavior for the SVR_FAIL_ACTION parameter to fail over the clients to the secondary Adaptive Server.
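The checks above amount to a small decision tree. The following Python sketch is only an illustration of that logic as this section describes it. It is not OpenSwitch code or an OpenSwitch API: the function name, the `Action` enum, and the boolean inputs are all hypothetical, and each input stands in for the result of one connectivity check.

```python
from enum import Enum


class Action(Enum):
    """Outcomes named in this section (enum itself is hypothetical)."""
    NO_FAILOVER = "no server-wide failover"
    SVR_FAIL_ACTION = "SVR_FAIL_ACTION"
    NET_FAIL_ACTION = "NET_FAIL_ACTION"


def decide_action(
    *,
    cmon_login_ok: bool,
    primary_host_reachable: bool,
    companion_reachable: bool,
    companion_reaches_primary: bool,
    secondary_host_reachable: bool,
) -> Action:
    """Map the results of the connectivity checks to the action taken.

    Every parameter is an assumed stand-in for one check performed by
    the OpenSwitch that detected the failure. Later inputs are ignored
    when an earlier check already settles the outcome.
    """
    # The CMON login to the primary works: isolated, client-specific failure.
    if cmon_login_ok:
        return Action.NO_FAILOVER

    # Login fails but the primary host is reachable: the server stopped responding.
    if primary_host_reachable:
        return Action.SVR_FAIL_ACTION

    # Primary host unreachable: ask the companion OpenSwitch for its view.
    if companion_reachable:
        if companion_reaches_primary:
            # Only this OpenSwitch is cut off: its local network is suspect.
            return Action.NET_FAIL_ACTION
        # Neither OpenSwitch reaches the primary: primary site or network failure.
        return Action.SVR_FAIL_ACTION

    # Companion unreachable too: ping the secondary Adaptive Server host.
    if secondary_host_reachable:
        return Action.SVR_FAIL_ACTION
    return Action.NET_FAIL_ACTION
```

For example, when the CMON login fails, the primary host is unreachable from the first OpenSwitch, and the companion reports that it can still reach the primary host, `decide_action` returns `Action.NET_FAIL_ACTION`, matching the case in which the first OpenSwitch's local network is suspected.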
See “Invoking custom and manual scripts” for more information about *_FAIL_ACTION functionality.