SVR_FAIL_ACTION is used when an Adaptive Server does not respond in a timely manner, or when the Adaptive Server host cannot be pinged by either OpenSwitch server in a cluster.
SVR_FAIL_ACTION can be used if a CM is connected to the OpenSwitch.
For example, suppose you have two OpenSwitch servers, OSW1 and OSW2. OSW1 detects a failure first, but if it does not have a CM connection, it cannot handle the failure; if OSW2 has a CM connection, OSW2 handles the failure. If OSW1 is the primary server, OSW2 is the secondary server, and both detect the failure, OSW2 handles the failure only if it has a CM connection and OSW1 does not.
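The handler selection in this example can be sketched in Python. This is an illustrative model only, not OpenSwitch code: `OpenSwitchNode` and `failure_handler` are hypothetical names, and the sketch assumes that when both servers have a CM connection, the primary takes precedence, which is how the example reads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenSwitchNode:
    """Hypothetical model of one OpenSwitch server in a cluster."""
    name: str
    is_primary: bool
    has_cm: bool  # True if a CM is connected to this OpenSwitch


def failure_handler(nodes: list[OpenSwitchNode]) -> Optional[str]:
    """Return the name of the OpenSwitch that handles the failure, or None.

    A node without a CM connection cannot handle the failure. If both
    nodes have a CM, the primary is chosen (assumption drawn from the example).
    """
    candidates = [n for n in nodes if n.has_cm]
    if not candidates:
        return None
    # min() with this key prefers is_primary=True, because False sorts first.
    return min(candidates, key=lambda n: not n.is_primary).name
```

With OSW1 as a primary without a CM and OSW2 as a secondary with a CM, `failure_handler` returns "OSW2"; if neither has a CM, it returns None.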
With OpenSwitch 15.1 and later, if you are using a CM or RCM, you can set SVR_FAIL_ACTION to CUSTOM, MANUAL, CUSTOM_MANUAL, or DEFAULT.
With OpenSwitch 15.0 and earlier, if you are using a CM or RCM, you cannot run custom or manual scripts for a server failure, because the failover procedures for CMs and RCMs differ from each other and may contradict the actions invoked by a custom or manual script.
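A minimal sketch of this version rule, assuming versions are written as dotted numeric strings such as "15.0.1". `scripts_honored` and `parse_version` are hypothetical helpers, not OpenSwitch APIs, and the sketch assumes that when no CM or RCM is connected, a custom or manual script always runs.

```python
SCRIPT_ACTIONS = {"CUSTOM", "MANUAL", "CUSTOM_MANUAL"}
VALID_ACTIONS = SCRIPT_ACTIONS | {"DEFAULT"}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "15.0.1" into (15, 0, 1)."""
    return tuple(int(part) for part in text.split("."))


def scripts_honored(action: str, version: str, cm_or_rcm_connected: bool) -> bool:
    """Return True if a custom or manual script takes effect for this setting."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown SVR_FAIL_ACTION value: {action!r}")
    if action not in SCRIPT_ACTIONS:
        return False  # DEFAULT does not run a script
    if not cm_or_rcm_connected:
        return True  # assumption: without a CM or RCM the script always runs
    # With a CM or RCM, scripts are supported only in 15.1 and later.
    return parse_version(version) >= (15, 1)
```

Tuple comparison handles three-part versions correctly: (15, 0, 1) is less than (15, 1), while (15, 1, 2) is not.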
When SVR_FAIL_ACTION is invoked, OpenSwitch checks whether any CMs or RCMs are connected. When there are CM or RCM connections:
If SVR_FAIL_ACTION is set to either DEFAULT or MANUAL, the CM or RCM performs the normal failover.
If SVR_FAIL_ACTION is set to CUSTOM and the script returns an exit code of either 0 or an invalid code, the CM or RCM performs the normal failover.
If SVR_FAIL_ACTION is set to CUSTOM and the script returns an exit code of either 1 or 3, the CM or RCM performs the failover without pinging the Adaptive Server host.
If SVR_FAIL_ACTION is set to CUSTOM and the script returns an exit code of 2, the CM or RCM does not perform any action and OpenSwitch terminates all the existing connections.
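The outcomes listed above, which apply when at least one CM or RCM is connected, can be summarized as a lookup function. This is a sketch for reasoning about configurations; `cm_outcome` and `CmOutcome` are hypothetical names. CUSTOM_MANUAL is not covered by the list, so the sketch raises an error for it rather than guessing.

```python
from enum import Enum
from typing import Optional


class CmOutcome(Enum):
    NORMAL_FAILOVER = "CM/RCM performs the normal failover"
    FAILOVER_WITHOUT_PING = "CM/RCM fails over without pinging the host"
    TERMINATE_CONNECTIONS = "CM/RCM does nothing; OpenSwitch terminates connections"


def cm_outcome(action: str, exit_code: Optional[int] = None) -> CmOutcome:
    """Map SVR_FAIL_ACTION (and the CUSTOM script's exit code) to the CM/RCM outcome."""
    if action in ("DEFAULT", "MANUAL"):
        return CmOutcome.NORMAL_FAILOVER
    if action == "CUSTOM":
        if exit_code in (1, 3):
            return CmOutcome.FAILOVER_WITHOUT_PING
        if exit_code == 2:
            return CmOutcome.TERMINATE_CONNECTIONS
        # Exit code 0, or any code that is not valid, falls back to normal failover.
        return CmOutcome.NORMAL_FAILOVER
    raise ValueError(f"outcome for {action!r} is not described for CM/RCM connections")
```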
When there are no CM or RCM connections, OpenSwitch:
Marks the failed primary Adaptive Server as locked.
Stops all clients on the failed Adaptive Server.
Marks the primary Adaptive Server as DOWN.
Marks the secondary Adaptive Server as UP.
Switches clients from the primary Adaptive Server to the secondary Adaptive Server.
Restarts all clients.
Directs all new connections to the secondary Adaptive Server.
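The sequence above can be modeled as a function that applies each step, in order, to a simple in-memory pool. The `Pool` class, its fields, and the step messages are hypothetical; OpenSwitch keeps this state internally and does not expose it in this form.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical in-memory view of a primary/secondary Adaptive Server pair."""
    primary: str
    secondary: str
    clients: dict[str, str] = field(default_factory=dict)  # client -> server
    status: dict[str, str] = field(default_factory=dict)   # server -> UP/DOWN
    locked: set[str] = field(default_factory=set)
    route_new_to: str = ""
    steps: list[str] = field(default_factory=list)


def fail_over_without_cm(pool: Pool) -> None:
    """Apply the no-CM/RCM failover steps, in order, to the pool."""
    p, s = pool.primary, pool.secondary
    pool.locked.add(p)  # mark the failed primary as locked
    pool.steps.append(f"locked {p}")
    moved = [c for c, server in pool.clients.items() if server == p]
    pool.steps.append(f"stopped {len(moved)} client(s) on {p}")
    pool.status[p] = "DOWN"
    pool.status[s] = "UP"
    pool.steps.append(f"{p} DOWN, {s} UP")
    for client in moved:  # switch clients from the primary to the secondary
        pool.clients[client] = s
    pool.steps.append(f"switched clients to {s}")
    pool.steps.append("restarted all clients")
    pool.route_new_to = s  # all new connections go to the secondary
    pool.steps.append(f"new connections go to {s}")
```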
Use:
DEFAULT – to mark the Adaptive Server as not running and initiate a failover process.
The status of the Adaptive Server is SUSPENDED when SVR_FAIL_ACTION is invoked, and changes to UNSUSPENDED after SVR_FAIL_ACTION completes.
CUSTOM, MANUAL, or CUSTOM_MANUAL – to execute the specified custom or manual script with the reason code 1004. If you are using a CM or RCM with OpenSwitch 15.0 or earlier, the action and reason code are ignored, and OpenSwitch allows the CM or RCM to handle the failover.
The scripts on both OpenSwitch companions must perform the same actions, because during SVR_FAIL_ACTION only one of the companions executes the script. For example, if the script for OSW1 restarts the server or notifies the administrator, the script for OSW2 should also restart the server or notify the administrator. The commands that invoke those actions can differ between the two scripts; for example, you can use different commands to restart the server, as long as they produce the same result.
When you specify MANUAL or CUSTOM_MANUAL, OpenSwitch is suspended and waits indefinitely until the system administrator executes rp_go.
See “User-specified actions” for additional details about these actions.
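The effect of each setting can be sketched as a dispatcher that takes the site-specific pieces (running the script, failing over, waiting for rp_go) as callables. The function name, parameters, and event strings are hypothetical. The sketch assumes that for MANUAL and CUSTOM_MANUAL the script runs before OpenSwitch suspends, since the text does not state the order, and its CM/RCM branch models only the case in which the action and reason code are ignored.

```python
from typing import Callable

REASON_CODE = 1004  # reason code passed to the custom or manual script


def svr_fail_action(action: str,
                    run_script: Callable[[int], int],
                    fail_over: Callable[[], None],
                    wait_for_rp_go: Callable[[], None],
                    cm_or_rcm_handles: bool = False) -> list[str]:
    """Return the events produced by one SVR_FAIL_ACTION invocation."""
    if action not in ("DEFAULT", "CUSTOM", "MANUAL", "CUSTOM_MANUAL"):
        raise ValueError(f"unknown SVR_FAIL_ACTION value: {action!r}")
    if action != "DEFAULT" and cm_or_rcm_handles:
        # Action and reason code are ignored; the CM or RCM does the failover.
        return ["action ignored; CM/RCM handles failover"]
    events: list[str] = []
    if action == "DEFAULT":
        events.append("server status SUSPENDED")
        fail_over()  # mark the server as not running and start failover
        events.append("failover initiated")
        events.append("server status UNSUSPENDED")
        return events
    code = run_script(REASON_CODE)
    events.append(f"script exited with {code}")
    if action in ("MANUAL", "CUSTOM_MANUAL"):
        events.append("OpenSwitch suspended; waiting for rp_go")
        wait_for_rp_go()  # blocks until the administrator executes rp_go
        events.append("rp_go received")
    return events
```

Passing the side effects in as callables keeps the control flow testable without a live OpenSwitch or Adaptive Server.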
Valid exit codes are:
0 – the script is successful and OpenSwitch should reconnect all existing clients to the same primary Adaptive Server. The script should return this exit code if it restarts the primary Adaptive Server successfully.
OpenSwitch does not change the status of the primary Adaptive Server to DOWN and OpenSwitch continues to route future connections to the same primary Adaptive Server.
1 – the script is successful and OpenSwitch should fail over all existing clients to the secondary Adaptive Server. The script should return this exit code if it sends a notification about the server error but does not restart the server that is not responding.
OpenSwitch changes the status of the primary Adaptive Server to DOWN and OpenSwitch routes future connections to the next available Adaptive Server in the pool.
2 – the script is unsuccessful and OpenSwitch should terminate all existing client connections. The script should return this exit code if it fails and OpenSwitch should not perform an automatic failover.
The primary Adaptive Server status changes to LOCKED. If the primary Adaptive Server does not restart, OpenSwitch blocks new client connections until the status changes to either UP or DOWN, or the client times out. Client connections reconnect to the primary or secondary Adaptive Server, depending on the actions of the administrator:
If the primary Adaptive Server restarts and the administrator changes the status of the primary Adaptive Server to UP, client connections reconnect to the primary Adaptive Server.
If the primary Adaptive Server does not restart but the administrator changes the status of the primary Adaptive Server to UP, client connections connect to the secondary Adaptive Server.
If the primary Adaptive Server restarts but the administrator changes the status of the primary Adaptive Server to DOWN, client connections connect to the secondary Adaptive Server.
If the primary Adaptive Server does not restart and the administrator changes the status of the primary Adaptive Server to DOWN, client connections connect to the secondary Adaptive Server.
3 – the script is unsuccessful and OpenSwitch should fail over all existing clients to the secondary Adaptive Server. The script should return this exit code if it fails and you want OpenSwitch to perform an automatic failover to the next available server.
OpenSwitch changes the status of the primary Adaptive Server to DOWN and OpenSwitch routes all future connections to the next available Adaptive Server in the pool.
Any exit code greater than 3 – OpenSwitch performs the default operation, for example, setting the primary Adaptive Server status to DOWN.
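The effect of each exit code on OpenSwitch can be restated in code. `ExitCodeEffect`, `interpret_exit_code`, and `target_after_lock` are hypothetical names; apart from UP, DOWN, and LOCKED, the strings are plain descriptions rather than OpenSwitch status values. `target_after_lock` encodes the four cases for exit code 2: clients return to the primary only if it restarted and the administrator set it to UP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitCodeEffect:
    existing_clients: str  # what happens to existing client connections
    primary_status: str    # resulting status of the primary Adaptive Server
    new_connections: str   # where future connections are routed


def interpret_exit_code(code: int) -> ExitCodeEffect:
    """Restate the documented effect of a custom script's exit code."""
    if code < 0:
        raise ValueError("exit codes are non-negative")
    if code == 0:
        return ExitCodeEffect("reconnect to the same primary", "unchanged",
                              "same primary")
    if code in (1, 3):
        return ExitCodeEffect("fail over to the secondary", "DOWN",
                              "next available server in the pool")
    if code == 2:
        return ExitCodeEffect("terminated", "LOCKED",
                              "blocked (if the primary does not restart) until "
                              "status is UP or DOWN, or the client times out")
    # Codes above 3: default operation, for example primary marked DOWN.
    return ExitCodeEffect("default operation", "DOWN", "default operation")


def target_after_lock(primary_restarted: bool, admin_status: str) -> str:
    """After exit code 2, where client connections go once the admin acts."""
    if admin_status not in ("UP", "DOWN"):
        return "blocked"  # still LOCKED: new connections wait or time out
    if primary_restarted and admin_status == "UP":
        return "primary"
    return "secondary"
```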
Valid exit codes are:
0 – the script is successful and OpenSwitch should reconnect all existing clients to the same primary Adaptive Server. The script should return this exit code if it restarts the primary Adaptive Server successfully.
OpenSwitch does not change the status of the primary Adaptive Server to DOWN and OpenSwitch continues to route future connections to the same primary Adaptive Server.
Any exit code greater than 0 – OpenSwitch performs the default operation, for example, setting the primary Adaptive Server status to DOWN.
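A custom script for this action could be structured along the following lines. This is a Python sketch under stated assumptions: `restart_primary` and `notify_admin` are hypothetical placeholders for your site's own procedures, and the reason code (1004 for this action) is assumed to arrive as the first command-line argument, which you should confirm for your OpenSwitch version. Where only 0 is a distinct exit code, just the distinction between 0 and nonzero matters; where codes 1 to 3 are valid, they select the distinct behaviors described for each. Install equivalent logic on both OpenSwitch companions, because only one of them runs the script.

```python
from typing import Callable


def restart_primary() -> bool:
    """Hypothetical placeholder: run your site's restart procedure."""
    return False


def notify_admin(reason_code: int) -> bool:
    """Hypothetical placeholder: send your site's failure notification."""
    return False


def choose_exit_code(restarted: bool, notified: bool,
                     fail_over_on_error: bool = True) -> int:
    """Pick an exit code following the documented guidance."""
    if restarted:
        return 0  # primary restarted: reconnect clients to it
    if notified:
        return 1  # notification sent, no restart: fail over
    # Script failed: 3 requests automatic failover, 2 terminates connections.
    return 3 if fail_over_on_error else 2


def main(argv: list[str],
         restart: Callable[[], bool] = restart_primary,
         notify: Callable[[int], bool] = notify_admin) -> int:
    # Assumption: the reason code is passed as the first argument.
    reason_code = int(argv[1]) if len(argv) > 1 else 0
    restarted = restart()
    notified = False if restarted else notify(reason_code)
    return choose_exit_code(restarted, notified)
```

When the file is installed as the script, end it with `sys.exit(main(sys.argv))` (after `import sys`) so that OpenSwitch receives the chosen exit code.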