Reconnecting clients when a node loses power

If the network cable is removed from a machine, or if the node to which a client is connected loses power, the client-side socket can no longer reach the server. The client socket then waits indefinitely, either for a reply from the server or for the cluster to issue a send operation.

In the situation shown in Figure 2-1, a client application is connected to “big_cluster”, which consists of “Node1” and “Node2”, on which instances “ASECE1” and “ASECE2” are running, respectively. The client application is connected to instance “ASECE1” running on “Node1”.

If the power is disconnected from “Node1”, the client application waits for contact from the node. The only way to avoid this situation is to configure the client application to assume the node is down after a specified amount of time. It then connects to another node in the cluster.

Figure 2-1: Unplugged node

Image shows a client connecting to an unplugged cluster

The operating system network detects a crash, disconnects the clients, and fails over the sockets from the remote side of the connection.

To reduce the time required to detect when a cluster loses a host or when a public network is disconnected from a node running an instance, you can:

Set TCP keepalive to a reasonable value on the host on which the client is running.
Set the client application’s timeout value.

Setting TCP keepalive to a shorter value

The TCP keepalive parameter eventually marks the client socket as failed. However, because the default TCP keepalive value is long (on some systems it may be as long as two hours), it may be three or more hours before the client-side sockets fail over. Setting keepalive to a small value (several minutes) may not be practical for large organizations, but you can set keepalive to a period that is appropriate for your site and that works with the HAFAILOVER capabilities.

Set TCP keepalive on client machines. The appropriate values vary, depending on the operating system you use. See your client’s operating system documentation for more information.
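The three roles that keepalive parameters play (idle time before the first probe, interval between probes, and number of probes) can be illustrated on a single socket. The following is a minimal sketch, assuming a program that opens its own TCP socket in Python; the function name enable_keepalive and the example values are hypothetical. It does not change the operating system defaults, and connections opened by Client-Library still depend on the operating system settings described in this section.

```python
import socket

def enable_keepalive(sock, idle_s=120, interval_s=30, probes=4):
    """Turn on TCP keepalive for one socket and tune it where the platform allows.

    idle_s, interval_s, and probes are illustrative values, not recommendations.
    """
    # SO_KEEPALIVE itself is available on all common platforms.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The per-socket tuning options are platform-specific, so apply only the
    # ones this Python build exposes and ignore any the OS rejects.
    applied = []
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied.append(name)
        except OSError:
            pass
    return applied

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print("tuned options:", enable_keepalive(s))
```

On platforms that expose none of the per-socket options, the sketch still enables keepalive, and the timing then comes entirely from the system-wide values.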

If you are testing for client timeouts, set the values for the parameters in the first two columns of Table 2-1 to a few minutes, and set the values for the parameters in the third column to a low number.

**Table 2-1: Setting TCP keepalive**

| Operating system | Time the parameter waits before probing the connection | Time between probes | Maximum time or number of attempts to probe a connection before dropping it |
|---|---|---|---|
| Solaris | N/A | tcp_keepalive_interval (milliseconds) | N/A |
| Linux | tcp_keepalive_time (seconds) | tcp_keepalive_intvl (seconds) | tcp_keepalive_probes (absolute number) |
| Windows XP | KeepAliveTime (seconds) | KeepAliveInterval (seconds) | TcpMaxDataRetransmissions (absolute number) |
| HP-UX | tcp_time_wait_interval (milliseconds) | tcp_keepalive_interval (milliseconds) | tcp_keepalive_kill (milliseconds) |
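To see how the three Linux parameters combine, the worst-case detection time can be estimated as the idle time plus the probe interval multiplied by the number of probes. The sketch below is an approximation under that assumption; the function name is hypothetical, the first call uses the commonly shipped Linux defaults (7200, 75, 9), and the second uses illustrative test values.

```python
def keepalive_detection_seconds(idle_s, interval_s, probes):
    """Approximate seconds from the last traffic until a dead peer is dropped.

    Assumes Linux-style semantics: wait idle_s, then send up to `probes`
    unanswered probes spaced interval_s apart.
    """
    return idle_s + interval_s * probes

# tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9
default_s = keepalive_detection_seconds(7200, 75, 9)   # 7875 s, about 2 h 11 min

# Illustrative test settings: a few minutes of idle time and a low probe count
tuned_s = keepalive_detection_seconds(180, 30, 4)      # 300 s, 5 min

print(default_s, tuned_s)
```

With the defaults, a client could therefore wait more than two hours before its socket is marked as failed, which is why shorter values are suggested when testing client timeouts.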

Setting the client’s timeout value

There are two different timeout properties you can set for Client-Library program connections:

CS_LOGIN_TIMEOUT – determines how long the client waits to connect to an unreachable host.
CS_TIMEOUT – determines how long a client waits for commands to complete.

Based on how you configure the timeout event, the client either fails or fails over to another node.

Set the Client-Library CS_TIMEOUT property in the client to control how long it waits for a command before timing out.

For clients to fail over when a node suddenly loses power, you must set CS_TIMEOUT and CS_LOGIN_TIMEOUT in Client-Library applications, or the -t and -l parameters in isql.
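The effect of combining a login timeout with a command timeout can be sketched with plain TCP sockets. This is not the Client-Library API: it is a simplified analogy in Python, and the host names, port, timeout values, and the function connect_with_failover are all hypothetical. The login timeout bounds how long the client waits on an unreachable node before trying the next one, and the command timeout bounds later reads and writes on the connection.

```python
import socket

# Hypothetical host names and port; substitute the nodes of your own cluster.
CLUSTER_NODES = [("node1.example.com", 5000), ("node2.example.com", 5000)]

def connect_with_failover(nodes, login_timeout_s=10.0, command_timeout_s=30.0,
                          connector=socket.create_connection):
    """Return (node, socket) for the first node that accepts a connection.

    login_timeout_s plays the role of a login timeout, command_timeout_s the
    role of a command timeout. `connector` is injectable so the logic can be
    exercised without a network.
    """
    last_error = None
    for host, port in nodes:
        try:
            # Give up on this node if it does not answer within the login timeout.
            sock = connector((host, port), timeout=login_timeout_s)
        except OSError as exc:  # includes timeouts and refused connections
            last_error = exc
            continue
        # Later send and receive calls fail instead of waiting forever.
        sock.settimeout(command_timeout_s)
        return (host, port), sock
    raise ConnectionError(f"no cluster node reachable: {last_error}")

# Example call (requires reachable hosts):
# node, sock = connect_with_failover(CLUSTER_NODES)
```

Without the login timeout, the loop would stall on the first powered-off node, which is the situation this section describes.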

For more information about the Client-Library parameters CS_TIMEOUT and CS_LOGIN_TIMEOUT, see the Client-Library/C Reference Manual. For information about CS_HAFAILOVER, see “Client/server interaction”.

For information about using the isql -t and -l parameters, see the Adaptive Server Enterprise Utility Guide.