If the current coordinator node fails, or must be shut down for maintenance, clients on other nodes can be affected. Read-write operations in progress on the failed node are rolled back.
Clients connected to the failed coordinator experience an outage. When these clients try to reconnect, they can be redirected to a node that is up, either through the login redirection feature or through a third-party redirector. Depending on the severity of the failure, the failed node can be restarted immediately (for a software issue) or after the underlying hardware or disk issue is fixed.
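The redirection behavior described above can be sketched from the client side as an ordered-fallback connection attempt. This is an illustrative model only, not the product's login redirection implementation: the host list and the `connect` stub are assumptions standing in for a real database driver's connect call.

```python
# Sketch of client-side redirection after a coordinator failure.
# FALLBACK_HOSTS and connect() are illustrative assumptions; a real client
# would use its database driver's connect call and its own host list.

FALLBACK_HOSTS = ["node1:2638", "node2:2638", "node3:2638"]  # hypothetical

def connect(host):
    """Stand-in for a driver connect call; raises if the node is down."""
    if host == "node1:2638":  # simulate the failed coordinator node
        raise ConnectionError(f"{host} is down")
    return f"connection to {host}"

def connect_with_redirect(hosts):
    """Try each candidate node in order until one accepts the connection."""
    last_err = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as err:
            last_err = err  # this node is down; try the next one
    raise last_err  # every node refused: the client sees an outage

print(connect_with_redirect(FALLBACK_HOSTS))
```

Here the first host (the failed coordinator) is skipped and the client lands on the next node that is up; only if every candidate is down does the client see a hard outage.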
Client Location | Result |
---|---|
Reader node where DQP is not enabled | Not affected by coordinator failure. |
Reader node where DQP is enabled | These nodes periodically require space on IQ_SHARED_TEMP, and when that happens their DQP transactions are suspended (see Global Transaction Resiliency). The clients experience a pause until the coordinator is brought back up or failed over. If neither happens within a user-controlled timeout period, the DQP transactions roll back and the clients experience an outage. |
Writer node | Clients performing read-write operations on writer nodes periodically need more space in shared main dbspaces, or global locks on the tables they modify, and when that happens their transactions are suspended. The clients experience a pause until the coordinator is brought back up or failed over. If neither happens within a user-controlled timeout period, the read-write transactions roll back and the clients experience an outage. |
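The suspend/resume behavior in the two affected rows follows one pattern: a transaction pauses while waiting for the coordinator and rolls back if a user-controlled timeout expires first. A toy model of that decision, in simulated time ticks, might look like this; the function, parameter names, and polling loop are illustrative assumptions, since the real logic is internal to the server:

```python
# Toy model of the suspend-then-timeout behavior described in the table:
# a transaction suspends until the coordinator returns, and rolls back if a
# user-controlled timeout expires first. Time is simulated in abstract
# ticks; all names here are hypothetical, not server options.

def run_transaction(coordinator_up_at, timeout, now=0, poll=1):
    """Return 'resumed' if the coordinator comes back within `timeout`
    ticks of `now`, else 'rolled back'."""
    deadline = now + timeout
    while now < deadline:
        if now >= coordinator_up_at:  # coordinator restarted or failed over
            return "resumed"
        now += poll  # transaction remains suspended; client sees a pause
    return "rolled back"  # timeout expired: client sees an outage
```

For example, a coordinator restored at tick 3 against a 10-tick timeout yields `"resumed"`, while one restored at tick 30 yields `"rolled back"` — mirroring the pause-versus-outage split the table describes.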
These dependencies make it critical that the coordinator stay up at all times. If the coordinator fails, either restart it immediately or promote another server to act as coordinator, a procedure known as manual failover.