Coordinator Failure

If the current coordinator node fails, or must be shut down for maintenance, clients on other nodes can be affected. Read-write operations that are in progress on the failed node are rolled back.

Clients connected to the failed coordinator experience an outage. When those clients try to reconnect, they can be redirected to a node that is up, either by the login redirection feature or by a third-party redirector. Depending on the severity of the failure, the failed node can be restarted immediately (for a software issue) or after the hardware or disk problem is fixed.
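For example, login redirection is governed by the logical server policy (see the ALTER LS POLICY Statement under Related reference). A minimal sketch, assuming the default root policy; verify the option name against your version's reference:

   -- Redirect reconnecting clients to an available node in the
   -- logical server when their original node is down
   ALTER LS POLICY root LOGIN_REDIRECTION=ON

Clients can also list alternate servers in the connection string, so the initial connection does not depend on any single node (host names here are hypothetical):

   Host=mpx_node1:2638,mpx_node2:2638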

Client location: Reader node where DQP is not enabled
Result: Not affected by coordinator failure.

Client location: Reader node where DQP is enabled
Result: These nodes periodically require space on IQ_SHARED_TEMP. When that happens, their DQP transactions are suspended (see Global Transaction Resiliency), and the clients experience a pause until the coordinator is brought back up or failed over. If the coordinator cannot be brought back up or failed over within a user-controlled timeout period (see the example after this table), these DQP transactions roll back and the clients experience an outage.

Client location: Writer node
Result: Clients on writer nodes that are performing read-write operations periodically need more space in shared main dbspaces, or require global locks on the tables they modify. When that happens, these transactions suspend, and the clients experience a pause until the coordinator is brought back up or failed over. If the coordinator cannot be brought back up or failed over within the user-controlled timeout period, these read-write transactions roll back and the clients experience an outage.
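The user-controlled timeout period referenced above corresponds to the MPX_LIVENESS_TIMEOUT option listed under Related reference. A minimal sketch of raising it, assuming the value is specified in seconds (verify the unit and default for your version):

   -- Allow suspended transactions to wait up to two hours
   -- for the coordinator to come back up or fail over
   SET OPTION PUBLIC.MPX_LIVENESS_TIMEOUT = 7200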

These dependencies make it critical that the coordinator stay up at all times. If the coordinator fails, either restart it immediately or promote another server to be the coordinator, a procedure called manual failover.
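A sketch of manual failover from the command line, assuming a designated failover node and hypothetical server and file names (consult your version's documentation for the full procedure, including verifying that the former coordinator is really down before promoting another node):

   # Start the designated failover node with the failover switch,
   # which promotes it to coordinator (names are hypothetical)
   start_iq @params.cfg mpxtest.db -iqmpx_failover 1 -n mpx_node2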

Related tasks
Dropping Multiplex Servers
Related reference
ALTER LS POLICY Statement
DROP MULTIPLEX SERVER Statement
MPX_LIVENESS_TIMEOUT Option