If the current coordinator node fails, or must be shut down for maintenance, clients on other nodes can be affected. Read-write operations in progress on the failed node are rolled back.
Clients connected to the failed coordinator experience an outage. When these clients try to reconnect, they can be redirected to a node that is up, either through the login redirection feature or through a third-party redirector. Depending on the severity of the failure, the failed node can be restarted immediately (for a software issue) or after the underlying hardware or disk issue is fixed.
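The redirection behavior described above can be sketched from the client side as an ordered-fallback connection attempt. This is an illustrative model only, not the product's login redirection implementation: the host list and the `connect` stub are assumptions standing in for a real database driver's connect call.

```python
# Sketch of client-side redirection after a coordinator failure.
# FALLBACK_HOSTS and connect() are illustrative assumptions; a real client
# would use its database driver's connect call and its own host list.

FALLBACK_HOSTS = ["node1:2638", "node2:2638", "node3:2638"]  # hypothetical

def connect(host):
    """Stand-in for a driver connect call; raises if the node is down."""
    if host == "node1:2638":  # simulate the failed coordinator node
        raise ConnectionError(f"{host} is down")
    return f"connection to {host}"

def connect_with_redirect(hosts):
    """Try each candidate node in order until one accepts the connection."""
    last_err = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as err:
            last_err = err  # this node is down; try the next one
    raise last_err  # every node refused: the client sees an outage

print(connect_with_redirect(FALLBACK_HOSTS))
```

Here the first host (the failed coordinator) is skipped and the client lands on the next node that is up; only if every candidate is down does the client see a hard outage.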
Client Location | Result |
---|---|
Reader node where DQP is not enabled | Not affected by coordinator failure. |
Reader node where DQP is enabled | These nodes periodically require space on IQ_SHARED_TEMP, and when that happens their DQP transactions are suspended (see Global Transaction Resiliency). The clients experience a pause until the coordinator is brought back up or failed over. If neither happens within a user-controlled timeout period, the DQP transactions roll back and the clients experience an outage. |
Writer node | Clients performing read-write operations on writer nodes periodically need more space in shared main dbspaces, or global locks on the tables they modify, and when that happens their transactions are suspended. The clients experience a pause until the coordinator is brought back up or failed over. If neither happens within a user-controlled timeout period, the read-write transactions roll back and the clients experience an outage. |
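The suspend/resume behavior in the two affected rows follows one pattern: a transaction pauses while waiting for the coordinator and rolls back if a user-controlled timeout expires first. A toy model of that decision, in simulated time ticks, might look like this; the function, parameter names, and polling loop are illustrative assumptions, since the real logic is internal to the server:

```python
# Toy model of the suspend-then-timeout behavior described in the table:
# a transaction suspends until the coordinator returns, and rolls back if a
# user-controlled timeout expires first. Time is simulated in abstract
# ticks; all names here are hypothetical, not server options.

def run_transaction(coordinator_up_at, timeout, now=0, poll=1):
    """Return 'resumed' if the coordinator comes back within `timeout`
    ticks of `now`, else 'rolled back'."""
    deadline = now + timeout
    while now < deadline:
        if now >= coordinator_up_at:  # coordinator restarted or failed over
            return "resumed"
        now += poll  # transaction remains suspended; client sees a pause
    return "rolled back"  # timeout expired: client sees an outage
```

For example, a coordinator restored at tick 3 against a 10-tick timeout yields `"resumed"`, while one restored at tick 30 yields `"rolled back"` — mirroring the pause-versus-outage split the table describes.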
These dependencies make it critical that the coordinator stay up at all times. If the coordinator fails, either restart it immediately or promote another server to act as coordinator, a procedure known as manual failover.