Leader and Worker Nodes

In distributed query processing, leader nodes pass work units to worker nodes and the work is performed by threads running on both the leader and worker nodes.

The leader node can be any node in the cluster where a query originates. A worker node can be any node in the cluster that is capable of accepting distributed query processing work. Do not confuse these nodes with multiplex coordinator, writer, and reader nodes.

You can view details of distributed query processing thread usage using the sp_iqcontext system stored procedure.

When a query is submitted to a node, work units may be distributed, but only to those nodes that are members of the logical server of the current connection. Multiplex nodes that are not members of the current connection's logical server do not take part in the distributed query processing for that query. The leader node automatically chooses worker nodes for the distributed query from within the same logical server as the leader node. If you exclude multiplex nodes from a logical server, no distributed query processing occurs on those nodes for that logical server.

If a leader node fails, query processing ends, as it would on a single server. You can connect to another server to run the query, but this does not happen automatically.

Many types of queries can survive failures on worker nodes, either due to disconnect or timeout. If a worker fails, the leader executes pending work for the worker and assigns no further work from the current query fragment to that worker. The MPX_WORK_UNIT_TIMEOUT database option specifies the timeout duration in seconds (default 60).

Some queries support worker node failures at any time during the query, while others cannot once any intermediate results have been sent. The query plan detail displays statistics about work units that have been assumed by the leader. Queries that cannot support work retry on the leader are cancelled immediately.