Recovery from Partition Loss or Failure

When a Replication Server detects a failed or missing partition, it shuts down the stable queues that are using the partition and logs messages about the failure. Restarting Replication Server does not correct the problem. You must drop the damaged partition and rebuild the stable queues.
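The drop-and-rebuild sequence described above can be sketched as the following RCL commands, entered at the affected Replication Server (for example, through isql). This is a minimal outline, not the complete procedure: the partition name P1 is a hypothetical logical name used only for illustration, and if the failed partition was the only one available you would first need to add a replacement partition before rebuilding.

```
-- List the partitions and their status to identify the failed one
admin disk_space

-- Drop the damaged partition (P1 is a hypothetical logical name)
drop partition P1

-- Rebuild the stable queues for this Replication Server
rebuild queues
```

After the rebuild, check the Replication Server logs for loss detection messages to see whether any messages must be recovered from another source.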

Whether you can recover completely depends on how many messages have been cleared from the queue and on how soon after the failure you apply the recovery procedure. If a Replication Server maintains minimal latency in the replication system, only the most recent messages are lost when its queues are rebuilt.

If a partition fails in a primary Replication Server, you can usually resend lost messages from their source using an off-line database log. If partitions fail in a replicate Replication Server, you need to recover from the stable queue of the upstream Replication Server.

In some cases, using an off-line log may be the only way you can recover your messages. If the Replication Server has suspended routes or connections, or if a network or data server connection goes down, a backlog may have accumulated in the Replication Server stable queues. Unless you have specified a save interval setting that can cover the backlog, your chance of recovering these messages decreases with time. Source Replication Servers may have already deleted messages from their stable queues and may have truncated the database logs.

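Note: You can set save intervals on routes and connections so that Replication Server retains messages for recovery after they have been delivered.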
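As an illustration of setting a save interval, the commands below sketch how it might be configured for a route and for a database connection. The server and database names (TOKYO_RS, SYDNEY_DS, pubs2) and the value of 120 minutes are assumptions chosen for the example; confirm the exact syntax and any required suspend and resume steps in the reference manual for your version.

```
-- Keep messages sent over the route to TOKYO_RS for at least 120 minutes
-- (TOKYO_RS is a hypothetical destination Replication Server)
alter route to TOKYO_RS
set save_interval to '120'

-- Keep messages delivered to the pubs2 database for at least 120 minutes
-- (SYDNEY_DS.pubs2 is a hypothetical data server and database)
alter connection to SYDNEY_DS.pubs2
set save_interval to '120'
```

A longer save interval improves the chance of recovering a backlog from an upstream stable queue, at the cost of more partition space.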

Symptoms of and Relevant Recovery Procedures for Partition Loss or Failure
Learn when to use and where to locate the appropriate recovery procedure for partition loss or failure.
Recovering from Partition Loss or Failure
Recover from Replication Server partition loss or failure when Replication Server detects a lost, damaged, or failed stable queue.
Recovering Messages from Off-line Database Logs
Recover messages from off-line logs after a partition failure.
Recovering Messages from the Online Database Log
Recover messages that are still in the online log at the primary database.


Related concepts

Save Interval for Recovery