Procedure for recovering from partition loss or failure

To recover from Replication Server partition loss or failure, perform the following steps:

  1. Log in to the Replication Server and drop the failed partition:

    drop partition logical_name
    

    Replication Server does not immediately drop a partition that was in use. If the partition is undamaged, Replication Server drops it only after all of the messages it holds are delivered and deleted.

    Refer to Chapter 3, “Replication Server Commands,” in the Replication Server Reference Manual for more information about the drop partition command.

  2. If the failed partition was the only one available to the Replication Server, add another one to replace it:

    create partition logical_name
    on 'physical_name' with size size
    [starting at vstart]
    

    Refer to the Replication Server Reference Manual for more information about the create partition command.

  3. Since the partition is damaged, you must rebuild the stable queues:

    rebuild queues
    

    See “Rebuilding queues online” for a description of this process.

    When all stable queues on the partition are removed, Replication Server drops the failed partition from the system and rebuilds the queues using the remaining partitions.

  4. After rebuilding the queues, check the Replication Server logs for loss detection messages.

    See “Loss detection after rebuilding stable queues” for background and details.

  5. If Replication Server detected message loss, you can:

Note: If you specify that Replication Server ignore message losses and you have rebuilt the queues of a Replication Server that is part of a route, you must re-create subscriptions at the destination or use the rs_subcmp program with the -r flag to reconcile primary and replicate data.
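
The log check in step 4 can be partly automated. The following is a minimal Python sketch, not part of Replication Server. It assumes that loss detection messages contain text of the form "Loss Detected for <destination> from <source>"; confirm the exact wording against your own log and against “Loss detection after rebuilding stable queues” before relying on it. The function name find_loss_messages and the pattern are illustrative only.

```python
import re
from typing import Iterable, List, Tuple

# ASSUMPTION: loss messages look like
#   "Loss Detected for <destination> from <source>"
# Adjust this pattern to match the messages in your own log.
LOSS_PATTERN = re.compile(
    r"Loss Detected for (?P<destination>\S+) from (?P<source>\S+)",
    re.IGNORECASE,
)


def find_loss_messages(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return a (destination, source) pair for each loss message found."""
    pairs = []
    for line in lines:
        match = LOSS_PATTERN.search(line)
        if match:
            # Strip trailing punctuation that may follow the names in the log.
            destination = match.group("destination").rstrip(".,")
            source = match.group("source").rstrip(".,")
            pairs.append((destination, source))
    return pairs
```

To use it, open the Replication Server log file and pass the file object to find_loss_messages. An empty result means only that no line matched the assumed pattern, so review the log manually if the result is unexpected.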