Basic RSSD recovery procedure

Use the basic RSSD recovery procedure to restore the RSSD if you have executed no DDL commands since the last RSSD dump. DDL commands in RCL include those for creating, altering, or deleting routes, replication definitions, subscriptions, function strings, functions, function-string classes, or error classes.

Certain steps in this procedure are also referenced by other RSSD recovery procedures in this chapter.

WARNING! Do not execute any DDL commands until you have completed this recovery procedure.

To perform basic RSSD recovery, follow these steps:

  1. Shut down all RepAgents that connect to the current Replication Server.

  2. Because its RSSD has failed, the current Replication Server should already be down. If it is still running, log in to it and shut it down with the shutdown command.

    Note: Some messages may still be in the Replication Server stable queues. Data in those queues may be lost when you rebuild the queues in later steps.

  3. Restore the RSSD by loading the most recent RSSD database dump and all transaction dumps.

  4. Restart the Replication Server in standalone mode, using the -M flag.

    You must start the Replication Server in standalone mode, because the stable queues are now inconsistent with the RSSD state. When the Replication Server starts in standalone mode, reading of the stable queues is not automatically activated.

  5. Log in to the Replication Server, and get the generation number for the RSSD, using the admin get_generation command:

    admin get_generation, data_server, rssd_name
    

    For example, the Replication Server may return a generation number of 100.

  6. In the Replication Server, rebuild the queues with the following command:

    rebuild queues
    

    See “Rebuilding queues online” for a description of this process.

  7. Start all RepAgents (except the RSSD RepAgent) that connect to the current Replication Server in recovery mode.

    Wait until each RepAgent logs a message in the Adaptive Server log that it is finished with the current log.

  8. Check the loss messages in the Replication Server log, and in the logs of all the Replication Servers with direct routes from the current Replication Server.

  9. Shut down RepAgents for all primary databases managed by the current Replication Server.

  10. Execute the dbcc settrunc command at the Adaptive Server for the restored RSSD to move up the secondary truncation point:

    use rssd_name
    go
    dbcc settrunc('ltm', 'ignore')
    go
    dump tran rssd_name with truncate_only
    go
    begin tran commit tran
    go 40
    

    Note: The begin tran commit tran go 40 command moves the Adaptive Server log onto the next page.

    Before continuing with step 11, run the following command to clear the locater information:

    rs_zeroltm rssd_server, rssd_name 
    go
    
  11. Execute the dbcc settrunc command at the Adaptive Server for the restored RSSD to set the generation number to one higher than the number returned by admin get_generation in step 5.

    dbcc settrunc('ltm', 'valid')
    go
    dbcc settrunc('ltm', 'gen_id', generation_number)
    go
    

    Record this generation number and the current time, so that you can return to this RSSD recovery procedure if necessary. Alternatively, dump the database after setting the generation number.

  12. Restart the Replication Server in normal mode.

    If you performed this procedure as part of the subscription comparison or subscription re-creation RSSD recovery procedure, the upstream RSI outbound queue may contain transactions, bound for the RSSD of the current Replication Server, that have already been applied using rs_subcmp. In that case, the error log may contain warnings about duplicate inserts after the Replication Server starts. You can safely ignore these warnings.

  13. Restart RepAgents for the RSSD and for user databases in normal mode.

    If you performed this procedure as part of the subscription comparison or subscription re-creation RSSD recovery procedure, you should expect to see messages regarding RSSD losses being detected in all Replication Servers that have routes from the current Replication Server.
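
The commands in steps 10 and 11 depend on one piece of arithmetic: the new generation number is one higher than the value returned by admin get_generation in step 5. The following Python sketch assembles the isql text for those two steps so that the number is computed rather than typed by hand. It is only an illustration, not part of the product: the function name build_rssd_truncation_script and the sample server and database names are hypothetical, and the gen_id form of dbcc settrunc is an assumption based on Adaptive Server syntax that you should verify against your version before use.

```python
def build_rssd_truncation_script(rssd_server: str, rssd_name: str,
                                 current_generation: int) -> str:
    """Return isql text covering steps 10 and 11 of the basic RSSD recovery.

    current_generation is the number reported by admin get_generation
    in step 5. The script sets the generation number to that value + 1.
    """
    if current_generation < 0:
        raise ValueError("generation number must be non-negative")

    # Step 11 requires a generation number one higher than the current one.
    next_generation = current_generation + 1

    lines = [
        # Step 10: move up the secondary truncation point.
        f"use {rssd_name}",
        "go",
        "dbcc settrunc('ltm', 'ignore')",
        "go",
        f"dump tran {rssd_name} with truncate_only",
        "go",
        # Forty empty transactions push the log onto the next page.
        "begin tran commit tran",
        "go 40",
        # Clear the locater information before step 11.
        f"rs_zeroltm {rssd_server}, {rssd_name}",
        "go",
        # Step 11: re-enable the truncation point, then set the
        # generation number (gen_id syntax is an assumption, see above).
        "dbcc settrunc('ltm', 'valid')",
        "go",
        f"dbcc settrunc('ltm', 'gen_id', {next_generation})",
        "go",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical names; 100 matches the example value in step 5.
    print(build_rssd_truncation_script("RSSD_SERVER", "rs_rssd", 100))
```

Review the generated text before running it with isql against the Adaptive Server that holds the restored RSSD, and record the generation number it contains as described in step 11.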