Using the Basic RSSD Recovery Procedure

Use this procedure to restore the RSSD if you have not executed any DDL commands since the last RSSD dump. DDL commands in RCL are those that create, alter, or drop routes, replication definitions, subscriptions, function strings, functions, function-string classes, or error classes.

Certain steps in this procedure are also referenced by other RSSD recovery procedures.

Warning! Do not execute any DDL commands until you have completed this recovery procedure.
  1. Shut down all RepAgents that connect to the current Replication Server.
  2. Because its RSSD has failed, the current Replication Server is down. If it is still running, log in to it and shut it down with the shutdown command.
    Note: Some messages may still be in the Replication Server stable queues. Data in those queues may be lost when you rebuild the queues in step 6.
  3. Restore the RSSD by loading the most recent RSSD database dump and all transaction dumps.
  4. Restart the Replication Server in standalone mode, using the -M flag.

    You must start the Replication Server in standalone mode, because the stable queues are now inconsistent with the RSSD state. When the Replication Server starts in standalone mode, reading of the stable queues is not automatically activated.

  5. Log in to the Replication Server, and get the generation number for the RSSD.
    Enter:
    admin get_generation, data_server, rssd_name

    For example, the Replication Server may return a generation number of 100.

  6. In the Replication Server, rebuild the queues.
    Enter:
    rebuild queues
  7. Start, in recovery mode, all RepAgents (except the RSSD RepAgent) that connect to the current Replication Server.
    Enter:
    sp_start_rep_agent dbname, recovery

    Wait until each RepAgent logs a message in the Adaptive Server log stating that it has finished with the current log.

  8. Check the loss messages in the Replication Server log, and in the logs of all the Replication Servers with direct routes from the current Replication Server.
    • If all your routes were active at the time of failure, you probably will not experience any real data loss.

    • However, loss detection may indicate real loss. Real data loss may be detected if the database logs were truncated at the primary databases, so that the rebuild process did not have enough information to recover. If you have real data loss, reload database logs from old dumps using the procedure to recover from truncated primary database logs.

  9. Shut down RepAgents for all primary databases managed by the current Replication Server.
    Enter:
    sp_stop_rep_agent dbname
  10. Shut down Replication Server.
  11. Move up the secondary truncation point.
    Execute the dbcc settrunc command at the Adaptive Server for the restored RSSD:
    use rssd_name
    go
    dbcc settrunc('ltm', 'ignore')
    go
    dump tran rssd_name with truncate_only
    go
    begin tran commit tran
    go 40
    Note: The go 40 line runs the begin tran commit tran batch 40 times, which moves the Adaptive Server log onto the next page.
  12. Clear the locator information.
    Enter:
    rs_zeroltm rssd_server, rssd_name
    go
  13. Execute the dbcc settrunc command at the Adaptive Server for the restored RSSD to set the generation number to one higher than the number returned by admin get_generation in step 5.
    Enter:
    dbcc settrunc('ltm', 'gen_id', generation_number)
    go
    dbcc settrunc('ltm', 'valid')
    go

    Record this generation number and the current time so that you can return to this RSSD recovery procedure if necessary. Alternatively, dump the database after setting the generation number.

  14. Restart the Replication Server in normal mode.

    If you performed this procedure as part of the subscription comparison or subscription re-creation RSSD recovery procedure, the upstream RSI outbound queue may contain transactions, bound for the RSSD of the current Replication Server, that have already been applied using rs_subcmp. In that case, the error log may contain warnings about duplicate inserts after the Replication Server starts. You can safely ignore these warnings.

  15. Restart RepAgents for the RSSD and for user databases in normal mode.

    If you performed this procedure as part of the subscription comparison or subscription re-creation RSSD recovery procedure, expect messages reporting detected RSSD losses in all Replication Servers that have routes from the current Replication Server.
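
As a worked illustration of how steps 5 and 13 fit together, suppose admin get_generation returned 100, as in the example in step 5. The generation number to set in step 13 would then be 101. The batch below is a minimal sketch under that assumption, not a definitive script. The name rssd_name is the same placeholder used throughout this procedure, and the use command is an assumption carried over from step 11 so that the dbcc commands run in the restored RSSD.

```sql
-- Run in isql against the Adaptive Server that holds the restored RSSD.
-- Assumption: admin get_generation in step 5 returned 100,
-- so the new generation number is 100 + 1 = 101.
use rssd_name
go
-- Set the generation number to one higher than the value from step 5.
dbcc settrunc('ltm', 'gen_id', 101)
go
-- Re-enable the secondary truncation point that step 11 set to 'ignore'.
dbcc settrunc('ltm', 'valid')
go
```

Substitute the value your own Replication Server returned in step 5, and record the resulting number and the current time as described in step 13.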

Related concepts
Rebuild Queues Online
Recovery from Truncated Primary Database Logs
Loss Detection After Rebuilding Stable Queues