This chapter describes how to prevent or recover from certain kinds of system failures in a replication system.
While Replication Server tolerates most failure conditions and recovers from them automatically, some failures require user intervention. This chapter identifies those failures and provides procedures for recovery. These procedures are designed to maintain the integrity of the replication system by recovering lost and corrupted data and restoring that data to its previous state.
You should design, install, and administer your replication system with backup and recovery in mind. We assume that dumps are performed on a regular basis and that appropriate tools and settings for handling recovery are in place. See “Creating coordinated dumps” for details on performing dumps.
In this chapter, the current Replication Server is the Replication Server whose database (for example, its RSSD) you are recovering. An upstream Replication Server is one that has a direct or indirect route to the current Replication Server. A downstream Replication Server is one to which the current Replication Server has a direct or indirect route.
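The upstream and downstream definitions can be illustrated with a minimal sketch that treats routes as a directed graph. It does not query an RSSD or call any Replication Server interface. The route map, the server names (RS_A through RS_D), and the functions reachable and classify are hypothetical and exist only for this illustration.

```python
from collections import deque


def reachable(routes, start):
    """Return every server that can be reached from start over one or more routes."""
    seen = set()
    queue = deque(routes.get(start, ()))
    while queue:
        server = queue.popleft()
        if server in seen:
            continue
        seen.add(server)
        queue.extend(routes.get(server, ()))
    return seen


def classify(routes, current):
    """Split the other servers into (upstream, downstream) relative to current.

    routes maps each source server to the set of servers it has a direct
    route to. This hand-written map is an assumption for the sketch.
    """
    # Downstream: servers the current server can reach, directly or indirectly.
    downstream = reachable(routes, current)

    # Upstream: servers that can reach the current server. Reversing every
    # route turns that question into the same reachability search.
    reversed_routes = {}
    for source, destinations in routes.items():
        for destination in destinations:
            reversed_routes.setdefault(destination, set()).add(source)
    upstream = reachable(reversed_routes, current)

    # Exclude the current server itself in case the routes form a cycle.
    upstream.discard(current)
    downstream.discard(current)
    return upstream, downstream


# Hypothetical topology: RS_A -> RS_B -> RS_C -> RS_D
example_routes = {
    "RS_A": {"RS_B"},
    "RS_B": {"RS_C"},
    "RS_C": {"RS_D"},
}
upstream, downstream = classify(example_routes, "RS_C")
print("upstream:", sorted(upstream))
print("downstream:", sorted(downstream))
```

In this hypothetical topology, with RS_C as the current Replication Server, RS_B is upstream through a direct route, RS_A is upstream through an indirect route, and RS_D is downstream.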
You can resynchronize the replicate databases in your replication environment if, for example, replication latency between the primary and replicate databases is so great that recovering a database through replication alone is not feasible. See “Resynchronizing replicate databases” for the different scenarios and the corresponding resynchronization procedures.