Routes between Replication Servers

If the Replication Server has suspended routes, or if a network or data server connection is down, a backlog of messages may accumulate in the Replication Server stable queues. The longer the backlog persists, the lower the chance of recovering these messages, because source Replication Servers may already have deleted messages from their stable queues and database logs may already have been truncated.

When you set the save_interval for each route between Replication Servers, you allow each Replication Server to retain messages for a minimum period of time after the next site in the route acknowledges that it has received the messages. The availability of these messages increases the chance of recovering online messages after queues are rebuilt.

For example, in Figure 7-1, Replication Server TOKYO_RS maintains a direct route to MANILA_RS, and MANILA_RS maintains a direct route to SYDNEY_RS.

TOKYO_RS retains messages for a period of time after MANILA_RS has received them. If MANILA_RS experiences a partition failure, it requires TOKYO_RS to resend the backlogged messages. Likewise, MANILA_RS can retain messages to allow SYDNEY_RS to recover from failures.

When all of the messages stored on a stable queue segment are at least as old as the save_interval setting, Replication Server deletes the segment so it can be reused.
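The segment reuse rule can be modeled in a few lines. The following Python sketch is illustrative only and is not how Replication Server is implemented. The `Message` and `Segment` classes and both functions are hypothetical. It assumes that a message's age is counted from the time the next site acknowledged it, following the description of save_interval earlier in this section. Under that rule, one young or unacknowledged message keeps the whole segment in use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    # Hypothetical model: minute at which the next site in the route
    # acknowledged this message, or None if it is not yet acknowledged.
    acked_at: Optional[float]


@dataclass
class Segment:
    segment_id: int
    messages: List[Message] = field(default_factory=list)


def segment_is_reusable(segment: Segment, now: float, save_interval: float) -> bool:
    # Assumption: age is counted from acknowledgment, so an unacknowledged
    # message always keeps its segment in use.
    return all(
        m.acked_at is not None and now - m.acked_at >= save_interval
        for m in segment.messages
    )


def reusable_segments(segments: List[Segment], now: float, save_interval: float) -> List[int]:
    # Return the IDs of segments whose messages have all aged past save_interval.
    return [s.segment_id for s in segments if segment_is_reusable(s, now, save_interval)]


# Usage: three segments examined at minute 100.
segments = [
    Segment(1, [Message(10), Message(30)]),    # ages 90 and 70
    Segment(2, [Message(30), Message(50)]),    # ages 70 and 50
    Segment(3, [Message(20), Message(None)]),  # one message not yet acknowledged
]
print(reusable_segments(segments, now=100, save_interval=60))  # [1]
print(reusable_segments(segments, now=100, save_interval=0))   # [1, 2]
```

With save_interval set to 60, segment 2 must wait until its newest message is 60 minutes old. Segment 3 cannot be reused under any setting until its last message is acknowledged.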

Figure 7-1: Save interval example

Figure 7-1 shows a primary data server and a replicate data server connected through three Replication Servers in different locations: TOKYO_RS, MANILA_RS, and SYDNEY_RS.

Setting the save interval for routes

To set the save_interval for a route, execute the alter route command at the source Replication Server. For example, in the replication system in Figure 7-1, the following command sets Replication Server TOKYO_RS to save any messages destined for MANILA_RS for one hour (60 minutes):

alter route to MANILA_RS
    set save_interval to '60'
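When you configure several routes, it can help to generate these commands from a retention period instead of typing minute values by hand. The following Python sketch is a hypothetical helper and not part of Replication Server. It only builds command text in the form shown above. It assumes that save_interval takes whole minutes, and it rounds partial minutes up (a choice made for this sketch). It also pairs each command with the source Replication Server where it must be executed.

```python
from datetime import timedelta
from typing import Iterable, List, Tuple


def save_interval_minutes(retention: timedelta) -> int:
    # save_interval is given in whole minutes; round up so a route never keeps
    # messages for less time than requested (rounding rule is an assumption).
    whole, remainder = divmod(retention, timedelta(minutes=1))
    return whole + (1 if remainder else 0)


def alter_route_command(dest_rs: str, retention: timedelta) -> str:
    # Builds the same command text as the example above.
    minutes = save_interval_minutes(retention)
    return f"alter route to {dest_rs}\n    set save_interval to '{minutes}'"


def plan_routes(routes: Iterable[Tuple[str, str]], retention: timedelta) -> List[Tuple[str, str]]:
    # Each command must be executed at the source Replication Server of its route.
    return [(src, alter_route_command(dst, retention)) for src, dst in routes]


# Usage: the two direct routes in Figure 7-1, each retaining messages for one hour.
for source, command in plan_routes(
    [("TOKYO_RS", "MANILA_RS"), ("MANILA_RS", "SYDNEY_RS")], timedelta(hours=1)
):
    print(f"At {source}:")
    print(command)
```

The output for TOKYO_RS matches the command above. The second command must be run at MANILA_RS, because MANILA_RS is the source of the route to SYDNEY_RS.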

By default, the save_interval is set to 0 (minutes). For systems with low volume, this may be an acceptable setting for recovery, since Replication Server does not delete messages immediately after receiving acknowledgment from destination servers. Rather, messages are deleted periodically in large chunks.

However, to accommodate the volume and activity of sites that receive distributions from the Replication Server and to increase the chance of full recovery from database or partition failures, you may want to change the save_interval setting.

In case of a partition failure on the stable queues, be sure your setting allows adequate time to restore your system. Consider also the size of the partitions that are allocated for backlogged messages. Partitions must be large enough to hold the extra messages.
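A rough calculation can show whether a proposed save_interval fits the available queue space. The following Python sketch is a simple estimate and does not follow the Design Guide method. All input figures are hypothetical. It assumes that the extra space equals the rate of data written to the route's queue multiplied by save_interval. Treat that result as a lower bound: a segment is freed only when every message in it has aged past save_interval, so actual usage is higher.

```python
import math


def min_extra_queue_mb(rate_mb_per_min: float, save_interval_min: float) -> float:
    # Lower bound on space held only because of save_interval (assumed linear
    # in the inbound data rate).
    return rate_mb_per_min * save_interval_min


def extra_partitions(space_mb: float, partition_mb: float) -> int:
    # Round up: a partly used partition still has to be allocated.
    return math.ceil(space_mb / partition_mb)


def covers_restore(save_interval_min: float, restore_min: float) -> bool:
    # Messages should be retained at least as long as a restore is expected to take.
    return save_interval_min >= restore_min


# Usage with hypothetical figures: 2 MB/min into the route, a 60-minute
# save_interval, a 45-minute restore estimate, and 50 MB partitions.
extra = min_extra_queue_mb(2.0, 60)
print(extra)                         # 120.0
print(extra_partitions(extra, 50))   # 3
print(covers_restore(60, 45))        # True
```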

Refer to the capacity planning guidelines in the Replication Server Design Guide for help in determining queue space requirements.