Error 13045

Replication has been suspended because the Replication Server System Database (RSSD) restarted.

Symptom

These messages are reported in the Replication Server error log:
E. 2006/06/13 14:50:16. ERROR #13045 SQT(101:1 DIST westss.eastlp) - seful/cm.c(3914)
Failed to connect to server 'westss' as user 'westrs_rssd_prim'. See CT-Lib and/or server error messages for more information.
I. 2006/06/13 14:50:17. Trying to connect to server 'westss' as user 'westrs_rssd_prim' ......

After the Adaptive Server with the RSSD has restarted, these messages are reported in the Replication Server error log:

E. 2006/06/13 17:04:52. ERROR #1027 dSUB( ) - seful/cm.c(3909)
Open Client Client-Library error: Error: 84083972, Severity 5 -- 'ct_connect(): network packet layer: internal net library error: Net-Lib protocol driver call to connect two endpoints failed', Operating System error 0 -- 'Socket connect failed - errno 146 Connection refused'.
E. 2006/06/13 17:04:52. ERROR #13045 dSUB( ) - seful/cm.c(3914)
Failed to connect to server 'westss' as user 'amerttp'. See CT-Lib and/or server error messages for more information.
I. 2006/06/13 17:04:52. Trying to connect to server 'westss' as user 'westrs_rssd_prim' ......
E. 2006/06/13 17:04:57. ERROR #1027 dSUB( ) - seful/cm.c(3909)
Open Client Client-Library error: Error: 84083972, Severity 5 -- 'ct_connect(): network packet layer: internal net library error: Net-Lib protocol driver call to connect two endpoints failed', Operating System error 0 -- 'Socket connect failed - errno 146 Connection refused'.
E. 2006/06/13 17:05:56. ERROR #13043 USER(westss_ra) - ul/cmapp.c(888)
Failed to execute the 'USE westss_rssd' command on server 'westss'. See CT-Lib and SQL Server error messages for more information.
E. 2006/06/13 17:05:56. ERROR #1028 USER(westss_ra) - ul/cmapp.c(888)
Message from server: Message: 911, State 2, Severity 11 -- 'Attempt to locate entry in sysdatabases for database 'westss_rssd' by name failed - no entry found under that name. Make sure that name is entered properly.'.
I. 2006/06/13 17:05:56. Message from server: Message: 5701, State 1, Severity 10 -- 'Changed database context to 'master'.'.
E. 2006/06/13 17:05:56. ERROR #13045 USER(westss_ra) - seful/cm.c(3318)
Failed to connect to server 'westss' as user 'westrs_rssd_prim'. See CT-Lib and/or server error messages for more information.
E. 2006/06/13 17:05:56. ERROR #1028 USER(westss_ra) - seful/cm.c(3318)
Message from server: Message: 911, State 2, Severity 11 -- 'Attempt to locate entry in sysdatabases for database 'westss_rssd' by name failed - no entry found under that name. Make sure that name is entered properly.'.
I. 2006/06/13 17:05:56. Message from server: Message: 5701, State 1, Severity 10 -- 'Changed database context to 'master'.'.
E. 2006/06/13 17:05:56. ERROR #13043 dREC(dREC) - ul/cmapp.c(888)
Failed to execute the 'USE westss_rssd' command on server 'westss'. See CT-Lib and SQL Server error messages for more information.
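
To confirm this symptom in a long error log, the ERROR #13045 entries can be extracted programmatically. The following is a minimal Python sketch, not a Sybase-supplied tool. It assumes that each log entry starts with a one-letter severity code and a "YYYY/MM/DD HH:MM:SS." timestamp, as in the excerpts above, that every other line is a wrapped continuation of the preceding entry, and that the message wording matches what is shown here. The function names are hypothetical.

```python
import re

# Assumption: an entry starts with "<letter>. YYYY/MM/DD HH:MM:SS."; any other
# non-blank line is a wrapped continuation of the previous entry.
ENTRY_START = re.compile(r"^[A-Z]\. \d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.")

# Assumption: ERROR #13045 text reads "Failed to connect to server 'X' as user 'Y'".
CONNECT_FAIL = re.compile(
    r"ERROR #13045 (?P<thread>.*?) - .*?"
    r"Failed to connect to server '(?P<server>[^']+)' as user '(?P<user>[^']+)'"
)


def join_entries(lines):
    """Merge wrapped continuation lines into one string per log entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def find_connect_failures(log_text):
    """Return (timestamp, thread, server, user) for each ERROR #13045 entry."""
    failures = []
    for entry in join_entries(log_text.splitlines()):
        match = CONNECT_FAIL.search(entry)
        if match:
            timestamp = entry[3:22]  # "YYYY/MM/DD HH:MM:SS" after "E. "
            failures.append(
                (timestamp, match["thread"], match["server"], match["user"])
            )
    return failures
```

Each tuple identifies the thread that lost its connection (for example, an SQT thread or the dSUB thread) and the server and login it was retrying.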

Explanation

The Adaptive Server that hosts the Replication Server System Database (RSSD) was shut down and restarted while the Replication Server was running. As a result, the Distributor (DIST) and Stable Queue Transaction (SQT) threads for the databases managed by the Replication Server were terminated. Replication for those databases stops and does not resume automatically, even after the RSSD becomes available again.

Running the admin who_is_down command at the Replication Server shows that both DIST and SQT threads are down:
Spid    Name      State      Info
----    ------    -------    ----------------------
        DIST      Down       westernDS.westDB
        SQT       Down       105:1 westernDS.westDB
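
When many connections are affected, the names to resume can be read from the admin who_is_down output. This Python sketch assumes the column layout shown above: down threads have an empty Spid column, and for both DIST and SQT rows the last field of the Info column is the data_server.database name. The function name down_connections is hypothetical.

```python
def down_connections(who_is_down_output):
    """Return data_server.database names whose DIST or SQT thread is Down."""
    names = []
    for line in who_is_down_output.splitlines():
        fields = line.split()
        # Down threads normally have an empty Spid column; drop a spid if present.
        if fields and fields[0].isdigit():
            fields = fields[1:]
        # Expected order: Name, State, then the Info column (one or more fields).
        if len(fields) >= 3 and fields[0] in ("DIST", "SQT") and fields[1] == "Down":
            connection = fields[-1]  # last Info field is data_server.database
            if connection not in names:  # DIST and SQT rows share one connection
                names.append(connection)
    return names
```

Because the DIST and SQT rows for the same database name the same connection, each database appears only once in the result.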

Solution

  1. At the Replication Server, execute resume distributor data_server.database for each affected database to restart its SQT and DIST threads.

  2. At the Replication Server, run admin who_is_down again to verify that the SQT and DIST threads are no longer reported as down.
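
The two steps above can be scripted as a single isql batch. This Python sketch builds that batch from a list of data_server.database names. It assumes the resume distributor data_server.database form of the command and sends each command as its own batch, terminated by go. The helper name resume_script is hypothetical.

```python
def resume_script(connections):
    """Build an RCL batch that resumes each distributor, then re-checks threads."""
    lines = []
    for connection in connections:
        # Step 1: restart the DIST (and SQT) threads for this database.
        lines.append(f"resume distributor {connection}")
        lines.append("go")
    # Step 2: list any threads that are still down.
    lines.append("admin who_is_down")
    lines.append("go")
    return "\n".join(lines) + "\n"
```

For example, resume_script(["westernDS.westDB"]) produces the commands resume distributor westernDS.westDB and admin who_is_down, each followed by go. Run the script with isql against the Replication Server, not against the Adaptive Server that hosts the RSSD.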