
Chapter 14: Configuring Adaptive Server for Failover on SGI IRIX

Errors from resource groups running on two nodes

These errors occur when FailSafe determines that part of the resource group is running on at least two different nodes in the cluster. This can be caused by a failed start followed by the server being forced offline.

An error caused by the resource group starting on the wrong node may also occur when you have recently started the FailSafe Services, but they have not yet quiesced. Try waiting a minute or two after starting the FailSafe services before moving a resource group.

Perform the following to clear this error:

  1. Force the Resource Group Offline:

    1. Select Tasks from fstask.

    2. Select Resource Groups.

    3. Select Bring a Resource Group Offline and complete the following fields:

      Field                   Sample value            Description
      Detach Only             Unchecked               Stops monitoring the resource group. The resource group is not stopped, but FailSafe no longer has any control over the group.
      Detach Force            Unchecked               Same as Detach Only, except that FailSafe also clears all errors.
      Force Offline           Checked                 Stops all resources in the group and clears all errors.
      Group to Take Offline   (resource group name)   Select the name of the resource group you want to take offline. The menu displays only resource groups that are currently online.

    If you are using the command line, enter:

    cluster_mgr -f pri_offline_rg_force_hard
    cluster_mgr -f sec_offline_rg_force_hard
    
  2. Verify that no resources are online and running on any node. Adaptive Server should not be running, and any logical volumes should be unmounted. You can verify that they are unmounted with the df(1) command.
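
    For example, on each node you can check that none of the resource group's XLV volumes are still mounted with a command such as the following (a sketch; the exact device paths depend on your volume names):

    df | grep xlv

    If the command returns nothing, no XLV volumes are mounted on that node.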

  3. Verify that Adaptive Server is not running on either node. If Adaptive Server is still running, determine its process ID number to stop the process. If you have configured multiple engines, terminate them as well.
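
    For example, assuming the Adaptive Server engines are running as the usual dataserver binary (your binary may have a different name), you can locate them with:

    ps -ef | grep dataserver

    Pass each process ID reported by ps to the kill(1) command to stop any remaining engines.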

  4. Make sure no volumes from the resource group are still mounted on either node. Use the umount(1M) command to unmount any volumes that are still mounted.
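
    For example, if a file system from the resource group were mounted on a hypothetical mount point /sybase_data, you would unmount it with:

    umount /sybase_data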

  5. Verify that the volumes are disassembled on each node:

    1. Make sure that the volumes listed in the resource group are not in the kernel's memory. At the command line, enter:

      xlv_mgr -c 'show kernel'
      
    2. If volumes that belong to the offline resource group are listed, disassemble them. The xlv_mgr command lists the volume names, which can be passed to the xlv_shutdown command. For example, xlv_mgr displays output similar to the following:

      VOL xlv1    flags=0x1, [complete]    (node=NULL)
      DATA        flags=0x0()    open_flag=0x0()    device=(192, 5)
      

      The volume name is xlv1. To shut it down, enter:

      xlv_shutdown -n xlv1
      
    3. Check that the volumes have the ownership set to none. For example, the following shows the volumes before their ownership is set to none:

      # xlv_mgr -c 'show all_objects'
      #Volume:        xlv2 (complete)
      #Volume:        xlv1 (complete; node=none)
      #
      #Vol: 2; Standalone Plex: 0; Standalone Ve: 0
      

      and then after their ownership is set to none:

      # xlv_mgr -c 'show all_objects'
      #Volume:        xlv2 (complete; node=none)
      #Volume:        xlv1 (complete; node=none)
      #
      #Vol: 2; Standalone Plex: 0; Standalone Ve: 0
      #
      
    4. From the command line, enter:

      xlv_mgr -c 'show all_objects'
      
    5. Set xlv2's node name to none:

      xlv_mgr -c 'change nodename none xlv2'
      
    6. Verify that the change took effect:

      xlv_mgr -c 'show all_objects'
      



