The following errors occur when FailSafe determines that part of the resource group is running on at least two different nodes in the cluster. This may be caused by a failed start, then a forced offline:

- SPLIT RESOURCE error
- Resource groups starting on the wrong node
- srmd executable error
A resource group may also start on the wrong node when the FailSafe Services have only recently started and have not yet quiesced. Try waiting a minute or two after starting the FailSafe Services on a node before moving a resource group.
Perform the following to clear the error:
Force the Resource Group Offline:
Select Tasks.
Select Resource Groups from the pull-down menu.
Select Bring a Resource Group Offline and fill in the following fields:
| Field | Sample value | Description |
|---|---|---|
| Detach Only | Unchecked | Stops monitoring the resource group. The resource group will not be stopped, but FailSafe will not have any control over the group. |
| Detach Force | Unchecked | Same as Detach Only. In addition, FailSafe clears all errors. |
| Force Offline | Checked | Stops all resources in the group and clears all errors. |
| Group to Take Offline | (name of the resource group) | Select the name of the resource group you want to take offline. The menu displays only resource groups that are currently online. |
If you are using the command line, enter the following:
cluster_mgr -f pri_offline_rg_force_hard
cluster_mgr -f sec_offline_rg_force_hard
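The -f option tells cluster_mgr to execute the commands in the named script file. As an illustrative sketch only (the resource group name failsafe_rg, the cluster name failsafe_cluster, and the exact directive wording are assumptions; inspect the scripts shipped with your configuration rather than relying on this), such a script would contain a force-offline directive along these lines:
admin offline_force resource_group failsafe_rg in cluster failsafe_cluster
quit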
Verify that no resources are still online and running on any node. Adaptive Server should not be running, and any logical volumes should be dismounted; check with the df(1) command.
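For example, run the following on each node and confirm that none of the resource group's mount points appear in the output (the /fs1 path is only a placeholder for your own filesystem resource):
df | grep /fs1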
Verify that Adaptive Server is not running on either node. If Adaptive Server is still running, determine its process id number and kill(1) it. If you have configured multiple engines, terminate them as well.
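For example, assuming Adaptive Server was started with the default dataserver binary, the following lists any engines that are still running; kill(1) each process ID it reports (the <pid> argument is a placeholder):
ps -ef | grep dataserver
kill <pid>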
Make sure that no volumes are still mounted on either node. Use the umount(1M) command to dismount any volumes that are still mounted.
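For example, to dismount a filesystem that is still mounted (again, /fs1 stands in for the mount point configured in the resource group):
umount /fs1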
Verify that the volumes are disassembled on each node. Perform the following:
Make sure that the volumes listed in the resource group are not in the kernel's memory. Enter the following at the command line:
xlv_mgr -c 'show kernel'
If any volumes belonging to the offline resource group are listed, disassemble them. The xlv_mgr output includes the volume names, which can be passed to the xlv_shutdown command. For example, xlv_mgr displays something similar to the following:
VOL xlv1 flags=0x1, [complete] (node=NULL)
DATA flags=0x0() open_flag=0x0() device=(192, 5)
The volume name is xlv1. To shut it down, enter:
xlv_shutdown -n xlv1
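If the show kernel output lists more than one volume belonging to the offline resource group, shut each of them down in turn, for example (xlv1 and xlv2 stand in for whatever names xlv_mgr reported):
xlv_shutdown -n xlv1
xlv_shutdown -n xlv2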
Check that the volumes have their node ownership set to none. For example, the following shows the volumes before their ownership is set to none:
# xlv_mgr -c 'show all_objects'
Volume: xlv2 (complete)
Volume: xlv1 (complete; node=none)

Vol: 2; Standalone Plex: 0; Standalone Ve: 0
and then after their ownership is set to none:
# xlv_mgr -c 'show all_objects'
Volume: xlv2 (complete; node=none)
Volume: xlv1 (complete; node=none)

Vol: 2; Standalone Plex: 0; Standalone Ve: 0
#
Run the following from the command line:
xlv_mgr -c 'show all_objects'
Set xlv2’s node name to be none:
xlv_mgr -c 'change nodename none xlv2'
Verify that all of the volumes now show node=none:
xlv_mgr -c 'show all_objects'