Errors from resource groups running on two nodes

These errors occur when FailSafe determines that part of the resource group is running on at least two different nodes in the cluster. This can be caused by a failed start followed by a forced offline.

A resource group may also start on the wrong node when the FailSafe Services have only recently started and have not yet quiesced. Try waiting a minute or two after starting the FailSafe Services on a node before moving a resource group.

Perform the following to clear the error:

  1. Force the Resource Group Offline:

    1. Select Tasks.

    2. Select Resource Groups from the pull-down menu.

    3. Select Bring a Resource Group Offline and fill in the following fields:

      Detach Only (sample value: Unchecked)
      Stops monitoring the resource group. The resource group will not be stopped, but FailSafe will no longer have any control over the group.

      Detach Force (sample value: Unchecked)
      Same as Detach Only. In addition, FailSafe clears all errors.

      Force Offline (sample value: Checked)
      Stops all resources in the group and clears all errors.

      Group to Take Offline
      Select the name of the resource group you want to take offline. The menu displays only resource groups that are currently online.

    If you are using the command line, enter the following:

    cluster_mgr -f pri_offline_rg_force_hard
    cluster_mgr -f sec_offline_rg_force_hard
    
  2. Verify that no resources in the group are still online or running on any node. Adaptive Server should not be running, and any logical volumes should be dismounted; check with the df(1) command.
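
    For example, a quick check on each node might look like the following sketch; the xlv pattern is only a convenient way to spot XLV-backed filesystems, so substitute the mount points your resource group actually uses:

    df                 # list the filesystems mounted on this node
    df | grep xlv      # no XLV-backed filesystems should appear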

  3. Verify that Adaptive Server is not running on either node. If it is still running, determine its process ID and terminate it with kill(1). If you have configured multiple engines, terminate them as well.
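
    A typical check is sketched below; dataserver is the usual name of the Adaptive Server engine binary, and the process ID passed to kill(1) is a placeholder, so adjust both for your site:

    ps -ef | grep dataserver | grep -v grep    # list any engines still running
    kill <pid>                                 # terminate each remaining engine by process ID
                                               # escalate to kill -9 only if a process does not exit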

  4. Make sure the volumes are no longer mounted on either node. Use the umount(1M) command to dismount any volumes that are still mounted.
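
    For example, assuming the resource group's filesystem is mounted on /sybase (a placeholder path):

    mount | grep xlv    # list any XLV-backed filesystems that are still mounted
    umount /sybase      # dismount each one that remains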

  5. Verify that the volumes are disassembled on each node. Perform the following:

    1. Make sure that volumes listed in the resource are not in the kernel’s memory. Enter the following at the command line:

      xlv_mgr -c 'show kernel'
      
    2. If any volumes that belong to the offline resource group are listed, disassemble them. The xlv_mgr command lists the volume names, which can be fed to the xlv_shutdown command. For example, xlv_mgr displays something similar to the following:

      VOL xlv1    flags=0x1, [complete]    (node=NULL)
      DATA        flags=0x0()    open_flag=0x0()    device=(192, 5)

      The volume name is xlv1. To shut it down, enter:

      xlv_shutdown -n xlv1
      
    3. Check that the volumes have their ownership set to none. For example, the following shows the volumes before their ownership is set to none:

      # xlv_mgr -c 'show all_objects'
      Volume:    xlv2 (complete)
      Volume:    xlv1 (complete; node=none)

      Vol: 2; Standalone Plex: 0; Standalone Ve: 0

      and then after their ownership is set to none:

      # xlv_mgr -c 'show all_objects'
      Volume:    xlv2 (complete; node=none)
      Volume:    xlv1 (complete; node=none)

      Vol: 2; Standalone Plex: 0; Standalone Ve: 0
      
    4. Run the following from the command line to list all volumes and check their node ownership (a combined sketch covering sub-steps 4 through 6 appears after this procedure):

      xlv_mgr -c 'show all_objects'
      
    5. Set xlv2's node name to be none:

      xlv_mgr -c 'change nodename none xlv2'
      
    6. Verify that the change took effect:

      xlv_mgr -c 'show all_objects'
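
The commands in sub-steps 4 through 6 can be combined into a short loop when several volumes need their node name cleared. This is only a sketch; xlv1 and xlv2 stand in for whatever volumes your resource group actually defines:

    # Clear node ownership on each XLV volume used by the resource group,
    # then confirm that every volume now shows node=none.
    for vol in xlv1 xlv2
    do
        xlv_mgr -c "change nodename none $vol"
    done
    xlv_mgr -c 'show all_objects'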