Replacing a faulty node with a spare node

You can use the SAN Volume Controller Console and the SAN Volume Controller front panel to replace a faulty node in a cluster.

Before you attempt to replace a faulty node with a spare node, ensure that you meet the following requirements:
  • SAN Volume Controller version 3.1.0 or higher is installed on the cluster and on the spare node.
  • You know the name of the cluster that contains the faulty node.
  • A spare node is installed in the same rack as the cluster that contains the faulty node.
  • You have recorded the last five characters of the original worldwide node name (WWNN) of the spare node.
    Note: A repaired faulty node, which has been successfully replaced in the cluster with a spare node that uses the original WWNN of the faulty node, must be assigned a new unique WWNN. You can use the original WWNN of the spare node as the new WWNN of the repaired node.
Attention: Never connect a node with a WWNN of 00000 to the cluster. If this node is no longer required as a spare and is to be used for normal attachment to a cluster, you must change the WWNN to the number you recorded when a spare was created. Using any other number might cause data corruption.
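
The version and WWNN requirements above can be checked mechanically before you start. The following Python sketch is illustrative only: the function names and the idea of passing versions as dotted strings are assumptions, not part of any SAN Volume Controller tool, and the check that a recorded WWNN of 00000 is unusable is inferred from the Attention notice above.

```python
import re

MIN_VERSION = (3, 1, 0)  # minimum version stated in the requirements above


def parse_version(text):
    """Turn a dotted version string such as '4.3' into a three-part tuple of ints."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def check_spare_prerequisites(cluster_version, spare_version, recorded_spare_wwnn):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for label, version in (("cluster", cluster_version), ("spare node", spare_version)):
        if parse_version(version) < MIN_VERSION:
            problems.append(f"{label} version {version} is below 3.1.0")
    if not re.fullmatch(r"[0-9A-Fa-f]{5}", recorded_spare_wwnn):
        problems.append("recorded spare WWNN must be five hexadecimal characters")
    elif recorded_spare_wwnn == "00000":
        # Assumption: 00000 marks a spare, so it cannot be the original WWNN.
        problems.append("00000 is the spare placeholder, not an original WWNN")
    return problems
```

For example, `check_spare_prerequisites("4.3.0", "4.3.0", "000A1")` returns an empty list, whereas a cluster version of `"3.0.5"` produces one reported problem.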

If a node fails, the cluster continues to operate with degraded performance until the faulty node is repaired. If the repair operation takes an unacceptable amount of time, you can replace the faulty node with a spare node. However, you must follow the appropriate procedures and take precautions so that you do not interrupt I/O operations or compromise the integrity of your data.

The following table describes the changes that are made to your configuration when you replace a faulty node in the cluster:
Node attributes and descriptions

Front panel ID
    This is the number that is printed on the front of the node and is used to select the node that is added to a cluster.

Node ID
    This is the ID that is assigned to the node. A new node ID is assigned each time a node is added to a cluster; the node name remains the same following service activity on the cluster. You can use the node ID or the node name to perform management tasks on the cluster. However, if you are using scripts to perform those tasks, use the node name rather than the node ID. This ID will change during this procedure.

Node name
    This is the name that is assigned to the node. If you do not specify a name, the SAN Volume Controller assigns a default name. The SAN Volume Controller creates a new default name each time a node is added to a cluster. If you choose to assign your own names, you must type the node name on the Adding a node to a cluster panel. You cannot manually assign a name that matches the naming convention used for names assigned automatically by SAN Volume Controller. If you are using scripts to perform management tasks on the cluster and those scripts use the node name, you can avoid the need to make changes to the scripts by assigning the original name of the node to a spare node. This name might change during this procedure.

Worldwide node name
    This is the WWNN that is assigned to the node. The WWNN is used to uniquely identify the node and the fibre-channel ports. During this procedure, the WWNN of the spare node is changed to that of the faulty node. The node replacement procedures must be followed exactly to avoid any duplication of WWNNs. This name does not change during this procedure, because the spare node takes on the WWNN of the faulty node.

Worldwide port names
    These are the WWPNs that are assigned to the node. WWPNs are derived from the WWNN that is written to the spare node as part of this procedure. For example, if the WWNN for a node is 50050768010000F6, the four WWPNs for this node are derived as follows:

    WWNN                          50050768010000F6
    WWNN displayed on front panel 000F6
    WWPN Port 1                   50050768014000F6
    WWPN Port 2                   50050768013000F6
    WWPN Port 3                   50050768011000F6
    WWPN Port 4                   50050768012000F6

    These names do not change during this procedure.
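
The WWPN derivation in the example above can be expressed as a short Python sketch. The per-port digits (4, 3, 1, 2) and the position they replace are read directly from that single example; treating them as a general rule for every node is an assumption, and the function names are hypothetical.

```python
# Digit substituted for the 11th character of the WWNN for each port,
# as read from the worked example above (assumed to hold in general).
PORT_DIGITS = {1: "4", 2: "3", 3: "1", 4: "2"}


def front_panel_wwnn(wwnn):
    """Return the last five characters, which is what the front panel shows."""
    return wwnn[-5:]


def derive_wwpns(wwnn):
    """Derive the four WWPNs by replacing the 11th character of the WWNN."""
    if len(wwnn) != 16:
        raise ValueError("expected a 16-character WWNN")
    return {port: wwnn[:10] + digit + wwnn[11:] for port, digit in PORT_DIGITS.items()}


if __name__ == "__main__":
    wwnn = "50050768010000F6"
    print("Front panel:", front_panel_wwnn(wwnn))
    for port, wwpn in derive_wwpns(wwnn).items():
        print(f"WWPN Port {port}: {wwpn}")
```

Because the WWPNs follow from the WWNN, writing the WWNN of the faulty node to the spare node is enough for the spare node to present the same port names.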

This task assumes that you have already launched the SAN Volume Controller Console.

Complete the following steps to replace a faulty node in the cluster:

  1. Verify the name and ID of the node that you want to replace.

    Complete the following steps to verify the name and ID:

    1. Make sure that the SAN Volume Controller Console application is running on the cluster that contains the faulty node.
    2. Click Work with Nodes > Nodes in the portfolio. The Viewing Nodes panel is displayed. If the node is faulty, it is shown as offline.
    3. Ensure the partner node in the I/O group is online.
      • If the other node in the I/O group is offline, start the Directed Maintenance Procedures (DMPs) to determine the fault.
      • If you have been directed here by the DMPs, and subsequently the partner node in the I/O group has failed, recover the offline VDisks.
      • If you are replacing the node for other reasons, determine the node that you want to replace and ensure that the partner node in the I/O group is online.
      • If the partner node is offline, you will lose access to the VDisks that belong to this I/O group. Start the DMPs and fix the other node before proceeding to the next step.
  2. Click the name of the faulty (offline) node. The Viewing General Details panel is displayed.
  3. Click General and record the following attributes for the faulty node:
    • ID
    • WWNN
    • I/O Group
    • Uninterruptible power supply (UPS) serial number
  4. Click Close. Click Fibre Channel Port and record the following attribute for the faulty node:
    • WWPNs
  5. Click Close. Click Vital Product Data and record the following attribute for the faulty node:
    • System Serial Number
  6. Ensure that the faulty node has been powered off.
  7. Use the SAN Volume Controller Console to delete the faulty node from the cluster.
    Remember: You must record the following information to avoid data corruption when this node is re-added to the cluster:
    • Node serial number
    • WWNN
    • All WWPNs
    • I/O group that contains the node
  8. Disconnect all four fibre-channel cables from the node.
    Important: Do not plug the fibre-channel cables into the spare node until the spare node is configured with the WWNN of the faulty node.
  9. Connect the power and signal cables from the spare node to the uninterruptible power supply that has the serial number you recorded in step 3.
    Note: For 2145 UPS units, you can plug the signal cable into any vacant position on the top row of serial connectors on the 2145 UPS. If no spare serial connectors are available on the 2145 UPS, disconnect the cables from the faulty node. For 2145 UPS-1U units, you must disconnect the cables from the faulty node.
  10. Power on the spare node.
  11. You must change the WWNN of the spare node to that of the faulty node. The procedure for doing this depends on the SAN Volume Controller version that is installed on the spare node. Press and release the down button until the Node: panel displays. Then press and release the right button until the WWNN: panel displays. If repeated pressing of the right button returns you to the Node: panel, without displaying a WWNN: panel, go to step 13; otherwise, continue with step 12.
  12. Change the WWNN of the spare node (with SAN Volume Controller V4.3 and above installed) to match the WWNN of the faulty node by performing the following steps:
    1. With the Node WWNN: panel displayed, press and hold the down button, press and release the select button, and then release the down button. The display switches into edit mode. Edit WWNN is displayed on line 1. Line 2 of the display contains the last five numbers of the WWNN.
    2. Change the WWNN that is displayed to match the last five numbers of the WWNN that you recorded in step 3. To edit the highlighted number, use the up and down buttons to increase or decrease the numbers. The numbers wrap F to 0 or 0 to F. Use the left and right buttons to move between the numbers.
    3. When the five numbers match the last five numbers of the WWNN that you recorded in step 3, press the select button to accept the numbers.
  13. Change the WWNN of the spare node (with SAN Volume Controller versions prior to V4.3 installed) to match the WWNN of the faulty node by performing the following steps:
    1. Press and release the right button until the Status: panel is displayed.
    2. With the node status displayed on the front panel, press and hold the down button, press and release the select button, and then release the down button. WWNN is displayed on line 1 of the display. Line 2 of the display contains the last five numbers of the WWNN.
    3. With the WWNN displayed on the front panel, press and hold the down button, press and release the select button, and then release the down button. The display switches into edit mode.
    4. Change the WWNN that is displayed to match the last five numbers of the WWNN that you recorded in step 3. To edit the highlighted number, use the up and down buttons to increase or decrease the numbers. The numbers wrap F to 0 or 0 to F. Use the left and right buttons to move between the numbers.
    5. When the five numbers match the last five numbers of the WWNN that you recorded in step 3, press the select button to accept the numbers.
    6. Press the select button to retain the numbers that you have updated and return to the WWNN panel.
  14. Connect the four fibre-channel cables that you disconnected from the faulty node to the spare node.

    If the spare node has fewer Ethernet cables connected than the faulty node, move the Ethernet cables from the faulty node to the spare node. Ensure that you connect each cable to the same port on the spare node that it occupied on the faulty node.

  15. Use the SAN Volume Controller Console to add the spare node to the cluster. If possible, use the same node name that was used for the faulty node. If necessary, the spare node is updated to the same SAN Volume Controller version as the cluster. This update can take up to 20 minutes.
  16. Use the tools that are provided with your multipathing device driver on the host systems to verify that all paths are now online. See the documentation that is provided with your multipathing device driver for more information. For example, if you are using the subsystem device driver (SDD), see the IBM® System Storage® Multipath Subsystem Device Driver User's Guide for instructions on how to use the SDD management tool on host systems. It might take up to 30 minutes for the paths to come online.
  17. Repair the faulty node.
    Attention: When the faulty node is repaired, do not connect the fibre-channel cables to it. Connecting the cables might cause data corruption because the spare node is using the same WWNN as the faulty node.

    If you want to use the repaired node as a spare node, perform the following steps.

    For SAN Volume Controller V4.3 and above:

    1. With the Node WWNN: panel displayed, press and hold the down button, press and release the select button, and then release the down button.
      The display switches into edit mode. Edit WWNN is displayed on line 1. Line 2 of the display contains the last five numbers of the WWNN.
    2. Change the displayed number to 00000. To edit the highlighted number, use the up and down buttons to increase or decrease the numbers. The numbers wrap F to 0 or 0 to F. Use the left and right buttons to move between the numbers.
    3. Press the select button to accept the numbers.

      This node can now be used as a spare node.

    For SAN Volume Controller versions prior to V4.3:

    1. Press and release the right button until the Status: panel is displayed.
    2. With the node status displayed on the front panel, press and hold the down button, press and release the select button, and then release the down button. WWNN is displayed on line 1 of the display. Line 2 of the display contains the last five numbers of the WWNN.
    3. With the WWNN displayed on the front panel, press and hold the down button, press and release the select button, and then release the down button. The display switches into edit mode.
    4. Change the displayed number to 00000. To edit the highlighted number, use the up and down buttons to increase or decrease the numbers. The numbers wrap F to 0 or 0 to F. Use the left and right buttons to move between the numbers.
    5. Press the select button to accept the numbers.
    6. Press the select button to retain the numbers that you have updated and return to the WWNN panel.

      This node can now be used as a spare node.
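
The WWNN edit steps above rely on each position wrapping from F to 0 and from 0 to F. The following Python sketch models only that wrap-around behavior so that you can plan the button presses in advance; it is not a model of the node firmware, and the function names and the "up"/"down" labels are assumptions made for illustration.

```python
HEX_DIGITS = "0123456789ABCDEF"


def press(digit, button):
    """Return the digit shown after one press of the up or down button."""
    if button not in ("up", "down"):
        raise ValueError("button must be 'up' or 'down'")
    step = 1 if button == "up" else -1
    # The modulo gives the wrap-around: F + 1 -> 0 and 0 - 1 -> F.
    return HEX_DIGITS[(HEX_DIGITS.index(digit) + step) % 16]


def presses_needed(displayed, target):
    """For each position, return the shorter direction and its press count."""
    plan = []
    for current, wanted in zip(displayed.upper(), target.upper()):
        up = (HEX_DIGITS.index(wanted) - HEX_DIGITS.index(current)) % 16
        down = (16 - up) % 16
        plan.append(("up", up) if up <= down else ("down", down))
    return plan
```

For example, changing a spare node from 00000 to 000F6 needs one press of the down button on the fourth position (0 wraps to F) and six presses of the up button on the fifth position.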

© Copyright IBM Corporation 2003, 2009. All Rights Reserved.