Before you attempt to replace a faulty node with a spare
node, ensure that you meet the following requirements:
- You know the name of the cluster that contains the faulty node.
- A spare node is installed in the same rack as the cluster that
contains the faulty node.
- You have recorded the last five characters of the original worldwide
node name (WWNN) of the spare node. If you repair a faulty node and
want to make it a spare node, you can use the WWNN of that node.
Each WWNN must be unique, so never duplicate a WWNN. It is
easier to swap in a node when you use the WWNN.
Attention: Never connect a node with a WWNN of 00000 to
the cluster. If this node is no longer required as a spare and is
to be used for normal attachment to a cluster, you must change the
WWNN to the number you recorded when a spare was created. Using any
other number might cause data corruption.
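The WWNN rules above can be sketched as a small guard that a script might run before a node is cabled to the cluster. This is only an illustration: the function name `check_front_panel_wwnn` is hypothetical, and the check assumes that the recorded value is the five-character hexadecimal suffix shown on the front panel.

```python
import re

# Placeholder WWNN suffix that the text says must never be connected to the cluster.
SPARE_WWNN = "00000"


def check_front_panel_wwnn(last_five: str) -> str:
    """Return the normalized five-character WWNN suffix, or raise ValueError.

    Hypothetical helper: it only checks the shape of the value and rejects
    the spare placeholder; it cannot detect a duplicate WWNN on the fabric.
    """
    value = last_five.strip().upper()
    if not re.fullmatch(r"[0-9A-F]{5}", value):
        raise ValueError(f"expected five hexadecimal characters, got {last_five!r}")
    if value == SPARE_WWNN:
        raise ValueError("a node with a WWNN of 00000 must not be connected to the cluster")
    return value
```

A call such as `check_front_panel_wwnn("000f6")` returns the normalized suffix, while `check_front_panel_wwnn("00000")` raises an error instead of letting the value through.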
If a node fails, the cluster continues to operate with
degraded performance until the faulty node is repaired. If the repair
operation takes an unacceptable amount of time, it is useful to replace
the faulty node with a spare node. However, you must follow the appropriate
procedures and take precautions so that you do not interrupt
I/O operations or compromise the integrity of your data.
The following table describes the changes that are made to your configuration
when you replace a faulty node in the cluster:
| Node attributes | Description |
| --- | --- |
| Front panel ID | This is the number that is printed on the front of the node and is used to select the node that is added to a cluster. |
| Node ID | This is the ID that is assigned to the node. A new node ID is assigned each time a node is added to a cluster; the node name remains the same following service activity on the cluster. You can use the node ID or the node name to perform management tasks on the cluster. However, if you are using scripts to perform those tasks, use the node name rather than the node ID. This ID will change during this procedure. |
| Node name | This is the name that is assigned to the node. If you are using SAN Volume Controller version 5.1.0 nodes, the SAN Volume Controller automatically re-adds nodes that have failed back to the cluster. If the cluster reports an error for a node missing (error code 1195) and that node has been repaired and restarted, the cluster automatically re-adds the node back into the cluster. For releases prior to 5.1.0, if you do not specify a name, the SAN Volume Controller assigns a default name. The SAN Volume Controller creates a new default name each time a node is added to a cluster. If you choose to assign your own names, you must type the node name on the Adding a node to a cluster panel. You cannot manually assign a name that matches the naming convention used for names assigned automatically by SAN Volume Controller. If you are using scripts to perform management tasks on the cluster and those scripts use the node name, you can avoid the need to make changes to the scripts by assigning the original name of the node to a spare node. This name might change during this procedure. |
| Worldwide node name | This is the WWNN that is assigned to the node. The WWNN is used to uniquely identify the node and the fibre-channel ports. During this procedure, the WWNN of the spare node changes to that of the faulty node. The node replacement procedures must be followed exactly to avoid any duplication of WWNNs. This name does not change during this procedure. |
| Worldwide port names | These are the WWPNs that are assigned to the node. WWPNs are derived from the WWNN that is written to the spare node as part of this procedure. For example, if the WWNN for a node is 50050768010000F6, the four WWPNs for this node are derived as follows: WWNN 50050768010000F6; WWNN displayed on front panel 000F6; WWPN Port 1 50050768014000F6; WWPN Port 2 50050768013000F6; WWPN Port 3 50050768011000F6; WWPN Port 4 50050768012000F6. These names do not change during this procedure. |
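The WWPN example in the table can be reproduced with a short sketch. The port-to-digit mapping below is inferred only from that single example (the eleventh hexadecimal character becomes 4, 3, 1, and 2 for ports 1 through 4), so treat it as an assumption rather than a documented rule; the function names are hypothetical.

```python
# Digit substituted at the 11th character of the WWNN for each port.
# Assumption: inferred from the one worked example in the table above.
PORT_DIGITS = {1: "4", 2: "3", 3: "1", 4: "2"}


def front_panel_wwnn(wwnn: str) -> str:
    """Return the last five characters, as shown on the node front panel."""
    return wwnn.upper()[-5:]


def derive_wwpns(wwnn: str) -> dict:
    """Map port number to WWPN for a 16-character hexadecimal WWNN."""
    wwnn = wwnn.upper()
    if len(wwnn) != 16 or any(ch not in "0123456789ABCDEF" for ch in wwnn):
        raise ValueError(f"expected 16 hexadecimal characters, got {wwnn!r}")
    # Keep the first ten characters and the last five; replace the one between.
    return {port: wwnn[:10] + digit + wwnn[11:] for port, digit in PORT_DIGITS.items()}


if __name__ == "__main__":
    for port, wwpn in derive_wwpns("50050768010000F6").items():
        print(f"WWPN Port {port} {wwpn}")
```

A list like this is useful when you compare the recorded WWPNs of the faulty node against zoning or host mappings, because the values follow the WWNN to the spare node.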
Complete the following steps to replace a faulty node
in the cluster:
- Verify the name and ID of the node that you
want to replace.
Complete the following step to verify
the name and ID:
- Issue the svcinfo lsnode CLI command to ensure
that the partner node in the I/O group is online.
- If the other node in the I/O group is offline, start Directed
Maintenance Procedures (DMPs) to determine the fault.
- If you have been directed here by the DMPs, and subsequently
the partner node in the I/O group has failed, see the procedure for
recovering from offline VDisks after a node or an I/O group failed.
- If you are replacing the node for other reasons, determine
the node you want to replace and ensure that the partner node in the
I/O group is online.
- If the partner node is offline, you will lose access to the
VDisks that belong to this I/O group. Start the DMPs and fix the other
node before proceeding to the next step.
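The partner-node check in this step could be scripted along the following lines. The sketch assumes that you have captured the node listing as delimiter-separated text with a header row, and that the columns are called `name`, `status`, and `IO_group_name`; the sample text is made up for illustration and is not real command output.

```python
def parse_delimited_listing(text: str, delim: str = ":") -> list:
    """Turn a header row plus data rows into a list of dictionaries."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split(delim)
    return [dict(zip(header, line.split(delim))) for line in lines[1:]]


def partner_is_online(nodes: list, faulty_name: str) -> bool:
    """Return True if another node in the same I/O group reports online."""
    faulty = [node for node in nodes if node["name"] == faulty_name]
    if not faulty:
        raise ValueError(f"node {faulty_name!r} is not in the listing")
    group = faulty[0]["IO_group_name"]
    return any(
        node["IO_group_name"] == group
        and node["name"] != faulty_name
        and node["status"] == "online"
        for node in nodes
    )


# Illustrative listing only; column names and values are assumptions.
SAMPLE = """id:name:status:IO_group_name
1:node1:offline:io_grp0
2:node2:online:io_grp0
3:node3:online:io_grp1
"""

if __name__ == "__main__":
    print(partner_is_online(parse_delimited_listing(SAMPLE), "node1"))
```

If the function returns False, the text above applies: start the DMPs and fix the other node before you continue.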
- Find and record the following information
about the faulty node using Steps 2a through 2h:
- Node serial number
- Worldwide node name
- All of the worldwide port names
- Name or ID of the I/O group that contains the node
- Front panel ID
- Uninterruptible power supply serial number
- Issue the svcinfo lsnode CLI command
to find and record the node name and I/O group name. The faulty node
will be offline.
- Issue the following CLI command:
svcinfo lsnodevpd nodename
Where nodename is
the name that you recorded in step 2a.
- Find the WWNN field in the output.
- Record the last five characters of the WWNN.
- Find the front_panel_id field in the output.
- Record the front panel ID.
- Find the UPS_serial_number field in the output.
- Record the uninterruptible power supply serial number.
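The three fields that this step records could be pulled out of saved command output with a sketch like the one below. It assumes that each field appears on its own line as a field name followed by whitespace and a value; the sample values are invented for illustration, and the function name is hypothetical.

```python
# Field names taken from the step above.
WANTED = ("WWNN", "front_panel_id", "UPS_serial_number")


def extract_vpd_fields(text: str) -> dict:
    """Collect the first occurrence of each wanted field from 'name value' lines."""
    found = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2 and parts[0] in WANTED and parts[0] not in found:
            found[parts[0]] = parts[1].strip()
    missing = [name for name in WANTED if name not in found]
    if missing:
        raise KeyError(f"fields not found in output: {missing}")
    return {
        "wwnn_last_five": found["WWNN"][-5:].upper(),
        "front_panel_id": found["front_panel_id"],
        "ups_serial_number": found["UPS_serial_number"],
    }


# Invented sample; the layout is an assumption, not captured output.
SAMPLE_VPD = """WWNN 50050768010000F6
front_panel_id 104603
UPS_serial_number 100062L106
"""

if __name__ == "__main__":
    print(extract_vpd_fields(SAMPLE_VPD))
```

Keeping the three values together in one record makes it harder to mix up nodes when more than one replacement is in progress.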
- Ensure that the faulty node has been powered off.
- Issue the following CLI command to remove the faulty node
from the cluster:
svctask rmnode nodename/id
Where nodename/id is
the name or ID of the faulty node.
- Disconnect all four fibre-channel cables from the node.
Important: Do not plug the fibre-channel cables
into the spare node until the spare node is configured with the WWNN
of the faulty node.
- Connect the power and signal cables from the spare node to the
uninterruptible power supply that has the serial number you recorded
in step 2.h.
Note: For 2145 UPS-1U units,
you must disconnect the cables from the faulty node.
- Disconnect the faulty node's power and
signal cables from the 2145 UPS-1U and connect the spare node's
power and signal cables in their place.
- Power on the spare node.
- Display the node status on the front-panel display.
- You must change the WWNN of the spare node
to that of the faulty node. The procedure for doing this depends
on the SAN Volume Controller version
that is installed on the spare node. Press and release the down
button until the Node: panel displays. Then press and release
the right button until the Node WWNN: panel displays. If repeated pressing
of the right button returns you to the Node: panel without displaying
a Node WWNN: panel, go to step 12; otherwise,
continue with step 11.
- Change the WWNN of the spare node (with SAN Volume Controller V4.3
and above installed) to match the WWNN of the faulty node by completing
the following steps:
- With the Node WWNN: panel displayed, press and
hold the down button, press and release the select button, and then
release the down button. The display switches into edit mode. Edit
WWNN is displayed on line 1. Line 2 of the display
contains the last five numbers of the WWNN.
- Change the WWNN that is displayed to match the
last five numbers of the WWNN that you recorded in step 2.d. To edit the highlighted number, use the up and down buttons
to increase or decrease the numbers. The numbers wrap F to 0 or 0
to F. Use the left and right buttons to move between the numbers.
- When the five numbers match the last five numbers
of the WWNN that you recorded in step 2.d, press the
select button to accept the numbers.
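Because each digit wraps from F to 0 and from 0 to F, the shorter direction to a target digit is not always obvious. The sketch below works out, for each of the five positions, whether the up or the down button needs fewer presses. It is an illustration of the wrap-around arithmetic only; the function name is hypothetical, and it assumes that up increases a digit by one and down decreases it by one.

```python
def presses_per_digit(current: str, target: str) -> list:
    """Return (direction, presses) for each of the five hexadecimal digits.

    direction is "up", "down", or "none"; ties are resolved in favor of "up".
    """
    if len(current) != 5 or len(target) != 5:
        raise ValueError("both values must be exactly five characters long")
    plan = []
    for cur, tgt in zip(current, target):
        # Presses needed going upward, with wrap from F back to 0.
        up = (int(tgt, 16) - int(cur, 16)) % 16
        # Presses needed going downward, with wrap from 0 back to F.
        down = (16 - up) % 16
        if up == 0:
            plan.append(("none", 0))
        elif up <= down:
            plan.append(("up", up))
        else:
            plan.append(("down", down))
    return plan


if __name__ == "__main__":
    for position, step in enumerate(presses_per_digit("00000", "000F6"), start=1):
        print(position, step)
```

For a spare node that shows 00000 and a recorded value of 000F6, the fourth digit is reached fastest with one press of the down button and the fifth with six presses of the up button.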
- Change the WWNN of the spare node (with SAN Volume Controller versions
prior to V4.3 installed) to match the WWNN of the faulty node
by performing the following steps:
- Press and release the right button until the
Status: panel is displayed.
- With the node status displayed on the front panel,
press and hold the down button, press and release the select button,
and then release the down button. WWNN is displayed
on line 1 of the display. Line 2 of the display contains the last
five numbers of the WWNN.
- With the WWNN displayed on the front panel, press
and hold the down button, press and release the select button, and then
release the down button. The display switches into edit mode.
- Change the WWNN that is displayed to match the
last five numbers of the WWNN that you recorded in step 2.d. To edit the highlighted number, use the up and down buttons
to increase or decrease the numbers. The numbers wrap F to 0 or 0
to F. Use the left and right buttons to move between the numbers.
- When the five numbers match the last five numbers
of the WWNN that you recorded in step 2.d, press the select button to retain the numbers
that you have updated and return to the WWNN panel.
- Press the select button to apply the numbers
as the new WWNN for the node.
- Connect the four fibre-channel cables
that you disconnected from the faulty node to the spare node.
If the spare node has fewer Ethernet cables connected than the faulty
node, move the Ethernet cables from the faulty node to the spare node.
Ensure that you connect each cable to the same port on the spare node
that it occupied on the faulty node.
- Issue the following command to add the spare node to the
cluster:
svctask addnode -wwnodename WWNN -iogrp iogroupname/id
where WWNN and iogroupname/id are
the values that you recorded for the original node.
SAN Volume Controller V5.1
automatically reassigns the name that the node used originally.
For versions prior to V5.1, use the name parameter
with the svctask addnode command to assign a name. If
the original node's name was automatically assigned by SAN Volume Controller,
you cannot reuse that name. A name was automatically assigned
if it starts with node. In this
case, either specify a different name that does not start with node or
omit the name parameter so that SAN Volume Controller automatically
assigns a new name to the node.
If necessary, the
new node is updated to the same SAN Volume Controller software
version as the cluster. This update can take up to 20 minutes.
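The naming rules in this step can be sketched as a helper that assembles the command line. The function and its parameters are hypothetical, and the name parameter is written here as `-name`, which is an assumption about its exact spelling; check it against the CLI reference for your release before you use it.

```python
def build_addnode_command(wwnn, iogrp, original_name=None, pre_v51=False):
    """Assemble the addnode command described in the step above.

    Assumptions: the name parameter is spelled -name, and a name that
    starts with "node" is treated as automatically assigned and not reused.
    """
    args = ["svctask", "addnode", "-wwnodename", wwnn, "-iogrp", iogrp]
    # Only releases before V5.1 need the name passed explicitly.
    if pre_v51 and original_name and not original_name.startswith("node"):
        args += ["-name", original_name]
    return " ".join(args)


if __name__ == "__main__":
    print(build_addnode_command("50050768010000F6", "io_grp0"))
    print(build_addnode_command("50050768010000F6", "io_grp0", "db_node_a", pre_v51=True))
```

When the original name starts with node on a release prior to V5.1, the sketch leaves the name parameter out so that a new default name is assigned, which matches the second option that the text describes.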
- Use the tools that are provided with your multipathing
device driver on the host systems to verify that all paths are now
online. See the documentation that is provided with your multipathing
device driver for more information. For example, if
you are using the subsystem device driver (SDD), see the IBM® System Storage®
Multipath Subsystem Device Driver User's Guide for
instructions on how to use the SDD management tool on host systems. It
might take up to 30 minutes for the paths to come online.
- Repair the faulty node.
Attention: When
the faulty node is repaired, do not connect the fibre-channel cables
to it. Connecting the cables might cause data corruption because the
spare node is using the same WWNN as the faulty node.
If
you want to use the repaired node as a spare node, perform the following
steps.
For SAN Volume Controller V4.3
and later versions:
- With the Node WWNN: panel displayed, press and
hold the down button, press and release the select button, and then
release the down button.
- The display switches into edit mode. Edit
WWNN is displayed on line 1. Line 2 of the display
contains the last five numbers of the WWNN.
- Change the displayed number to 00000. To edit the highlighted number, use the up and down buttons
to increase or decrease the numbers. The numbers wrap F to 0 or 0
to F. Use the left and right buttons to move between the numbers.
- Press the
select button to accept the numbers.
This node can now be used as a spare node.
For SAN Volume Controller versions
prior to V4.3:
- Press and release the right button until the
Status: panel is displayed.
- With the node status displayed on the front panel,
press and hold the down button, press and release the select button,
and then release the down button. WWNN is displayed
on line 1 of the display. Line 2 of the display contains the last
five numbers of the WWNN.
- With the WWNN displayed on the front panel, press
and hold the down button, press and release the select button, and then
release the down button. The display switches into edit mode.
- Change the displayed number to 00000. To edit the highlighted number, use the up and down buttons
to increase or decrease the numbers. The numbers wrap F to 0 or 0
to F. Use the left and right buttons to move between the numbers.
- Press the
select button to accept the numbers.
- Press
the select button to retain the numbers that you have updated and
return to the WWNN panel.
This
node can now be used as a spare node.