Before you attempt to replace a faulty node with a spare
node, ensure that you meet the following requirements:
- SAN Volume Controller version
3.1.0 or higher is installed on the cluster and on the spare node.
- You know the name of the cluster that contains the faulty node.
- A spare node is installed in the same rack as the cluster that
contains the faulty node.
- You have a record of the last five characters of the original
worldwide node name (WWNN) of the spare node.
Note: A repaired faulty node, which has been successfully
replaced in the cluster with a spare node using the original WWNN
of the faulty node, must be assigned a new unique WWNN. You can use
the original WWNN of the spare node as the new WWNN of the repaired
node.
Attention: Never connect a node with a WWNN of 00000 to
the cluster. If this node is no longer required as a spare and is
to be used for normal attachment to a cluster, you must change the
WWNN to the number you recorded when a spare was created. Using any
other number might cause data corruption.
If a node fails, the cluster continues to operate with
degraded performance until the faulty node is repaired. If the repair
operation takes an unacceptable amount of time, it is useful to replace
the faulty node with a spare node. However, you must follow the appropriate
procedures and take precautions so that you do not interrupt
I/O operations or compromise the integrity of your data.
The
following table describes the changes that are made to your configuration
when you replace a faulty node in the cluster:
| Node attributes | Description |
| Front panel ID | This is the number that is printed on the front of the node and is used to select the node that is added to a cluster. |
| Node ID | This is the ID that is assigned to the node. A new node ID is assigned each time a node is added to a cluster; the node name remains the same following service activity on the cluster. You can use the node ID or the node name to perform management tasks on the cluster. However, if you are using scripts to perform those tasks, use the node name rather than the node ID. This ID will change during this procedure. |
| Node name | This is the name that is assigned to the node. If you do not specify a name, the SAN Volume Controller assigns a default name. The SAN Volume Controller creates a new default name each time a node is added to a cluster. If you choose to assign your own names, you must type the node name on the Adding a node to a cluster panel. You cannot manually assign a name that matches the naming convention used for names assigned automatically by SAN Volume Controller. If you are using scripts to perform management tasks on the cluster and those scripts use the node name, you can avoid the need to make changes to the scripts by assigning the original name of the node to a spare node. This name might change during this procedure. |
| Worldwide node name | This is the WWNN that is assigned to the node. The WWNN is used to uniquely identify the node and the fibre-channel ports. During this procedure, the WWNN of the spare node is changed to that of the faulty node. The node replacement procedures must be followed exactly to avoid any duplication of WWNNs. This name does not change during this procedure. |
| Worldwide port names | These are the WWPNs that are assigned to the node. WWPNs are derived from the WWNN that is written to the spare node as part of this procedure. For example, if the WWNN for a node is 50050768010000F6, the four WWPNs for this node are derived as follows: WWNN: 50050768010000F6; WWNN displayed on front panel: 000F6; WWPN Port 1: 50050768014000F6; WWPN Port 2: 50050768013000F6; WWPN Port 3: 50050768011000F6; WWPN Port 4: 50050768012000F6. These names do not change during this procedure. |
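The WWPN example in the table can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the port-to-digit mapping (4, 3, 1, 2 in the eleventh hexadecimal position) is inferred from the single example above, so treat it as an assumption and not as a documented rule for every node model.

```python
HEX_DIGITS = "0123456789ABCDEF"

# Assumption: inferred from the one worked example in the table above.
# The eleventh hex digit of the WWNN is replaced by a per-port digit.
PORT_DIGITS = {1: "4", 2: "3", 3: "1", 4: "2"}


def front_panel_wwnn(wwnn):
    """Return the last five characters, as shown on the node front panel."""
    return wwnn.upper()[-5:]


def derive_wwpns(wwnn):
    """Return a dict mapping port number to the WWPN derived from the WWNN."""
    wwnn = wwnn.upper()
    if len(wwnn) != 16 or any(c not in HEX_DIGITS for c in wwnn):
        raise ValueError("WWNN must be 16 hexadecimal characters")
    return {
        port: wwnn[:10] + digit + wwnn[11:]
        for port, digit in PORT_DIGITS.items()
    }


if __name__ == "__main__":
    example = "50050768010000F6"
    print("Front panel:", front_panel_wwnn(example))
    for port, wwpn in sorted(derive_wwpns(example).items()):
        print("WWPN Port", port, wwpn)
```

A sketch like this can help when you cross-check the WWPNs that you recorded for the faulty node against the WWNN that you are about to enter on the spare node.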
This task assumes that you have already launched the SAN Volume Controller Console.
Complete
the following steps to replace a faulty node in the cluster:
- Verify the name and ID of the
node that you want to replace.
Complete the following
steps to verify the name and ID:
- Make sure that the SAN Volume Controller Console application
is running on the cluster that contains the faulty node.
- Click in the portfolio. The Viewing
Nodes panel is displayed. If the node is faulty, it is shown as offline.
- Ensure the partner node in the I/O group is online.
- If the other node in the I/O group is offline, start the Directed
Maintenance Procedures (DMPs) to determine the fault.
- If you have been directed here by the DMPs, and subsequently
the partner node in the I/O group has failed, recover the offline
VDisks.
- If you are replacing the node for other reasons, determine
the node that you want to replace and ensure that the partner node
in the I/O group is online.
- If the partner node is offline, you will lose access to the
VDisks that belong to this I/O group. Start the DMPs and fix the other
node before proceeding to the next step.
- Click the name of the faulty (offline) node.
The Viewing General Details panel is displayed.
- Click General and
record the following attributes for the faulty node:
- ID
- WWNN
- I/O Group
- UPS Serial Number
- Click Close. Click Fibre Channel
Port and record the following attribute for the faulty node:
- Click Close. Click Vital Product
Data and record the following attribute for the faulty node:
- Ensure that the faulty node has been powered off.
- Use the SAN Volume Controller Console to
delete the faulty node from the cluster.
Remember: You
must record the following information to avoid data corruption when
this node is re-added to the cluster:
- Node serial number
- WWNN
- All WWPNs
- I/O group that contains the node
- Disconnect all four fibre-channel cables from the node.
Important: Do not plug the fibre-channel cables
into the spare node until the spare node is configured with the WWNN
of the faulty node.
- Connect the power and signal cables from the spare node
to the uninterruptible power supply that has the serial number
you recorded in step 3.
Note: For 2145 UPS units, you can plug the signal cable into any
vacant position on the top row of serial connectors on the 2145 UPS.
If no spare serial connectors are available on the 2145 UPS,
disconnect the cables from the faulty node. For 2145 UPS-1U units,
you must disconnect the cables from the faulty node.
- Power on the spare node.
- You must change the WWNN of the spare node to that of the
faulty node. The procedure for doing this depends on the SAN Volume Controller version
that is installed on the spare node. Press and release the down
button until the Node: panel displays. Then press and release
the right button until the WWNN: panel displays. If repeated pressing
of the right button returns you to the Node: panel, without displaying
a WWNN: panel, go to step 13;
otherwise, continue with step 12.
- Change the WWNN of the spare node (with SAN Volume Controller V4.3
and above installed) to match the WWNN of the faulty node by performing
the following steps:
- With the Node WWNN: panel displayed, press and
hold the down button, press and release the select button, and then
release the down button. The display switches into edit mode. Edit
WWNN is displayed on line 1. Line 2 of the display
contains the last five numbers of the WWNN.
- Change the WWNN that is displayed to match the
last five numbers of the WWNN that you recorded in step 3. To edit the highlighted number, use the up and down buttons
to increase or decrease the numbers. The numbers wrap F to 0 or 0
to F. Use the left and right buttons to move between the numbers.
- When the five numbers match the last five numbers
of the WWNN that you recorded in step 3, press the
select button to accept the numbers.
- Change the WWNN of the spare node (with SAN Volume Controller versions
prior to V4.3 installed) to match the WWNN of the faulty node
by performing the following steps:
- Press and release the right button until the
Status: panel is displayed.
- With the node status displayed on the front panel,
press and hold the down button; press and release the select button;
release the down button. WWNN is displayed
on line 1 of the display. Line 2 of the display contains the last
five numbers of the WWNN.
- With the WWNN displayed on the front panel, press
and hold the down button; press and release the select button; release
the down button. The display switches into edit mode.
- Change the WWNN that is displayed to match the
last five numbers of the WWNN that you recorded in step 3. To edit the highlighted number, use the up and down buttons
to increase or decrease the numbers. The numbers wrap F to 0 or 0
to F. Use the left and right buttons to move between the numbers.
- When the five numbers match the last five numbers
of the WWNN that you recorded in step 3, press the
select button to accept the numbers.
- Press
the select button to retain the numbers that you have updated and
return to the WWNN panel.
- Connect the four fibre-channel cables that you disconnected
from the faulty node to the spare node.
If the spare node has fewer Ethernet cables connected than the faulty
node, move the Ethernet cables from the faulty node to the spare node.
Ensure that you connect each cable to the same port on the spare node
that it occupied on the faulty node.
- Use the SAN Volume Controller Console to
add the spare node to the cluster. If possible, use the same node
name that was used for the faulty node. If necessary, the spare node
is updated to the same SAN Volume Controller version
as the cluster. This update can take up to 20 minutes.
- Use the tools that are provided with your multipathing
device driver on the host systems to verify that all paths are now
online. See the documentation that is provided with your multipathing
device driver for more information. For example, if
you are using the subsystem device driver (SDD), see the IBM® System Storage®
Multipath Subsystem Device Driver User's Guide for
instructions on how to use the SDD management tool on host systems. It
might take up to 30 minutes for the paths to come online.
- Repair the faulty node.
Attention: When
the faulty node is repaired, do not connect the fibre-channel cables
to it. Connecting the cables might cause data corruption because the
spare node is using the same WWNN as the faulty node.
If
you want to use the repaired node as a spare node, perform the following
steps.
For SAN Volume Controller V4.3
and above:
- With the Node WWNN: panel displayed, press and
hold the down button, press and release the select button, and then
release the down button.
The display switches into edit mode. Edit
WWNN is displayed on line 1. Line 2 of the display
contains the last five numbers of the WWNN.
- Change the displayed number to 00000. To edit the highlighted number, use the up and down buttons
to increase or decrease the numbers. The numbers wrap F to 0 or 0
to F. Use the left and right buttons to move between the numbers.
- Press the
select button to accept the numbers.
This node can now be used as a spare node.
For SAN Volume Controller versions
prior to V4.3:
- Press and release the right button until the
Status: panel is displayed.
- With the node status displayed on the front panel,
press and hold the down button; press and release the select button;
release the down button. WWNN is displayed
on line 1 of the display. Line 2 of the display contains the last
five numbers of the WWNN.
- With the WWNN displayed on the front panel, press
and hold the down button; press and release the select button; release
the down button. The display switches into edit mode.
- Change the displayed number to 00000. To edit the highlighted number, use the up and down buttons
to increase or decrease the numbers. The numbers wrap F to 0 or 0
to F. Use the left and right buttons to move between the numbers.
- Press the
select button to accept the numbers.
- Press
the select button to retain the numbers that you have updated and
return to the WWNN panel.
This node can now be used as a spare node.
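The front-panel editing behavior described in the WWNN steps (the up and down buttons change the highlighted digit, and the digits wrap F to 0 or 0 to F) can be modeled with a short Python sketch. The function names are hypothetical, and the sketch assumes that the up button increases the digit and the down button decreases it; it does not model the left and right buttons or any real panel firmware.

```python
HEX_DIGITS = "0123456789ABCDEF"


def step_digit(digit, button):
    """Return the digit shown after one press of 'up' or 'down'.

    Assumption: 'up' increases and 'down' decreases the digit,
    wrapping F to 0 and 0 to F as the procedure describes.
    """
    index = HEX_DIGITS.index(digit.upper())
    if button == "up":
        index = (index + 1) % 16
    elif button == "down":
        index = (index - 1) % 16
    else:
        raise ValueError("button must be 'up' or 'down'")
    return HEX_DIGITS[index]


def presses_needed(current, target):
    """For each of the five digits, return (button, count) using the
    smaller number of presses to get from current to target."""
    if len(current) != 5 or len(target) != 5:
        raise ValueError("expected five hexadecimal digits")
    plan = []
    for cur, tgt in zip(current.upper(), target.upper()):
        up = (HEX_DIGITS.index(tgt) - HEX_DIGITS.index(cur)) % 16
        down = (16 - up) % 16
        plan.append(("up", up) if up <= down else ("down", down))
    return plan


if __name__ == "__main__":
    # Hypothetical case: a spare node showing 00000 that must match 000F6.
    for position, (button, count) in enumerate(presses_needed("00000", "000F6"), 1):
        print("digit", position, ":", button, "x", count)
```

Because the digits wrap, reaching F from 0 takes one press of the down button under this assumption, which is why the sketch compares both directions for each digit.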