Contents

1 Maintenance Activities due to Faulty Blade Introduction
1.1 Prerequisites
2 Maintenance Activities due to Faulty Blade Procedure
2.1 System Controller Blades
2.2 Payload Blades
1 Maintenance Activities due to Faulty Blade Introduction
This instruction explains the steps to perform a blade replacement after a faulty blade has been detected in a blade system.
1.1 Prerequisites
This section provides the prerequisites that must be fulfilled before using the procedure.
Conditions
The following conditions must apply:
- The operator must be familiar with the SAPC PNF Deployment Instruction.
- The troubleshooting that detected the faulty blade or blades has finished. Troubleshooting the blade and detecting the problem are outside the scope of this document.
- The SAPC blade system is accessible. The OAM virtual IP address (VIP_OAM) is known.
- Access to the SLES installation software must be provided.
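A quick way to verify the last two conditions is to reach the cluster over the OAM network (an optional check, not part of the formal prerequisites; it assumes root SSH access and that ICMP is not filtered on the OAM network):
InstallationServer:# ping -c 3 <VIP_OAM>
InstallationServer:# ssh root@<VIP_OAM> hostname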
2 Maintenance Activities due to Faulty Blade Procedure
There are two different scenarios, depending on the type of blade to be replaced.
2.1 System Controller Blades
System Controller blades are the only blades that are virtualized.
2.1.1 Lock CBA Node
- SC-x:~ # cmw-node-lock SC-x
For further information, see the SAPC Troubleshooting Guide.
2.1.2 Stop DHCP Service
- Stop the DHCP service on both SCs:
SC-1:~ # systemctl stop dhcpd.service
- Repeat on the other SC:
SC-2:~ # systemctl stop dhcpd.service
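To confirm that the service is really stopped on each SC, the standard systemd state query can be used (an optional check; the same command, expecting active, can be run after restarting the service in section 2.1.5):
SC-1:~ # systemctl is-active dhcpd.service
inactive
SC-2:~ # systemctl is-active dhcpd.service
inactive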
The SAPC cluster is now ready to proceed with the blade replacement.
2.1.3 Blade Hardware Replacement
- Shut down the problematic blade, if it is still running despite the malfunction.
- Disconnect all interfaces and power switches from the blade. Remove the blade from the blade system.
- Insert a new blade replacing the one removed.
- Connect all interfaces and power on the new blade. The blade is accessible through the ILOM interface at this point.
2.1.4 Host Operating System Installation and Configuration
Follow the SAPC PNF Deployment Instruction to install the SLES 12 Operating System and the required updates. Once the updates have been applied, copy the files from the other System Controller. In the following example, SC-1 is considered the faulty blade.
- Access the SC-2 host machine and check that Host_1 can be reached from there, so that the files can be copied.
InstallationServer:# ssh root@Host_2
Host_2:# ssh root@Host_1
Host_1:# exit
- Copy the files. If the destination directories do not exist, create them first.
Host_2:# scp /mnt/images/adapt_cluster.cfg root@Host_1:/mnt/images/
Host_2:# scp /mnt/images/adapt_cluster.iso root@Host_1:/mnt/images/
Host_2:# scp /mnt/images/reboot.img root@Host_1:/mnt/images/
Host_2:# scp -r /mnt/store/SAPC/host-config/ root@Host_1:/mnt/store/SAPC/
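If there is any doubt that the transfer completed cleanly, the copies can be compared by checksum (an optional verification using the standard md5sum utility; the sums printed for both hosts must match):
Host_2:# md5sum /mnt/images/adapt_cluster.iso
Host_2:# ssh root@Host_1 md5sum /mnt/images/adapt_cluster.iso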
- Define and boot the Virtual Machine.
Host_2:# ssh root@Host_1
Host_1:# virsh define /mnt/store/SAPC/host-config/VM/vms/sc01.xml
Host_1:# qemu-img create -f qcow2 /mnt/images/originalImage/sapc_sc-1_cxp9030138.qcow2 100G
Host_1:# cat /mnt/store/SAPC/host-config/VM/vms/sc01.xml | grep "<name>"
<name>SC-1.Host_1</name>
Host_1:# virsh start SC-1.Host_1 --console
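Before continuing, it can be confirmed from the host that the VM is running (an optional check using standard libvirt tooling; exit the console first):
Host_1:# virsh list --all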
- Wait for SC-1 to synchronize.
Host_1:# ssh root@192.168.100.126
SC-1:# drbd-overview
The output must contain the line Connected Primary/Secondary UpToDate/UpToDate, as in the following example:
0:drbd0/0 Connected Primary/Secondary UpToDate/UpToDate C r----- lvm-pv: lde-cluster-vg 41.87g 23.09g
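Synchronization can take some time. One way to follow the progress is to re-run the command periodically (a sketch using the standard watch utility, assuming it is installed on the SC):
SC-1:# watch -n 10 drbd-overview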
2.1.5 Start DHCP Service
- Start the DHCP service on both SCs:
SC-1:~ # systemctl start dhcpd.service
- Repeat on the other SC:
SC-2:~ # systemctl start dhcpd.service
2.1.6 Unlock CBA Node
- SC-x:~ # cmw-node-unlock SC-x
For further information, see the SAPC Troubleshooting Guide.
2.2 Payload Blades
2.2.1 Stop SAPC Components
If the faulty blade is powered off, skip this task. If it is still running, stop all SAPC processes.
- Check the status of the payload processes:
SC-x:~ # sapcPcrfProc status PL-x
- If the payload is running, execute the following:
SC-x:~ # sapcPcrfProc stop PL-x
PL-x is the blade that is going to be replaced.
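The two commands above can also be combined into a single guarded call (a hypothetical one-liner; it assumes that sapcPcrfProc exits with code 0 when the payload processes are running, which should be verified for your release):
SC-x:~ # sapcPcrfProc status PL-x && sapcPcrfProc stop PL-x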
2.2.2 Lock CBA Node
- SC-x:~ # cmw-node-lock PL-x
For further information, see the SAPC Troubleshooting Guide.
2.2.3 Blade Hardware Replacement
- Shut down the problematic blade, if it is still running despite the malfunction.
- Disconnect all interfaces and power switches from the blade. Remove the blade from the blade system.
- Insert a new blade replacing the one removed.
- Connect all interfaces but do not power on the blade. The blade is accessible through the ILOM interface at this point.
2.2.4 Prepare The Blade Before Power On
This task differs depending on the payload number.
PL-3 and PL-4
- PL-3 and PL-4 are fixed traffic processors, so add the MAC addresses of the new blade to the cluster.conf file. To obtain the MAC addresses, create the PL_interfaces file as described in the SAPC PNF Deployment Instruction. Use the values from that file to edit the /cluster/etc/cluster.conf file and reload the values.
SC-1:# vi /cluster/etc/cluster.conf
# PL-x
interface x eth0 ethernet 74:c9:9a:4f:65:44
interface x eth1 ethernet 74:c9:9a:4f:65:45
interface x eth2 ethernet 74:c9:9a:4f:65:40
interface x eth3 ethernet 74:c9:9a:4f:65:41
SC-1:# cluster config -r -a
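After the reload, it can be worth confirming that the new MAC addresses are in place (an optional check with standard grep; the comment marker matches the file excerpt above):
SC-1:# grep -A 4 "# PL-x" /cluster/etc/cluster.conf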
PL-5 Onwards
- Scale in the payload, because it was scaled out during the deployment of the SAPC.
SC-1:# sapcScaleIn <PL-X>
2.2.5 Power On The Blade
The blade can now be powered on.
- Follow the SAPC PNF Scale Out procedure.
2.2.6 Unlock CBA Node
- SC-x:~ # cmw-node-unlock PL-x
For further information, see the SAPC Troubleshooting Guide.
