1 Alarm Description
The alarm is raised when one or more Ethernet interfaces belonging to a bonded interface fail.
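| Alarm Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| Failed Ethernet interface on bond0 | The physical link state of one or both Ethernet interfaces is down | Faulty physical Ethernet interface | Physical Ethernet interface | If one Ethernet interface is down, there is a loss in resilience. If both Ethernet interfaces are down, then internal cluster services such as booting and logging are affected. |
| Failed Ethernet interface on bond0 | The physical link state of one or both Ethernet interfaces is down | Faulty external device (that is, Ethernet switch) | External device (that is, Ethernet switch) | If one Ethernet interface is down, there is a loss in resilience. If both Ethernet interfaces are down, then internal cluster services such as booting and logging are affected. |
| Failed Ethernet interface on bond1 | The physical link state of one or both Ethernet interfaces is down | Faulty physical Ethernet interface | Physical Ethernet interface | If one Ethernet interface is down, there is a loss in resilience. If both Ethernet interfaces are down, then external network traffic is down. |
| Failed Ethernet interface on bond1 | The physical link state of one or both Ethernet interfaces is down | Faulty external device (that is, Ethernet switch) | External device (that is, Ethernet switch) | If one Ethernet interface is down, there is a loss in resilience. If both Ethernet interfaces are down, then external network traffic is down. |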
Note: This alarm can appear as a result of a maintenance activity.
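The link state described in the table can be inspected from a Linux shell on the affected blade. The following is a minimal sketch, assuming that the bonds are provided by the standard Linux bonding driver, which exposes the status of each bond under /proc/net/bonding/<bond>. The function names count_down_slaves and bond_impact are illustrative and are not part of LOTC.

```shell
#!/bin/sh
# Count the slave interfaces of a bond whose MII status is not "up".
# Assumption: the input file follows the Linux bonding driver layout
# (/proc/net/bonding/<bond>), where each "Slave Interface:" line is
# followed by that slave's own "MII Status:" line.
count_down_slaves() {
    awk '
        /^Slave Interface:/ { in_slave = 1; next }
        in_slave && /^MII Status:/ {
            if ($3 != "up") down++
            in_slave = 0
        }
        END { print down + 0 }
    ' "$1"
}

# Translate a bond name and a number of failed interfaces into the
# impact listed in the alarm description table.
# $1: bond name (bond0 or bond1), $2: number of failed interfaces.
bond_impact() {
    case "$2" in
        0) echo "no impact" ;;
        1) echo "loss of resilience" ;;
        *)
            case "$1" in
                bond0) echo "internal cluster services affected" ;;
                bond1) echo "external network traffic down" ;;
                *)     echo "unknown bond" ;;
            esac
            ;;
    esac
}
```

On a blade, the two functions could be combined as bond_impact bond0 "$(count_down_slaves /proc/net/bonding/bond0)". The first MII Status line in the file describes the bond itself, which is why the sketch only counts status lines that follow a Slave Interface line.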
2 Procedure
2.1 Handle Alarm LOTC Ethernet Bonding
Prerequisites
- This instruction references the following document:
  - Data Collection Guideline
- No tools are required.
- The following conditions must apply:
  - The alarm is raised.
  - It is known how to map the HostName (part of alarm attribute Source) to the physical address (slot) in the Blade Server Platform (BSP).
  - An Ericsson Command-Line Interface (ECLI) session in Exec mode is in progress.
Steps
1. Is the alarm present on multiple blades, that is, are there at least two LOTC Ethernet Bonding alarms with different values of alarm attribute Source?
   Yes: Proceed with Step 8.
   No: Continue with the next step.
2. Is the alarm severity critical?
   Yes: Continue with the next step.
   No: Proceed with Step 11.
3. Log on to the BSP to access a Linux shell, for example:
   ssh <user>@<hostname> -p 7022
   The hostname is part of alarm attribute Source.
4. Collect the kernel messages:
   dmesg > $(hostname)-$(date +%d-%m-%y_%H-%M-%S).dmesg.log
   If there is no network connectivity to a blade, gather the kernel messages through a serial console, using copy and paste.
   The following is an example output:
   [    0.000000] Initializing cgroup subsys cpuset
   [    0.000000] Initializing cgroup subsys cpu
   [    0.000000] Linux version 3.0.80-0.7.1.5895.4.PTF-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Mon Aug 26 13:53:40 UTC 2013 ()
   [    0.000000] Command line: panic=10 console=tty0 cluster=(type=control,disk_cache=0,clean_rootfs=0)
   [    0.000000] BIOS-provided physical RAM map:
   [    0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
   [    0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
   [    0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
   [    0.000000] BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
   [    0.000000] BIOS-e820: 000000003fff0000 - 0000000040000000 (ACPI data)
   [    0.000000] BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
   [...]
5. Using the ECLI on the BSP, reset the blade:
   >ManagedElement=<hw_ME_name>,Equipment=1,Shelf=<shelf>,Slot=<slot>,Blade=1,reset --resetType HARD --gracefulReset FALSE
   For example, when the issue is on shelf 2 and slot 3:
   >ManagedElement=BSP04ST,Equipment=1,Shelf=2,Slot=3,Blade=1,reset --resetType HARD --gracefulReset FALSE
6. Wait 5 minutes.
7. Is the alarm cleared?
   Yes: Proceed with Step 13.
   No: Proceed with Step 11.
8. Do the alarms on all blades report the same bond, for example, bond0 or bond1?
   Yes: With high probability, the fault is in the BSP. Continue with the next step.
   No: Proceed with Step 11.
9. Is the BSP switch alive and functioning properly?
   Yes: Proceed with Step 11.
   No: Troubleshoot the BSP switch.
10. Once the switch problem is solved, is the alarm cleared?
   Yes: Proceed with Step 13.
   No: Continue with the next step.
11. Perform data collection, refer to Data Collection Guideline.
   Note: The status of the bonded interfaces must be collected.
12. Consult next level of support. Further actions are outside the scope of this instruction.
13. Job is completed.
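
The branching in Steps 1 and 2 can be sketched as shell functions. This is an illustration only, assuming that the severity and the values of alarm attribute Source have already been read from the alarm list by hand. The function names distinct_sources and first_step are hypothetical, and no LOTC or BSP command is invoked.

```shell
#!/bin/sh
# Count the distinct values of alarm attribute Source given as arguments.
distinct_sources() {
    if [ "$#" -eq 0 ]; then
        echo 0
        return
    fi
    printf '%s\n' "$@" | sort -u | wc -l | tr -d ' '
}

# Print the number of the step at which the actual work starts.
# $1: severity of the alarm (the procedure only tests for "critical")
# remaining arguments: Source values of all active LOTC Ethernet
# Bonding alarms
first_step() {
    severity=$1
    shift
    if [ "$(distinct_sources "$@")" -ge 2 ]; then
        echo 8      # alarm present on multiple blades
    elif [ "$severity" = "critical" ]; then
        echo 3      # single blade, critical: log on and collect kernel messages
    else
        echo 11     # single blade, not critical: perform data collection
    fi
}
```

For example, first_step critical <source_1> <source_2> prints 8 when the two Source values differ, which matches the answer Yes in Step 1.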
