LOTC Ethernet Bonding

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure

1   Introduction

This instruction concerns alarm handling.

1.1   Alarm Description

The alarm is raised when one or more Ethernet interfaces belonging to a bonded interface fail.

The possible alarm causes and fault locations are explained in Table 1.

Table 1    Alarm Causes

Alarm Cause: Failed Ethernet interface on bond0
  Description: The physical link state of one or both Ethernet interfaces is down
  Fault Reason: Faulty physical Ethernet interface
    Fault Location: Physical Ethernet interface
  Fault Reason: Faulty external device (that is, Ethernet switch)
    Fault Location: External device (that is, Ethernet switch)
  Impact: If one Ethernet interface is down, there is a loss in resilience. If both Ethernet interfaces are down, then internal cluster services such as booting and logging are affected.

Alarm Cause: Failed Ethernet interface on bond1
  Description: The physical link state of one or both Ethernet interfaces is down
  Fault Reason: Faulty physical Ethernet interface
    Fault Location: Physical Ethernet interface
  Fault Reason: Faulty external device (that is, Ethernet switch)
    Fault Location: External device (that is, Ethernet switch)
  Impact: If one Ethernet interface is down, there is a loss in resilience. If both Ethernet interfaces are down, then external network traffic is down.

Note:  
This alarm can appear as a result of a maintenance activity.
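
The Impact entries of Table 1 can be read as a small lookup. The following shell sketch is illustrative only: the function name bonding_impact and its two arguments (bond name and number of links that are down) are assumptions made for this example and are not part of the product.

```shell
# Hypothetical helper: map a bond and the number of failed links to the
# impact described in Table 1. Names and arguments are illustrative.
bonding_impact() {
    bond=$1
    links_down=$2
    case "$bond:$links_down" in
        bond0:1|bond1:1)
            echo "loss of resilience" ;;
        bond0:2)
            echo "internal cluster services such as booting and logging are affected" ;;
        bond1:2)
            echo "external network traffic is down" ;;
        *)
            # Any other combination is not covered by Table 1.
            echo "unknown"
            return 1 ;;
    esac
}
```

For example, `bonding_impact bond1 2` reports that external network traffic is down.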

The alarm attributes are listed and explained in Table 2.

Table 2    Alarm Attributes

Attribute Name        Attribute Value

Major Type            193

Minor Type            3341942786

Source                One of the following:
                        • ManagedElement=<node_name>,HostName=<hostname>,ERIC-LINUX_CONTROL-*
                        • ManagedElement=<node_name>,HostName=<hostname>,ERIC-LINUX_PAYLOAD-*

Specific Problem      LOTC Ethernet Bonding

Event Type            environmentalAlarm (6)

Probable Cause        x736UnspecifiedReason (418)

Additional Text       Bonding failure on <bond> (links down on <slave> and <slave>)
                      One of the following:
                        • Bonding degraded on <bond> (link down on <slave>)
                        • Bonding degraded on <bond> (primary link on <slave> available but not active)

Perceived Severity    critical (3): both of the bonded interfaces have failed
                      major (4): one of the bonded interfaces has failed
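
When many alarms are listed, it can help to extract the bond name and the kind of failure from the Additional Text attribute. The sketch below is a minimal example that assumes the text follows the templates in Table 2 exactly. The function name classify_bonding_text and its output format are hypothetical and chosen only for illustration.

```shell
# Hypothetical helper: print "<bond> <state>" for an Additional Text string.
# state is "failure" (text starts with "Bonding failure on") or
# "degraded" (text starts with "Bonding degraded on").
classify_bonding_text() {
    text=$1
    case $text in
        "Bonding failure on "*)
            state=failure
            rest=${text#"Bonding failure on "} ;;
        "Bonding degraded on "*)
            state=degraded
            rest=${text#"Bonding degraded on "} ;;
        *)
            # The text does not match any template from Table 2.
            echo "unrecognized Additional Text" >&2
            return 1 ;;
    esac
    # The bond name is the first word after the fixed prefix.
    bond=${rest%% *}
    echo "$bond $state"
}
```

For example, `classify_bonding_text "Bonding degraded on bond1 (link down on eth3)"` prints `bond1 degraded`. The interface name eth3 is only sample input.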

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

This instruction references the following documents:

  • Check Alarm Status

  • Data Collection Guideline

1.2.2   Tools

No tools are required.

1.2.3   Conditions

Before starting this procedure, ensure that the following conditions are met:

2   Procedure

Do the following:

  1. Check the active alarm list.

    For information on how to check the active alarm list, refer to Check Alarm Status.

  2. Is the alarm present on multiple blades, that is, are there at least two LOTC Ethernet Bonding alarms with different values of alarm attribute Source?

    Yes: Proceed with Step 9.

    No: Continue with the next step.

  3. Is the alarm severity Critical?

    Yes: Continue with the next step.

    No: Proceed with Step 12.

  4. Log on to the BSP to access a Linux shell, for example:

    ssh <user>@<hostname> -p 7022

    The hostname is part of alarm attribute Source.

  5. Collect the kernel messages:

    dmesg > $(hostname)-$(date +%d-%m-%y_%H-%M-%S).dmesg.log

    If there is no network connectivity to the blade, gather the kernel messages through a serial console and copy and paste the output.

    The following is an example output:

    [    0.000000] Initializing cgroup subsys cpuset
    [    0.000000] Initializing cgroup subsys cpu
    [    0.000000] Linux version 3.0.80-0.7.1.5895.4.PTF-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Mon Aug 26 13:53:40 UTC 2013 ()
    [    0.000000] Command line: panic=10 console=tty0 cluster=(type=control,disk_cache=0,clean_rootfs=0)
    [    0.000000] BIOS-provided physical RAM map:
    [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
    [    0.000000]  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
    [    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
    [    0.000000]  BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
    [    0.000000]  BIOS-e820: 000000003fff0000 - 0000000040000000 (ACPI data)
    [    0.000000]  BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
    [...]

  6. Using the ECLI on the BSP, reset the blade:

    >ManagedElement=<hw_ME_name>,Equipment=1,Shelf=<shelf>,Slot=<slot>,Blade=1,reset --resetType HARD --gracefulReset FALSE

    For example, when the issue is on shelf 2 and slot 3:

    >ManagedElement=BSP04ST,Equipment=1,Shelf=2,Slot=3,Blade=1,reset --resetType HARD --gracefulReset FALSE

  7. Wait 5 minutes.

  8. Is the alarm cleared?

    Yes: Proceed with Step 14.

    No: Proceed with Step 12.

  9. Do the alarms on all blades report the same bond, for example, bond0 or bond1?

    Yes: The fault is most likely in the BSP. Continue with the next step.

    No: Proceed with Step 12.

  10. Is the BSP switch alive and functioning properly?

    Yes: Proceed with Step 12.

    No: Troubleshoot the BSP switch.

  11. Once the switch problem is solved, is the alarm cleared?

    Yes: Proceed with Step 14.

    No: Continue with the next step.

  12. Perform data collection. Refer to Data Collection Guideline.

    Note:  
    The status of the bonded interfaces must be collected.

  13. Consult the next level of support. Further actions are outside the scope of this instruction.

  14. Job is completed.
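
Step 12 requires the status of the bonded interfaces but does not show how to read it. The following sketch is one possible way, assuming that the blade uses the standard Linux bonding driver, which exposes one status file per bond under /proc/net/bonding. This instruction does not confirm that assumption, and the function names bond_slave_status and collect_bond_status are hypothetical. Data Collection Guideline remains the authoritative reference.

```shell
# Print "<slave> <MII status>" for every slave listed in one bonding
# status file (for example /proc/net/bonding/bond0).
bond_slave_status() {
    awk '
        /^Slave Interface:/ { slave = $3 }
        # The bond-level "MII Status" line comes before any slave section,
        # so it is skipped while no slave name has been seen.
        /^MII Status:/ && slave != "" { print slave, $3; slave = "" }
    ' "$1"
}

# Summarize every bond found in the given directory
# (default: /proc/net/bonding, an assumption about the platform).
collect_bond_status() {
    dir=${1:-/proc/net/bonding}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        echo "== $(basename "$f") =="
        bond_slave_status "$f"
    done
}
```

Running `collect_bond_status > "$(hostname)-bonding.log"` on the affected blade would store the summary in a file named like the kernel log from Step 5.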


Copyright

© Ericsson AB 2014–2016. All rights reserved. No part of this document may be reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to continued progress in methodology, design and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

Trademark List
All trademarks mentioned herein are the property of their respective owners. These are shown in the document Trademark Information.
