LOTC Disk Replication Communication

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure

1   Introduction

This instruction concerns alarm handling.

1.1   Alarm Description

The alarm is raised when the control nodes have had no connection to each other for more than 20 minutes. While the connection is lost, the control node pair operates in non-redundant mode.

The possible alarm causes and fault locations are explained in Table 1.

Table 1    Alarm Causes

Alarm Cause: Loss of connection between control nodes for more than 20 minutes

Description: The control nodes have lost connection to each other for more than 20 minutes. The Linux® service Distributed Replicated Block Device (DRBD) is not in connected mode.

| Fault Reason | Fault Location | Impact |
|---|---|---|
| Network failure leading to communication problems between the control nodes | Network | Both controllers take the primary role and no data is transferred between the nodes |
| Hardware failure on the secondary control node | Secondary control node | If one of the controller nodes is down, the cluster does not have a controller node to which it can fail over |

Note:  
This alarm can appear as a result of a maintenance activity.

The alarm attributes are listed and explained in Table 2.

Table 2    Alarm Attributes

| Attribute Name | Attribute Value |
|---|---|
| Major Type | 193 |
| Minor Type | 3341942788 |
| Source | ManagedElement=<node_name>,HostName=<hostname>,ERIC-LINUX_CONTROL-* |
| Specific Problem | LOTC Disk Replication Communication |
| Event Type | environmentalAlarm (6) |
| Probable Cause | x736UnspecifiedReason (418) |
| Additional Text | One of the following: "Disk not replicated for <value> minutes" or "Status unknown" |
| Perceived Severity | major (4) |
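The hostname needed for logging on is carried in the Source attribute. The following is a minimal sketch of how it could be extracted, assuming the comma-separated key=value layout shown in Table 2. The function name source_hostname and the example values (node1, SC-1, the suffix after ERIC-LINUX_CONTROL-) are placeholders for illustration and do not come from a real alarm.

```shell
#!/bin/sh
# Hypothetical helper: print the HostName component of an alarm Source string.
# Assumes comma-separated key=value fields, as in Table 2, and that the
# hostname itself contains no comma.
source_hostname() {
    # Put each field on its own line, then print only the HostName value.
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^HostName=//p'
}

# Example with made-up values:
source_hostname 'ManagedElement=node1,HostName=SC-1,ERIC-LINUX_CONTROL-1'
```

The printed value can then be used as <hostname> when logging on to the host.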

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

This instruction references the following documents:

  • Check Alarm Status

  • Data Collection Guideline

  • LOTC Ethernet Bonding

1.2.2   Tools

No tools are required.

1.2.3   Conditions

Before starting this procedure, ensure that the following condition is met:

2   Procedure

Do the following:

  1. Log on to the host to access a Linux shell:

    ssh <user>@<hostname> -p 22

    The hostname is the HostName value in the alarm attribute Source.

  2. Is the alarm raised during initial installation or replacement of a control node?

    Yes: Continue with the next step.

    No: Proceed with Step 5.

  3. Wait for the DRBD connection to be established. Check whether the output of the following command contains cs:Connected:

    cat /proc/drbd

    The following is an example output in a normal situation. The connection state (cs) is Connected. In this state, the alarm is cleared within 5 seconds.

    version: 8.4.2 (api:1/proto:86-101)
    GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@lixia, 2012-09-19 16:40:30
     0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
        ns:438816 nr:0 dw:372 dr:440669 al:11 bm:40 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oo

    The following is an example output in a faulty situation. The connection state (cs) is WFConnection (Waiting For Connection).

    version: 8.4.2 (api:1/proto:86-101)
    GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@lixia, 2012-09-19 16:40:30
     0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
        ns:143396 nr:0 dw:448 dr:147057 al:17 bm:28 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:84

  4. Does the output contain cs:Connected and is the alarm cleared?

    Yes: Proceed with Step 9.

    No: Perform data collection as described in Data Collection Guideline, then contact the deployment organization. Proceed with Step 9.

  5. Check the active alarm list.

    For information on how to check the active alarm list, refer to Check Alarm Status.

  6. Is the LOTC Ethernet Bonding alarm raised?

    Yes: Clear the LOTC Ethernet Bonding alarm as described in LOTC Ethernet Bonding. Further actions are outside the scope of this instruction. Proceed with Step 9.

    No: Continue with the next step.

  7. Perform data collection as described in Data Collection Guideline.

  8. Consult the next level of maintenance support. Further actions are outside the scope of this instruction.

  9. Job is completed.
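The connection check in Step 3 and Step 4 can also be scripted. The following is a minimal sketch, assuming the /proc/drbd layout shown in the example outputs, where each resource line starts with a number followed by cs:<state>. The function names drbd_conn_states and drbd_all_connected are hypothetical and are not part of DRBD or LOTC. The sketch checks the connection state only; whether the alarm is cleared must still be verified in the active alarm list.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/drbd-style text on stdin and print the
# connection state (the value after "cs:") of each resource line.
drbd_conn_states() {
    sed -n 's/^ *[0-9][0-9]*: cs:\([A-Za-z]*\).*/\1/p'
}

# Hypothetical helper: succeed only if at least one resource is listed on
# stdin and every listed resource reports cs:Connected.
drbd_all_connected() {
    states=$(drbd_conn_states)
    [ -n "$states" ] || return 1
    for s in $states; do
        [ "$s" = "Connected" ] || return 1
    done
    return 0
}

# Usage on a control node (path as in Step 3):
#   drbd_all_connected < /proc/drbd && echo "cs:Connected"
```

Reading from standard input keeps the helpers testable against saved output, for example output gathered during data collection.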


Copyright

© Ericsson AB 2014, 2015. All rights reserved. No part of this document may be reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to continued progress in methodology, design and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

Trademark List
All trademarks mentioned herein are the property of their respective owners. These are shown in the document Trademark Information.
