Module Failure
Cloud Execution Environment

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure
2.1   Actions

Reference List

1   Introduction

This instruction concerns alarm handling.

1.1   Alarm Description

The Module Failure alarm is issued by the Managed Object (MO) Node. The Cloud SDN Controller (CSC) generates the alarm when it detects that a sub-module of CSC has failed in any of the vCICs.

The severity of the alarm is CRITICAL.

This alarm is automatically cleared once the problem is solved.

Possible alarm causes and fault locations are explained in Table 1.

Table 1    Alarm Causes

Alarm Cause:      Sub-module of CSC failed.

Description:      One of the sub-modules of CSC is in ERROR state in one of the vCICs.

Fault Reason:     One of the following CSC sub-modules is in ERROR state:

  • INTERFACE_SERVICE

  • OPENFLOW

  • ITM

  • DATASTORE_SERVICE

  • SCF_SERVICE

  • ELAN_SERVICE

Fault Location:   vCIC

Impact:           Services of the sub-module are unavailable; this affects the overall load supported by the cluster.

If the alarm is not resolved, the services of the failed sub-module remain unavailable on the node, which reduces the overall load supported by the cluster.

The alarm attributes are listed in Table 2.

Table 2    Alarm Attributes

Attribute Name             Attribute Value

Major Type                 193

Minor Type                 2162699

Managed Object Class       Node

Managed Object Instance    Region=<name_of_the_region>,
                           Service=SDNc,
                           Alarm=ReducedNodalAvailability,
                           Node=<vcic_name>,
                           Sub-Service=CSC,
                           Sub-Module=<module_name>

Specific Problem           Module Failure

Event Type                 communicationsAlarm

Probable Cause             302

Additional Text            <module_name> is in ERROR state

Severity                   CRITICAL
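
The Managed Object Instance and Additional Text attributes in Table 2 identify the affected vCIC and sub-module. The following is a minimal sketch of how those values could be pulled out of a received alarm with standard shell tools. The helper names moi_field and failed_module are hypothetical, and the region and vCIC names in the sample string are invented for illustration; only the key names and the "is in ERROR state" wording come from Table 2.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from a Managed Object
# Instance string of the form "Key1=value1,Key2=value2,...".
moi_field() {
    # $1 = Managed Object Instance string, $2 = key (for example Node)
    printf '%s\n' "$1" | tr ',' '\n' | sed 's/^[[:space:]]*//' | sed -n "s/^$2=//p"
}

# Hypothetical helper: print the module name from an Additional Text
# value of the form "<module_name> is in ERROR state".
failed_module() {
    printf '%s\n' "$1" | sed -n 's/^\(.*\) is in ERROR state$/\1/p'
}

# Sample values, invented for illustration only.
moi='Region=region1,Service=SDNc,Alarm=ReducedNodalAvailability,Node=vcic-1,Sub-Service=CSC,Sub-Module=OPENFLOW'

moi_field "$moi" Node                       # prints vcic-1
moi_field "$moi" Sub-Module                 # prints OPENFLOW
failed_module 'OPENFLOW is in ERROR state'  # prints OPENFLOW
```

Because each key is matched at the start of its own field, asking for Service returns SDNc and is not confused with Sub-Service.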

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

For more information on CSC alarms, refer to the SDN document Alarms, Reference [1].

1.2.2   Tools

No tools are required.

1.2.3   Conditions

Before starting this procedure, ensure that access to the vCICs is available.

2   Procedure

This section describes the procedure to follow when this alarm is received.

2.1   Actions

Do the following:

  1. Log on to the vCIC, following the steps describing CSC login in the SDN document Using the CLI, Reference [2].
  2. Check the service status by executing the command display app-status as described in the SDN document CSC Application Command List, Reference [3].
  3. If any service is in ERROR state, identify the faulty vCIC using the Node IP Address displayed.
  4. Log out of the vCIC.
  5. Restart the ODL service on the faulty vCIC as follows:
    1. Log on to the faulty vCIC identified in Step 3.
    2. Identify the process ID of the running ODL service:

      ps -eaf | grep karaf

    3. Stop the process:

      kill -9 <process_id>

  6. Wait for three minutes, then check the service status again by repeating Step 1 and Step 2.

    If the service is operational and the alarm ceases, exit this procedure. Otherwise, continue with Step 7.

  7. Collect troubleshooting data as described in the Data Collection Guideline.
  8. Contact the next level of maintenance support.

    Further actions are outside the scope of this instruction.

  9. The job is completed.
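
The ps -eaf | grep karaf command in Step 5 usually also lists the grep process itself, so the output can contain more than one candidate process ID. The following is a minimal sketch of a filter that prints only the process IDs of the karaf lines. The function name karaf_pids is hypothetical, and the sample ps output, including the paths and process IDs, is invented for illustration; the real output on a vCIC may look different.

```shell
#!/bin/sh
# Hypothetical helper: read "ps -eaf" style output on stdin and print the
# PID (second column) of every line that mentions karaf, skipping any
# line that belongs to a grep command.
karaf_pids() {
    awk '/karaf/ && !/grep/ { print $2 }'
}

# Sample input, invented for illustration only.
sample='UID        PID  PPID  C STIME TTY          TIME CMD
root      4321     1  5 10:02 ?        00:12:44 /usr/bin/java -Dkaraf.home=/opt/odl org.apache.karaf.main.Main
root      9876  9000  0 11:15 pts/0    00:00:00 grep karaf'

printf '%s\n' "$sample" | karaf_pids   # prints 4321
```

On the faulty vCIC, the same filter could be fed live data with ps -eaf | karaf_pids, and the printed value is the <process_id> used in the stop command of Step 5. The sketch deliberately does not stop any process itself.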

Reference List

[1] Alarms, 1/198 22-AXD 101 08/6-V1
[2] Using the CLI, 1/190 80-AXD 101 08/6-V1
[3] CSC Application Command List, 2/190 77-AXD 101 08/6-V1