CIC Failed
Cloud Execution Environment

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure
2.1   Actions

3   Additional Information

1   Introduction

This instruction concerns alarm handling.

1.1   Alarm Description

The alarm is issued by the Managed Object (MO) CIC.

The alarm is issued when the periodic supervision algorithm detects that one of the three virtual Cloud Infrastructure Controllers (vCICs) has failed the availability test three consecutive times and then remains unavailable for more than five minutes. It is also issued when the periodic monitoring task identifies that the vCIC has entered maintenance mode.
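
The availability-test condition (three consecutive failures, then more than five minutes of unavailability) can be illustrated with a minimal shell sketch. The function name and its two arguments are hypothetical and are not part of the product. The sketch only models the threshold logic stated in this section and does not cover the maintenance mode case.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# $1 = number of consecutive failed availability tests
# $2 = seconds the vCIC has remained unavailable since the third failure
cic_failed_condition_met() {
    failures=$1
    unavailable_seconds=$2
    # Three consecutive failures, then MORE than five minutes (300 s) unavailable.
    if [ "$failures" -ge 3 ] && [ "$unavailable_seconds" -gt 300 ]; then
        echo "raise"
    else
        echo "no-alarm"
    fi
}
```

For example, `cic_failed_condition_met 3 301` prints `raise`, while `cic_failed_condition_met 3 300` prints `no-alarm` because the five-minute limit has not yet been exceeded.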

The severity of the alarm is MAJOR or CLEARED.

The possible alarm causes and fault locations are explained in Table 1.

Table 1    Alarm Causes

Alarm Cause       One of the three vCICs is not available.

Description       The vCIC has failed the availability test three times and then remains unavailable for more than five minutes, or the periodic monitoring task has identified that the vCIC has entered maintenance mode.

Fault Reason      vCIC malfunction

Fault Location    vCIC

Impact            One of the three vCICs becomes permanently unavailable or remains in maintenance mode.

The following is the consequence for the node if the alarm is not solved:

The alarm attributes are listed in Table 2.

Table 2    Alarm Attributes

Attribute Name             Attribute Value

Major Type                 193
Minor Type                 2031672
Managed Object Class       CIC
Managed Object Instance    Region=<name_of_the_region>,
                           CeeFunction=1,
                           CtrlDomain=1,
                           CIC=<name_of_the_vcic_host>
Specific Problem           CIC Failed
Event Type                 other (1)
Probable Cause             lossOfRedundancy (77)
Additional Text            N/A (footnote 1) or maintenance mode (footnote 2)
Severity                   MAJOR (4) or CLEARED

(1)   If the vCIC has failed the availability test three times and then remains unavailable for more than five minutes.

(2)   If the periodic monitoring task has identified that the vCIC has entered maintenance mode.
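
To identify which vCIC an alarm refers to, the host name can be read from the CIC= component of the Managed Object Instance value. The following is a minimal sketch. The function name is hypothetical, and the region name RegionOne in the usage example is an assumed placeholder value, not taken from this instruction.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# $1 = Managed Object Instance string, for example
#      "Region=<name_of_the_region>,CeeFunction=1,CtrlDomain=1,CIC=<name_of_the_vcic_host>"
vcic_from_moi() {
    # Strip spaces and line breaks, then print the text that follows ",CIC=".
    printf '%s' "$1" | tr -d ' \n' | sed -n 's/.*,CIC=\([^,]*\).*/\1/p'
}
```

For example, `vcic_from_moi 'Region=RegionOne,CeeFunction=1,CtrlDomain=1,CIC=cic-2'` prints `cic-2`.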


1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Not applicable.

1.2.2   Tools

No tools are required.

1.2.3   Conditions

Before starting this procedure, ensure that the following conditions are met:

2   Procedure

This section describes the procedure to follow when this alarm is received.

2.1   Actions

  1. Try to access the failed node by SSH:

    ssh <user_id>@<vcic_ip_address>

    The hostname of the vCIC is displayed as one of the following:

    cic-1, cic-2, or cic-3

    If the personal user ID does not work, use the ceeadm user ID.

    The following scenarios are possible:

    • The node can be accessed. In this case, proceed to Step 2, and carry out all subsequent steps.
    • The node cannot be accessed. In this case, proceed to Step 4, and carry out all subsequent steps.
  2. Check the Additional Text attribute of the alarm. If the value is maintenance mode, reboot the vCIC with the following command:

    sudo umm off reboot

    Otherwise, for a regular reboot, issue the following command:

    sudo reboot -f

    After a successful reboot, proceed to Step 3.

    Note:
    The user must have sudo privileges to run these commands.

  3. Wait at least five minutes to see whether the alarm has ceased.

    If the reboot solved the problem and the alarm has ceased, exit this procedure.

    If the reboot did not solve the problem, collect troubleshooting data as described in the Data Collection Guideline, and contact the next level of maintenance support.

  4. Check whether the compute node on which the vCIC is running is available.

    To check where the vCIC is running, execute the following command on vFuel:

    fuel node | grep virt

    Access the compute node by SSH:

    ssh <user_id>@<compute_node_where_node_is_running>

    If the personal user ID does not work, use the ceeadm user ID.

    The following scenarios are possible:

    • The compute node can be accessed. In this case, proceed to Step 6, and carry out all subsequent steps.
    • The compute node cannot be accessed. In this case, proceed to Step 5, and carry out all subsequent steps.
  5. Check for an active Compute Host Failed alarm for the compute node where the failed vCIC is running. If such an alarm is active, resolve it before continuing with the next steps. If no Compute Host Failed alarm is active, reboot the failed compute node using the corresponding out-of-band management.

    If the compute node can be accessed by SSH after the Compute Host Failed alarm is resolved or after the manual reboot, continue with Step 6. If the compute node cannot be accessed after the reboot through out-of-band management, replace the server as described in Server Replacement.

  6. Wait a few minutes, and check whether the CIC Failed alarm has ceased.

    The following scenarios are possible:

    • The CIC Failed alarm has ceased. In this case, exit this procedure.
    • The alarm is still active. In this case, proceed to Step 7.

    Note:
    Recovery of the vCIC is not managed by Nova and is outside the scope of CM-HA.

  7. Collect troubleshooting data as described in the Data Collection Guideline, and contact the next level of maintenance support.
  8. The job is completed.
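
The branching in Steps 1 to 7 can be summarized in a short shell sketch. All function names are hypothetical and serve only as a reading aid. The can_ssh helper assumes key-based authentication, which this instruction does not state, and the handling of an active Compute Host Failed alarm in Step 5 is reduced to whether the compute node is accessible afterwards.

```shell
#!/bin/sh
# Sketch of the decision flow in Steps 1-7. All names are hypothetical.

# Succeeds if the host answers over SSH without prompting.
# Assumption: key-based authentication; BatchMode disables password prompts.
can_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true 2>/dev/null
}

# Step 2: choose the reboot command from the alarm's Additional Text value.
reboot_command_for() {
    case "$1" in
        "maintenance mode") echo "sudo umm off reboot" ;;
        *)                  echo "sudo reboot -f" ;;
    esac
}

# Map the current step ($1) and the observation ($2) to the next action.
next_action() {
    case "$1:$2" in
        1:accessible)   echo "step 2" ;;
        1:inaccessible) echo "step 4" ;;
        2:rebooted)     echo "step 3" ;;
        3:ceased)       echo "exit" ;;
        3:active)       echo "collect data and escalate" ;;
        4:accessible)   echo "step 6" ;;
        4:inaccessible) echo "step 5" ;;
        5:accessible)   echo "step 6" ;;
        5:inaccessible) echo "replace server" ;;
        6:ceased)       echo "exit" ;;
        6:active)       echo "step 7" ;;
        7:*)            echo "collect data and escalate" ;;
        *)              echo "unknown" ;;
    esac
}
```

For example, `next_action 1 inaccessible` prints `step 4`, and `reboot_command_for "maintenance mode"` prints `sudo umm off reboot`.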

3   Additional Information

The alarm is ceased for a vCIC when the vCIC passes the availability test.