1 Introduction
This instruction describes how to handle the CIC Failed alarm.
1.1 Alarm Description
The alarm is issued by the Managed Object (MO) CIC.
The alarm is issued when the periodic supervision algorithm detects that one of the three virtual Cloud Infrastructure Controllers (vCICs) has failed the availability test three consecutive times and then remains unavailable for more than five minutes.
The severity of the alarm is MAJOR or CLEARED.
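The raise condition described above can be restated as a small predicate. The following shell sketch is illustrative only: the function name and its two arguments are assumptions made for this example, and the actual supervision algorithm may track its state differently.

```shell
# Illustrative sketch only; not the actual supervision algorithm.
# Arguments (both hypothetical inputs):
#   $1  number of consecutive failed availability tests
#   $2  seconds the vCIC has stayed unavailable since the third failure
should_raise_cic_failed() {
    [ "$1" -ge 3 ] && [ "$2" -gt 300 ]
}

# Example: three consecutive failures followed by 301 seconds of unavailability.
if should_raise_cic_failed 3 301; then
    echo "raise CIC Failed (MAJOR)"
fi
```

The five-minute threshold is expressed as a strict comparison against 300 seconds, because the alarm is issued only when the vCIC remains unavailable for more than five minutes.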
The possible alarm causes and fault locations are explained in Table 1.
| Alarm | Description | Fault Reason | Fault | Impact |
|---|---|---|---|---|
| One of the three vCICs is not available. | The vCIC has failed the availability test three times, and, after that, remains unavailable for more than five minutes. Or the periodic monitoring task has identified that the vCIC has entered maintenance mode. | vCIC malfunction | One of the three vCICs becomes permanently unavailable or remains in maintenance mode. |
If the alarm is not resolved, the consequences for the node are the following:
- One of the three vCICs remains unavailable or in maintenance mode. The other two vCICs are operational with no redundancy.
- If a second vCIC becomes unavailable, the cluster size falls below the quorum value, and the entire cluster fails. As a result, the third vCIC is also not operational.
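The consequences above follow from how a quorum works in a three-member cluster. The sketch below assumes a simple majority quorum (more than half of the members), which is consistent with the behavior described here but is not stated explicitly in this instruction. The function names are hypothetical.

```shell
# Illustrative sketch, assuming a simple majority quorum.
# quorum_size TOTAL: smallest number of members that forms a majority.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# cluster_state TOTAL AVAILABLE: prints healthy, no-redundancy, or failed.
cluster_state() {
    cs_quorum=$(quorum_size "$1")
    if [ "$2" -lt "$cs_quorum" ]; then
        echo "failed"          # below quorum: the entire cluster fails
    elif [ "$2" -lt "$1" ]; then
        echo "no-redundancy"   # quorum still held, but no spare member
    else
        echo "healthy"
    fi
}

for up in 3 2 1; do
    echo "$up of 3 vCICs available: $(cluster_state 3 "$up")"
done
```

Under this assumption the quorum for three vCICs is two, so losing one vCIC leaves the cluster running without redundancy, and losing a second one takes the whole cluster down.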
The alarm attributes are listed in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Major Type | 193 |
| Minor Type | 2031672 |
| Managed Object Class | CIC |
| Managed Object Instance | Region=<name_of_the_region>, |
| Specific Problem | CIC Failed |
| Event Type | other (1) |
| Probable Cause | lossOfRedundancy (77) |
| Additional Text | |
| Severity | MAJOR (4) or CLEARED |
(1) If the vCIC has failed the availability test three times, and after that, remains unavailable for more than five minutes.
(2) If the periodic monitoring task has identified that the vCIC has entered maintenance mode.
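When alarms are filtered by script, the Major Type and Minor Type values from Table 2 are enough to recognize this alarm. The helper below is a minimal sketch; its name and the idea of passing the two values as arguments are assumptions, and the way alarm records are actually exported is outside the scope of this instruction.

```shell
# Hypothetical helper: succeeds when the given Major Type ($1) and
# Minor Type ($2) match the values listed in Table 2 for this alarm.
is_cic_failed_alarm() {
    [ "$1" = "193" ] && [ "$2" = "2031672" ]
}

# Example with the values from Table 2.
if is_cic_failed_alarm 193 2031672; then
    echo "CIC Failed alarm"
fi
```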
1.2 Prerequisites
This section provides information on the documents, tools, and conditions that apply to the procedure.
1.2.1 Documents
Not applicable.
1.2.2 Tools
No tools are required.
1.2.3 Conditions
Before starting this procedure, ensure that the following conditions are met:
- The configuration of the three vCICs is known, so that it is clear on which hosts the vCICs are running.
- Information about how to connect to and use the out-of-band management is available.
- The user has sudo privileges.
2 Procedure
This section describes the procedure to follow when this alarm is received.
2.1 Actions
1. Try to access the failed vCIC by SSH:
   ssh <user_id>@<vcic_ip_address>
   The hostname of the vCIC is displayed as one of the following:
   cic-1, cic-2, or cic-3
   If the personal user ID does not work, use the ceeadm user ID.
   The following scenarios are possible:
2. Check the Additional Text attribute of the alarm. If the value is maintenance mode, reboot the vCIC with the following command:
   sudo umm off reboot
   For a regular reboot, issue the following command:
   sudo reboot -f
   After a successful reboot, proceed to Step 3.
   Note: The user must have sudo privileges to run these commands.
3. Wait at least five minutes to see whether the alarm has ceased.
   If the reboot solved the problem and the alarm has ceased, exit this procedure.
   If the reboot did not solve the problem, collect troubleshooting data as described in the Data Collection Guideline, and contact the next level of maintenance support.
4. Check whether the compute node on which the vCIC is running is available.
   To check where the vCIC is running, execute the following command on vFuel:
   fuel node | grep virt
   Access the compute node by SSH:
   ssh <user_id>@<compute_node_where_node_is_running>
   If the personal user ID does not work, use the ceeadm user ID.
   The following scenarios are possible:
5. Check for an active Compute Host Failed alarm for the compute node where the failed vCIC is running. If there is an active Compute Host Failed alarm for the compute node, solve it first, before continuing with the next steps. If there is no active Compute Host Failed alarm, reboot the failed compute node by using the corresponding out-of-band management.
   If the compute node can be accessed by SSH after solving the Compute Host Failed alarm or after the manual reboot, continue with Step 6. If the compute node cannot be accessed after rebooting using out-of-band management, replace the server as described in Server Replacement.
6. Wait a few minutes, and check if the CIC Failed alarm has ceased.
   The following scenarios are possible:
   - The CIC Failed alarm has ceased. Exit this procedure.
   - The alarm is still active. Proceed to Step 7.
7. Collect troubleshooting data as described in the Data Collection Guideline, and contact the next level of maintenance support.
8. The job is completed.
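The choice between the two reboot commands in the procedure depends on the Additional Text attribute of the alarm. The sketch below only prints the command to run and executes nothing. The function name is hypothetical, and matching on the literal string "maintenance mode" is an assumption about how the attribute value is spelled.

```shell
# Illustrative sketch: print the reboot command that the procedure calls for.
# $1 is the Additional Text attribute of the alarm.
# Assumption: any value that does not mention maintenance mode calls for
# a regular reboot.
reboot_command_for() {
    case "$1" in
        *"maintenance mode"*) echo "sudo umm off reboot" ;;
        *)                    echo "sudo reboot -f" ;;
    esac
}

# Example: decide which command to run on the vCIC after logging in by SSH.
reboot_command_for "maintenance mode"
```

Printing the command instead of running it keeps the operator in control of the reboot, which matches the manual nature of the procedure.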
3 Additional Information
The alarm is ceased for a vCIC when the vCIC passes the availability test.
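Because the alarm ceases only after the vCIC passes the availability test again, the procedure asks the operator to wait and then re-check the alarm. That wait can be sketched as a polling loop with a timeout. The sketch assumes a site-specific check command that exits with status 0 while the alarm is still active; no such command is defined in this instruction, so `alarm_is_active` in the usage comment is a placeholder.

```shell
# Illustrative sketch: poll until the alarm is no longer active or a timeout expires.
# Usage: wait_for_clear TIMEOUT_SECONDS INTERVAL_SECONDS CHECK_COMMAND [ARGS...]
# CHECK_COMMAND is hypothetical and must exit 0 while the alarm is active.
# INTERVAL_SECONDS must be at least 1.
# Returns 0 when the alarm has ceased, 1 when the timeout expires first.
wait_for_clear() {
    wfc_timeout=$1
    wfc_interval=$2
    shift 2
    wfc_elapsed=0
    while [ "$wfc_elapsed" -lt "$wfc_timeout" ]; do
        if ! "$@"; then
            return 0    # check command reports that the alarm is not active
        fi
        sleep "$wfc_interval"
        wfc_elapsed=$(( wfc_elapsed + wfc_interval ))
    done
    return 1
}

# Example (placeholder check command): wait up to five minutes, checking every 30 seconds.
# wait_for_clear 300 30 alarm_is_active || echo "alarm still active, collect troubleshooting data"
```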
