1 Introduction
This instruction describes how to handle the Fuel Failed alarm.
1.1 Alarm Description
The Fuel Failed alarm is a primary alarm. The alarm is issued by the Managed Object (MO) Fuel.
The alarm is issued when the periodic supervision algorithm detects that the vFuel node has failed the availability test three consecutive times, and, following that, remains unavailable for at least five minutes.
The severity of the alarm is MAJOR or CLEARED.
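The trigger rule above can be sketched as a simple predicate. This is an illustrative sketch only; the function name and the way the supervision loop would feed it values are assumptions, not the actual supervision algorithm:

```shell
# Hypothetical sketch of the trigger rule stated above: the alarm is raised
# only after three consecutive failed availability tests AND at least five
# minutes (300 s) of continued unavailability after the third failure.
should_raise_fuel_failed() {
  consecutive_failures=$1   # failed availability tests in a row
  unavailable_seconds=$2    # seconds unavailable since the third failure
  [ "$consecutive_failures" -ge 3 ] && [ "$unavailable_seconds" -ge 300 ]
}
```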
The possible alarm causes and fault locations are explained in Table 1.
| Alarm | Description | Fault Cause | Fault Location | Impact |
|---|---|---|---|---|
| Fuel Failed | The vFuel node has failed the availability test three consecutive times, and, after that, remains unavailable for more than five minutes. | | | |
The following is the consequence for the node if the alarm is not solved:
The alarm attributes are listed in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Major Type | 193 |
| Minor Type | 2031706 |
| Managed Object Class | Fuel |
| Managed Object Instance | Region=<name_of_the_region>, |
| Specific Problem | Fuel Failed |
| Event Type | other (1) |
| Probable Cause | m3100Unavailable(14) |
| Additional Text | N/A |
| Severity | MAJOR (4) or CLEARED |
1.2 Prerequisites
This section provides information on the documents, tools, and conditions that apply to the procedure.
1.2.1 Documents
Not applicable.
1.2.2 Tools
Before starting this procedure, ensure that the following tools are available:
- A console cable and a Linux computer with SSH capability, for establishing a console connection to the affected server as described in the documentation provided by the hardware manufacturer
1.2.3 Conditions
Before starting this procedure, ensure that the following conditions are met:
- You have access to information about how to establish a console connection to the hardware in use.
- The password for the root user in vFuel is known.
2 Procedure
This section describes the procedure to follow when this alarm is received.
2.1 Actions
Perform the following steps:

1. Try to access the failed node by SSH:

   ssh <user_ID>@<name_of_the_fuel_IP_address>

   If the personal user ID does not work, use the ceeadm user ID.

   The following scenarios are possible:
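The access attempt with fallback to the ceeadm user ID can be sketched as a small helper. This is a hypothetical convenience wrapper, not part of the product; plain ssh is the only tool it assumes, and the host and user values are placeholders:

```shell
# Hypothetical helper for the access step: try the personal user ID first,
# fall back to ceeadm, and print which user ID worked. Returns non-zero if
# the node cannot be reached with either user ID.
try_node_ssh() {
  host=$1
  user=$2
  if ssh "${user}@${host}" true; then
    echo "$user"
  elif ssh "ceeadm@${host}" true; then
    echo "ceeadm"
  else
    return 1
  fi
}
```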
2. Issue the following command to reboot:

   sudo reboot -f

   After a successful reboot, proceed to Step 3.

   Note: The user must have sudo privileges to run this command.
3. Wait at least five minutes to see whether the alarm has ceased.

   If the reboot solved the problem and the alarm has ceased, exit this procedure.

   If the reboot did not solve the problem, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support.
4. Check whether the compute, where the failed node is running as a virtual machine (VM), is available.

   Access the compute by SSH:

   ssh <userID>@<compute_node_where_node_is_running>

   If the personal user ID does not work, use the ceeadm user ID.

   The following scenarios are possible:
5. Check for an active Compute Host Failed alarm for the compute node where the failed vFuel is running.

   If there is an active Compute Host Failed alarm for the compute node, solve it first before continuing with the next steps. If there is no active Compute Host Failed alarm, reboot the failed compute node by using the corresponding out-of-band management.

   If the compute can be accessed by SSH after solving the Compute Host Failed alarm or after the manual reboot, proceed to Step 6. If the compute cannot be accessed after rebooting using out-of-band management, replace the server as described in Server Replacement.
6. Check the state of the VM on the compute with the following command:

   virsh list --all

   The name of the VM must be fuel_master.

   Two scenarios are possible:
7. Start the VM with the following command:

   virsh start <name_of_the_VM_displayed_by_virsh_list>

   Two scenarios are possible:
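The check-then-start sequence above can be sketched as one wrapper. This is a hypothetical illustration, not part of the procedure; it assumes the usual three-column output of virsh list --all (Id, Name, State) and the fuel_master VM name stated above:

```shell
# Hypothetical wrapper combining the VM-state check and start: parse
# "virsh list --all" and start fuel_master only if it is not already running.
ensure_fuel_vm_running() {
  state=$(virsh list --all | awk '$2 == "fuel_master" { print $3 }')
  if [ "$state" != "running" ]; then
    virsh start fuel_master
  fi
}
```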
8. Perform a manual failover on the cold standby Fuel VM. For more information, refer to section Failover to the Cold Standby Fuel VM (Recover from Failure) in Fuel Synchronization.

   Two scenarios are possible:

   - If the alarm has ceased, exit this procedure.
   - If the alarm has not ceased, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support.
9. Check the networking of the failed VM by checking the traffic through the virtual switch.

   Start pinging the failed node from a vCIC that is still alive.

   While sending ping requests, run the following command multiple times on the compute where the failed node is running, and check whether the RX/TX counters are increasing:

   ovs-appctl dpctl/show -s system@ovs-system | grep vfm_eth0 -A4

   If the counters are increasing, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support. If the counters are not increasing, proceed to Step 10.
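The counter comparison can be sketched as a small helper that takes two snapshots of the grep output, captured a few seconds apart. This is a hypothetical aid, not part of the procedure, and the "RX packets:<n>" / "TX packets:<n>" line format it parses is an assumption about the dpctl/show -s output:

```shell
# Hypothetical check: given two snapshots of the port statistics for
# vfm_eth0, report (via exit status) whether the RX or TX packet
# counter grew between them.
counters_increasing() {
  rx1=$(printf '%s\n' "$1" | sed -n 's/.*RX packets:\([0-9]*\).*/\1/p' | head -n 1)
  rx2=$(printf '%s\n' "$2" | sed -n 's/.*RX packets:\([0-9]*\).*/\1/p' | head -n 1)
  tx1=$(printf '%s\n' "$1" | sed -n 's/.*TX packets:\([0-9]*\).*/\1/p' | head -n 1)
  tx2=$(printf '%s\n' "$2" | sed -n 's/.*TX packets:\([0-9]*\).*/\1/p' | head -n 1)
  [ "${rx2:-0}" -gt "${rx1:-0}" ] || [ "${tx2:-0}" -gt "${tx1:-0}" ]
}
```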
10. Restart the virtual switch on the compute by issuing the following command:

    service openvswitch-switch restart

    Two scenarios are possible:

    - If the alarm has ceased, exit this procedure.
    - If the alarm has not ceased, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support.
11. The job is completed.
3 Additional Information
The alarm ceases when the vFuel node passes the availability test again.
