Fuel Failed
Cloud Execution Environment

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites
2   Procedure
2.1   Actions
3   Additional Information

1   Introduction

This instruction describes how to handle the Fuel Failed alarm.

1.1   Alarm Description

The Fuel Failed alarm is issued by the Managed Object (MO) Fuel. The alarm is issued when the periodic supervision algorithm detects that the vFuel node has failed the availability test three consecutive times, and, following that, remains unavailable for at least five minutes.
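The raise condition described above can be sketched as follows. This is an illustrative sketch only, not the actual CEE supervision implementation; the function names and the minute counter are invented for clarity:

```shell
# Illustrative sketch of the alarm raise condition (hypothetical helper
# names; the real supervision algorithm is internal to the product).
consecutive_failures=0

record_test_result() {      # $1 = "pass" or "fail"
    if [ "$1" = "fail" ]; then
        consecutive_failures=$((consecutive_failures + 1))
    else
        consecutive_failures=0
    fi
}

should_raise_alarm() {      # $1 = minutes unavailable after the failures
    # Raised only after three consecutive failed availability tests
    # followed by at least five minutes of continued unavailability.
    [ "$consecutive_failures" -ge 3 ] && [ "$1" -ge 5 ]
}

record_test_result fail
record_test_result fail
record_test_result fail
if should_raise_alarm 5; then
    echo "Fuel Failed alarm raised: MAJOR"
fi
```

Note that a single passed availability test resets the failure count, which is why intermittent failures do not raise the alarm.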

The severity of the alarm is MAJOR or CLEARED.

The possible alarm causes and fault locations are explained in Table 1.

Table 1    Alarm Causes

Alarm Cause:     The vFuel node is not available.

Description:     The vFuel node has failed the availability test three
                 consecutive times and, after that, remains unavailable
                 for more than five minutes.

Fault Reason:    vFuel node malfunction

Fault Location:  vFuel node

Impact:          The vFuel node becomes permanently unavailable.

If the alarm is not resolved, the vFuel node remains unavailable.

The alarm attributes are listed in Table 2.

Table 2    Alarm Attributes

Attribute Name             Attribute Value

Major Type                 193
Minor Type                 2031706
Managed Object Class       Fuel
Managed Object Instance    Region=<name_of_the_region>,
                           CeeFunction=1,
                           CtrlDomain=1,
                           Fuel=<fuel_master_ip>
Specific Problem           Fuel Failed
Event Type                 other (1)
Probable Cause             m3100Unavailable(14)
Additional Text            N/A
Severity                   MAJOR (4) or CLEARED

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Not applicable.

1.2.2   Tools

Before starting this procedure, ensure that the tools required for establishing a console connection to the affected server are available, as specified in the documentation provided by the hardware manufacturer.

1.2.3   Conditions

Before starting this procedure, ensure that the following conditions are met:

2   Procedure

This section describes the procedure to follow when this alarm is received.

2.1   Actions

Perform the following steps:

  1. Access the failed node by SSH:

    ssh <user_id>@<fuel_address>

    If the personal user ID does not work, use the ceeadm user ID.

    The following scenarios are possible:

    • The node can be accessed. In this case, proceed to Step 2 and carry out all subsequent steps.
    • The node cannot be accessed. In this case, proceed to Step 4 and carry out all subsequent steps.
  2. Issue the following command to reboot:

    sudo reboot -f

    After a successful reboot, proceed to Step 3.

    Note:  
    The user must have sudo privileges to be able to run this command.

  3. Wait at least five minutes to see whether the alarm has ceased.

    If the reboot solved the problem and the alarm is ceased, exit this procedure.

    If the reboot did not solve the problem, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support.

  4. Check whether the compute host on which the failed node is running as a virtual machine (VM) is available.

    Access the compute by SSH:

    ssh <user_id>@<compute_host_where_node_is_running>

    If the personal user ID does not work, use the ceeadm user ID.

    The following scenarios are possible:

    • The compute can be accessed. In this case, proceed to Step 8 and carry out all subsequent steps.
    • The compute cannot be accessed. In this case, proceed to Step 5 and carry out all subsequent steps.
  5. Check for an active Compute Host Failed alarm for the compute node where the failed vFuel is running.

    If there is an active Compute Host Failed alarm, continue with Step 6.

    If there is no active Compute Host Failed alarm, continue with Step 7.

  6. Solve the Compute Host Failed alarm following the instructions in Compute Host Failed.

    In case the server needs to be changed to solve the Compute Host Failed alarm, start up the passive Fuel VM as described in section Failover to the Cold Standby Fuel VM (Recover from Failure) in Fuel Synchronization.

    The following scenarios are possible:

    • If the Fuel Failed alarm is ceased, exit this procedure.
    • If the compute can be accessed by SSH, proceed to Step 8.
    • If the compute cannot be accessed, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support.
  7. If there is no active Compute Host Failed alarm, reboot the failed compute node manually by using the corresponding out-of-band management.

    The following scenarios are possible:

    • If the compute can be accessed by SSH after the reboot, proceed to Step 8.
    • If the compute cannot be accessed, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support.
  8. Check the state of the VM on the compute with the following command:

    virsh list --all

    The name of the VM must be fuel_master.

    Two scenarios are possible:

    • If the state of the VM is running, proceed to Step 11.
    • If the state of the VM is not running, proceed to Step 9.
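The state check can also be scripted. The sketch below parses sample `virsh list --all` output; the sample text and the `vm_state` helper are illustrative assumptions, and on the compute the real command output would be used instead:

```shell
# Sample `virsh list --all` output, used here for illustration only.
sample_output=' Id   Name          State
---------------------------------------
 12   fuel_master   running'

# Hypothetical helper: print the state column for the fuel_master VM.
# Note: a stopped VM reports the two-word state "shut off", so anything
# other than "running" is treated as not running.
vm_state() {
    echo "$1" | awk '$2 == "fuel_master" {print $3}'
}

if [ "$(vm_state "$sample_output")" = "running" ]; then
    echo "VM is running: proceed to Step 11"
else
    echo "VM is not running: proceed to Step 9"
fi
```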
  9. Start the VM with the following command:

    virsh start <name_of_the_vm_displayed_by_virsh_list>

    Two scenarios are possible:

    • If the Fuel VM is not in running state after a few minutes, proceed to Step 10.
    • If the state of the VM becomes running, check whether the alarm has been ceased. If the alarm has been ceased, exit this procedure. If the alarm is still active, proceed to Step 11.
  10. Perform a manual failover on the cold standby Fuel VM. For more information, refer to section Failover to the Cold Standby Fuel VM (Recover from Failure) in Fuel Synchronization.

    Two scenarios are possible:

    • If the alarm is ceased, exit this procedure.
    • If the alarm is not ceased, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support.
  11. Check the networking of the failed VM by checking the traffic through the virtual switch.

    Start pinging the failed node from a vCIC that is still alive.

    While sending ping requests, run the following command multiple times on the compute where the failed node is running, and check whether the RX/TX counters are increasing:

    ovs-appctl dpctl/show -s system@ovs-system | grep vfm_eth0 -A4

    If the counters are increasing, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support. If the counters are not increasing, proceed to Step 12.
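Comparing two snapshots of the counters can be scripted. The sketch below is illustrative: the `rx_packets` helper and the sample `ovs-appctl dpctl/show -s` output are assumptions for demonstration, and on the compute two real snapshots of the command output would be compared:

```shell
# Hypothetical helper: extract the RX packet counter for vfm_eth0 from
# `ovs-appctl dpctl/show -s` style output (sample text shown below).
rx_packets() {
    echo "$1" | awk '/vfm_eth0/ {found=1; next}
                     found && /RX packets:/ {sub(/.*packets:/, ""); print $1; exit}'
}

# Two snapshots taken a few seconds apart (sample data, counters unchanged).
before='  port 3: vfm_eth0
    RX packets:1000 errors:0 dropped:0
    TX packets:900 errors:0 dropped:0'
after='  port 3: vfm_eth0
    RX packets:1000 errors:0 dropped:0
    TX packets:900 errors:0 dropped:0'

if [ "$(rx_packets "$after")" -gt "$(rx_packets "$before")" ]; then
    echo "counters increasing: collect troubleshooting data and escalate"
else
    echo "counters not increasing: restart the virtual switch (Step 12)"
fi
```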

  12. Restart the virtual switch on the compute by issuing the following command:

    service openvswitch-switch restart

    Two scenarios are possible:

    • If the alarm is ceased, exit this procedure.
    • If the alarm is not ceased, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support.
  13. The job is completed.

3   Additional Information

The alarm is ceased when the vFuel node has passed the availability test.