Fuel Failed
Cloud Execution Environment

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites
2   Procedure
2.1   Actions
3   Additional Information

1   Introduction

This instruction concerns alarm handling.

1.1   Alarm Description

The Fuel Failed alarm is a primary alarm. The alarm is issued by the Managed Object (MO) Fuel.

The alarm is issued when the periodic supervision algorithm detects that the vFuel node has failed the availability test three consecutive times and has then remained unavailable for at least five minutes.
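The raise condition above, together with the clear condition given in section 3, can be pictured as a small state machine. The following POSIX shell sketch is an illustrative model of the rule as stated in this instruction, not the actual supervision implementation. The function name `fuel_alarm_model`, its input format (one availability-test result per line, `<epoch_seconds> <pass|fail>`), and the choice to measure the five minutes from the third consecutive failure are all assumptions.

```bash
# Illustrative model only: hypothetical function, not the real supervision code.
# Input on stdin: one availability-test result per line, "<epoch_seconds> <pass|fail>".
# Output: "MAJOR <ts>" when the alarm would be raised, "CLEARED <ts>" when it would cease.
fuel_alarm_model() (
  hold=${1:-300}   # assumed: unavailability required after the third failure, in seconds
  fails=0          # consecutive failed tests
  third_at=0       # timestamp of the third consecutive failure
  raised=0         # 1 while the modelled alarm is active
  while read -r ts result; do
    if [ "$result" = pass ]; then
      # A passed test resets the counter and clears an active alarm.
      if [ "$raised" -eq 1 ]; then
        echo "CLEARED $ts"
      fi
      fails=0
      raised=0
      continue
    fi
    fails=$((fails + 1))
    if [ "$fails" -eq 3 ]; then
      third_at=$ts
    fi
    # Raise once: three consecutive failures, then still failing $hold seconds later.
    if [ "$fails" -ge 3 ] && [ "$raised" -eq 0 ] && [ $((ts - third_at)) -ge "$hold" ]; then
      echo "MAJOR $ts"
      raised=1
    fi
  done
)
```

With tests every 60 seconds, the third failure at t=120 and a further failure at t=420 produce `MAJOR 420`; a later passed test produces `CLEARED` with its timestamp. A passed test before the third failure resets the count, so no alarm is modelled.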

The severity of the alarm is MAJOR or CLEARED.

The possible alarm causes and fault locations are explained in Table 1.

Table 1    Alarm Causes

Alarm Cause:      The vFuel node is not available.
Description:      The vFuel node has failed the availability test three consecutive times and has then remained unavailable for more than five minutes.
Fault Reason:     vFuel node malfunction
Fault Location:   vFuel node
Impact:           The vFuel node becomes permanently unavailable.

The following is the consequence for the node if the alarm is not solved:

The alarm attributes are listed in Table 2.

Table 2    Alarm Attributes

Attribute Name            Attribute Value
Major Type                193
Minor Type                2031706
Managed Object Class      Fuel
Managed Object Instance   Region=<name_of_the_region>,CeeFunction=1,CtrlDomain=1,Fuel=<fuel_master_ip>
Specific Problem          Fuel Failed
Event Type                other (1)
Probable Cause            m3100Unavailable(14)
Additional Text           N/A
Severity                  MAJOR (4) or CLEARED
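The Fuel master IP address needed to reach the failed node is the value of the Fuel attribute in the Managed Object Instance listed in Table 2. The following POSIX shell sketch extracts it, assuming the alarm carries the Managed Object Instance as one comma-separated string; the function name `fuel_ip_from_moi` is hypothetical.

```bash
# Hypothetical helper: print the Fuel master IP address (the value of the Fuel=
# attribute) from a Managed Object Instance string such as
# Region=<name_of_the_region>,CeeFunction=1,CtrlDomain=1,Fuel=<fuel_master_ip>.
# Prints nothing and returns 1 if no Fuel= attribute is found.
fuel_ip_from_moi() (
  ip=$(printf '%s\n' "$1" | sed -n 's/.*Fuel=\([^,]*\).*/\1/p')
  [ -n "$ip" ] || exit 1
  printf '%s\n' "$ip"
)
```

The printed address can then be used as the target host of the `ssh` command in Step 1.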

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Not applicable.

1.2.2   Tools

Before starting this procedure, ensure that the following tools are available:

1.2.3   Conditions

Before starting this procedure, ensure that the following conditions are met:

2   Procedure

This section describes the procedure to follow when this alarm is received.

2.1   Actions

Perform the following steps:

  1. Try to access the failed node by SSH:

    ssh <user_ID>@<fuel_master_ip>

    If the personal user ID does not work, use the ceeadm user ID.

    The following scenarios are possible:

    • The node can be accessed. In this case, continue with Step 2.
    • The node cannot be accessed. In this case, continue with Step 4.
  2. Reboot the node with the following command:

    sudo reboot -f

    After the reboot has completed, proceed to Step 3.

    Note:  
    The user must have sudo privileges to be able to run this command.

  3. Wait at least five minutes, then check whether the alarm has ceased.

    If the reboot solved the problem and the alarm has ceased, exit this procedure.

    If the reboot did not solve the problem, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support.

  4. Check whether the compute on which the failed node runs as a virtual machine (VM) is available.

    Access the compute by SSH:

    ssh <user_ID>@<compute_node_where_node_is_running>

    If the personal user ID does not work, use the ceeadm user ID.

    The following scenarios are possible:

    • The compute can be accessed. In this case, continue with Step 6.
    • The compute cannot be accessed. In this case, continue with Step 5.
  5. Check whether there is an active Compute Host Failed alarm for the compute node where the failed vFuel is running. If there is, resolve that alarm first before continuing with the next steps. If there is no active Compute Host Failed alarm, reboot the failed compute node by using the corresponding out-of-band management.

    If the compute can be accessed by SSH after the Compute Host Failed alarm is resolved or after the manual reboot, proceed to Step 6. If the compute still cannot be accessed after the reboot through out-of-band management, replace the server as described in Server Replacement.

  6. Check the state of the VM on the compute with the following command:

    virsh list --all

    The name of the VM must be fuel_master.

    Two scenarios are possible:

    • If the state of the VM is running, proceed to Step 9.
    • If the state of the VM is not running, proceed to Step 7.
  7. Start the VM with the following command:

    virsh start <name_of_the_VM_displayed_by_virsh_list>

    Two scenarios are possible:

    • If the Fuel VM is not in the running state after a few minutes, proceed to Step 8.
    • If the VM has reached the running state, check whether the alarm has ceased. If the alarm has ceased, exit this procedure. If the alarm is still active, proceed to Step 9.
  8. Perform a manual failover to the cold standby Fuel VM. For more information, refer to section Failover to the Cold Standby Fuel VM (Recover from Failure) in Fuel Synchronization.

    Two scenarios are possible:

    • If the alarm has ceased, exit this procedure.
    • If the alarm has not ceased, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support.
  9. Check the networking of the failed VM by checking the traffic through the virtual switch.

    Start pinging the failed node from a vCIC that is still operational.

    While the ping is running, run the following command several times on the compute where the failed node is running, and check whether the RX/TX counters are increasing:

    ovs-appctl dpctl/show -s system@ovs-system | grep vfm_eth0 -A4

    If the counters are increasing, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support. If the counters are not increasing, proceed to Step 10.

  10. Restart the virtual switch on the compute by issuing the following command:

    service openvswitch-switch restart

    Two scenarios are possible:

    • If the alarm has ceased, exit this procedure.
    • If the alarm has not ceased, collect troubleshooting data as described in the Data Collection Guideline and contact the next level of maintenance support.
  11. The job is completed.
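Steps 1 and 4 both try an SSH login with the personal user ID and fall back to ceeadm. The following POSIX shell sketch wraps that check. The function name `probe_ssh_access` and the 10-second timeout are illustrative choices, and it assumes key-based SSH authentication, because `BatchMode=yes` makes `ssh` fail instead of prompting for a password; where passwords are used, run the `ssh` commands from Steps 1 and 4 interactively instead.

```bash
# Hypothetical helper for Steps 1 and 4: find a user ID that can open an SSH
# session to a host, trying the personal user ID first and then ceeadm.
# $1 = host address, $2 = personal user ID
# Prints the working user ID and returns 0; returns 1 if neither user works.
probe_ssh_access() (
  host=$1
  for user in "$2" ceeadm; do
    # -n: no stdin; BatchMode: fail instead of prompting for a password;
    # ConnectTimeout: give up on an unreachable host after 10 seconds.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=10 "$user@$host" true; then
      printf '%s\n' "$user"
      exit 0
    fi
  done
  exit 1
)
```

For example, `probe_ssh_access <fuel_master_ip> <user_ID>` returning 0 corresponds to the first scenario of Step 1 (continue with Step 2), and a non-zero status to the second (continue with Step 4). Called with the compute address, it makes the same decision between Step 6 and Step 5.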

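The checks in Steps 6 and 9 come down to reading command output on the compute. The following POSIX shell sketch parses output that has already been collected; `fuel_vm_state`, `vfm_packets`, and `vfm_delta` are hypothetical names. It assumes the default `virsh list --all` layout (Id, Name, State columns) and the `RX packets:` and `TX packets:` lines that `ovs-appctl dpctl/show -s` prints for each port; verify both against the installed versions. It only reports how much each counter grew and leaves the decision to the operator.

```bash
# Hypothetical helpers for Steps 6 and 9; they only parse collected output.

# Step 6: print the state of a VM from "virsh list --all" output on stdin.
# $1 = VM name (default fuel_master). Returns 1 if the VM is not listed.
fuel_vm_state() {
  awk -v vm="${1:-fuel_master}" '
    $2 == vm {
      state = $3
      for (i = 4; i <= NF; i++) state = state " " $i   # e.g. "shut off"
      print state
      found = 1
    }
    END { exit (found ? 0 : 1) }'
}

# Step 9: print "<rx> <tx>" packet counters from the
# "ovs-appctl dpctl/show -s system@ovs-system | grep vfm_eth0 -A4" snippet on stdin.
vfm_packets() {
  awk '$2 ~ /^packets:/ { split($2, kv, ":"); n[$1] = kv[2] }
       END { print n["RX"] + 0, n["TX"] + 0 }'
}

# Step 9: print how much the RX and TX packet counters grew between two
# snapshots of the snippet above. $1 = earlier snapshot, $2 = later snapshot.
vfm_delta() (
  before=$(printf '%s\n' "$1" | vfm_packets)
  after=$(printf '%s\n' "$2" | vfm_packets)
  # Unquoted expansion splits each "<rx> <tx>" pair into positional parameters.
  set -- $before $after
  echo "RX +$(( $3 - $1 )) TX +$(( $4 - $2 ))"
)
```

For example, `virsh list --all | fuel_vm_state` printing `running` corresponds to the first scenario of Step 6. For Step 9, save the command output twice, a few seconds apart while the ping is running, as `before` and `after`; `vfm_delta "$before" "$after"` printing `RX +0 TX +0` corresponds to counters that are not increasing.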
3   Additional Information

The alarm ceases when the vFuel node passes the availability test.



Copyright

© Ericsson AB 2016. All rights reserved. No part of this document may be reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to continued progress in methodology, design and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

Trademark List
All trademarks mentioned herein are the property of their respective owners. These are shown in the document Trademark Information.
