VM Evacuation Failed
Cloud Execution Environment

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure
2.1   Analyzing the Alarm
2.2   Actions

3   Additional Information

1   Introduction

This instruction concerns alarm handling.

1.1   Alarm Description

The VM Evacuation Failed alarm is issued by the Managed Object (MO) VM.

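The alarm is issued for a Virtual Machine (VM) in the following situations, which are detailed in Table 1:

  • The evacuation of the VM has failed.

  • The High Availability (HA) policy of the VM does not allow evacuation.

  • The fencing of the compute node has failed.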

Note:  
Automatic evacuation to a different compute node is attempted only for VMs in active state, and only if the High Availability (HA) policy of the VM allows it. For VMs in any state other than active, an alarm is always issued to inform the application owner of the possible fault.

If the unmanaged policy is set for the VM, or if no HA policy is set, the VM Evacuation Failed alarm is never sent for this VM.
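The rules in the note above can be summarized as a small decision function. This is only a sketch of the documented behavior: the policy names come from this instruction, while the function name, the state string "active", and the returned labels are illustrative assumptions and not part of any product API.

```python
from typing import Optional


def expected_handling(ha_policy: Optional[str], vm_state: str) -> str:
    """Return how a VM on a failed compute node is expected to be handled.

    The returned labels ("no-alarm", "evacuate", "alarm") are hypothetical
    names chosen for this sketch.
    """
    if ha_policy is None or ha_policy == "unmanaged":
        # No alarm is sent for unmanaged VMs or VMs without an HA policy.
        return "no-alarm"
    if ha_policy == "ha-offline" and vm_state == "active":
        # Evacuation is attempted; the alarm is raised only if it fails.
        return "evacuate"
    # managed-on-host policy, or a VM that is not active:
    # no evacuation is attempted and the alarm is issued.
    return "alarm"
```

For example, expected_handling("managed-on-host", "active") returns "alarm", which corresponds to the "Evacuation is not allowed" case.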


The severity of the alarm is MAJOR.

The possible alarm causes and fault locations are explained in Table 1.

Table 1    Alarm Causes

| Alarm Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| The evacuation of a VM has failed. | The VM could not be evacuated to another compute node. | Insufficient compute resources; SW error; HW error | Region; Compute node | The VM becomes permanently unavailable. |
| The evacuation of a VM is not allowed due to the HA policy of the VM. | The HA policy does not allow evacuation. | SW error; HW error | Compute node | The VM will stay unavailable. |
| The fencing of the compute failed. | Fencing, which is a must for VM evacuation, has failed on the affected compute. | HW error; Configuration error | Controller node | The VM will stay unavailable. |

If the alarm is not resolved, the affected VM stays unavailable, as described in the Impact column of Table 1.

The alarm attributes are listed in Table 2.

Table 2    Alarm Attributes

| Attribute Name | Attribute Value |
|---|---|
| Major Type | 193 |
| Minor Type | 2031675 |
| Managed Object Class | VM |
| Managed Object Instance | Region=<name_of_the_region>,CeeFunction=1,Tenant=<tenant_uuid>,VM=<vm_uuid> |
| Specific Problem | VM Evacuation Failed |
| Event Type | other (1) |
| Probable Cause | m3100Unavailable(14) |
| Additional Text | Depends on the scenario, see the list following this table. |
| Severity | MAJOR (4) |

The following scenarios are possible for the Additional Text attribute:

  • In case ha-offline policy is set (which allows evacuation), but the VM cannot be started on other compute hosts, the following additional text is displayed:

    {"reason": "Evacuation failed", "host": <name_of_the_host>}

    Where <name_of_the_host> specifies the host where the VM was running.

  • In case managed-on-host policy is set (which does not allow the evacuation of the VM), the following additional text is displayed:

    {"reason": "Evacuation is not allowed", "host": <name_of_the_host>}

    Where <name_of_the_host> specifies the host where the VM was running.

  • In case fencing failed, the following additional text is displayed:

    {"reason": "Fencing failed", "host": <name_of_the_host>}

    Where <name_of_the_host> specifies the host where the VM was running.
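The Managed Object Instance value in Table 2 is a comma-separated list of key=value components, and the VM component carries the <vm_uuid> needed later in the procedure. The following is a minimal sketch of extracting it. The function name is hypothetical, and the sketch assumes that component values contain neither commas nor leading or trailing whitespace.

```python
def parse_mo_instance(dn: str) -> dict:
    """Split a Managed Object Instance string into its key=value components."""
    fields = {}
    for component in dn.split(","):
        component = component.strip()  # tolerate spaces or line breaks after commas
        if not component:
            continue
        key, _, value = component.partition("=")
        fields[key.strip()] = value.strip()
    return fields


# The placeholders from Table 2 are used here instead of real identifiers.
fields = parse_mo_instance(
    "Region=<name_of_the_region>,CeeFunction=1,Tenant=<tenant_uuid>,VM=<vm_uuid>"
)
vm_uuid = fields["VM"]
```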

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Not applicable.

1.2.2   Tools

No tools are required.

1.2.3   Conditions

No conditions.

2   Procedure

This section describes the procedure to follow when the VM Evacuation Failed alarm is received.

2.1   Analyzing the Alarm

Determine the ha-policy of the VM by checking the Additional Text field of the alarm.

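The following scenarios are possible:

  • The reason is "Evacuation failed": the ha-offline policy is set, which allows evacuation, but the VM could not be started on another compute host.

  • The reason is "Evacuation is not allowed": the managed-on-host policy is set, which does not allow the evacuation of the VM.

  • The reason is "Fencing failed": the fencing of the compute host failed, so the VM was not evacuated.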

Note:  
Evacuation is only attempted if the ha-policy is set to ha-offline, and the VM status is active. For VMs with other statuses, an alarm is sent to inform the application owner that action is required on their part.
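The analysis of the Additional Text field can be sketched as follows. The reason strings and policy names are taken from Table 2. The function name and the returned labels are hypothetical, and the sketch assumes that the text delivered with the alarm is valid JSON, that is, that the host name is a quoted string.

```python
import json

# Reason strings as listed for the Additional Text attribute in Table 2,
# mapped to illustrative scenario labels (the labels are assumptions).
REASON_TO_SCENARIO = {
    "Evacuation failed": "ha-offline",
    "Evacuation is not allowed": "managed-on-host",
    "Fencing failed": "fencing-failed",
}


def analyze_additional_text(text: str) -> tuple:
    """Return (scenario, host) for the Additional Text of the alarm."""
    data = json.loads(text)
    scenario = REASON_TO_SCENARIO.get(data.get("reason"), "unknown")
    return scenario, data.get("host")
```

With a hypothetical host name, analyze_additional_text('{"reason": "Evacuation failed", "host": "compute-x"}') returns ("ha-offline", "compute-x").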

2.2   Actions

The VM is evacuated as soon as the compute node on which it was running is determined to be unavailable.

Do the following:

  1. Resolve the Compute Host Failed alarm.

    For more information about the Compute Host Failed alarm, refer to Compute Host Failed.

    Note:  
    Resolving the Compute Host Failed alarm can take a considerable amount of time, that is, more than 15 minutes.

  2. After the Compute Host Failed alarm is resolved, the system tries to restart the VM automatically.

    If the VM Evacuation Failed alarm ceases, exit this procedure.

  3. If the alarm has not ceased, run the following command:

    nova show <vm_uuid>

    Note:  
    The <vm_uuid> value is given in the Managed Object Instance field of the alarm, see Table 2.

  4. Check the "fault" value in the output.

    The following scenarios are possible:

    • The "fault" value contains the text "No valid host was found". The system was not able to evacuate the VM because there are insufficient resources available on other nodes. The following scenarios are possible:
      • The region is saturated with VMs. In this case, expand the region with additional resources. For more information, refer to the document Region Expansion.
      • The region has several failed compute nodes. In this case, resolve all Compute Host Failed alarms.

        For more information about the Compute Host Failed alarm, refer to Compute Host Failed.

        Note:  
        Resolving the Compute Host Failed alarms can take a considerable amount of time, that is, more than 15 minutes.

        If the VM Evacuation Failed alarm persists after the Compute Host Failed alarms were resolved, proceed to Step 5.

    • The "fault" value in the output does not indicate insufficient resources.

      In this case, proceed to Step 5.

  5. Collect troubleshooting data as described in the Data Collection Guideline.
  6. Contact the next level of maintenance support.

    Further actions are outside the scope of this instruction.

  7. The job is completed.
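The check of the "fault" value in Step 4 can be sketched as a simple text match. The sketch takes the "fault" value as a plain string and makes no assumption about the layout of the nova show output. The function name and the returned labels are hypothetical.

```python
def next_action(fault_text: str) -> str:
    """Choose the follow-up for Step 4 from the "fault" value of nova show."""
    if "No valid host was found" in fault_text:
        # Insufficient resources: expand the region or resolve all
        # Compute Host Failed alarms, as described in Step 4.
        return "free-resources"
    # Any other fault: collect troubleshooting data and contact the
    # next level of maintenance support (Steps 5 and 6).
    return "escalate"
```

The match is deliberately a substring test, because the "fault" value can contain more text around the message.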

3   Additional Information

The alarm ceases for a VM when the VM is restarted and its state becomes active.