VM Evacuation Failed
Cloud Execution Environment

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure
2.1   Analyzing the Alarm
2.2   Actions

3   Additional Information

1   Introduction

This instruction concerns alarm handling.

1.1   Alarm Description

The VM Evacuation Failed alarm is issued by the Managed Object (MO) VM.

The alarm is issued for a Virtual Machine (VM) in the following situations:

Note:  
Automatic evacuation to a different compute node is attempted only for VMs in active state, and only if the High Availability (HA) policy of the VM allows it. For VMs in any other state, an alarm is always issued to inform the application owner of the possible fault.

The severity of the alarm is MAJOR.

The possible alarm causes and fault locations are explained in Table 1.

Table 1    Alarm Causes

Alarm Cause: The evacuation of a VM has failed.
Description: The VM could not be evacuated to another compute node.
Fault Reason:
  • Insufficient compute resources
  • SW error
  • HW error
Fault Location:
  • Region
  • Compute node
Impact: The VM becomes permanently unavailable.

Alarm Cause: The evacuation of a VM is not allowed due to the HA policy of the VM.
Description: The HA policy does not allow evacuation.
Fault Reason:
  • SW error
  • HW error
Fault Location:
  • Compute node
Impact: The VM will stay unavailable.

Alarm Cause: The fencing of the compute failed.
Description: Fencing, which is a must for VM evacuation, has failed on the affected compute.
Fault Reason:
  • HW error
  • Configuration error
Fault Location:
  • Controller node
Impact: The VM will stay unavailable.

The following is the consequence for the node if the alarm is not solved:

The alarm attributes are listed in Table 2.

Table 2    Alarm Attributes

Major Type: 193

Minor Type: 2031675

Managed Object Class: VM

Managed Object Instance:
Region=<name_of_the_region>,
CeeFunction=1,
Tenant=<tenant_uuid>,
VM=<vm_uuid>

Specific Problem: VM Evacuation Failed

Event Type: other (1)

Probable Cause: m3100Unavailable(14)

Additional Text: The following scenarios are possible. In each scenario, <name_of_the_host> specifies the host where the VM was running.

  • If the HA policy allows evacuation, but the VM cannot be started on other compute hosts, the following additional text is displayed:

{"reason": "Evacuation failed", "host": <name_of_the_host>}

  • If the HA policy does not allow the evacuation of the VM, the following additional text is displayed:

{"reason": "Evacuation is not allowed", "host": <name_of_the_host>}

  • If fencing failed, the following additional text is displayed:

{"reason": "Fencing failed", "host": <name_of_the_host>}

Severity: MAJOR (4)
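The Managed Object Instance value in Table 2 is a comma-separated list of key=value fields, and the VM field carries the <vm_uuid> that the procedure later needs. The following is a minimal sketch of splitting that value, assuming the fields appear exactly as listed in Table 2. The function name and the example values are hypothetical and are not taken from a real alarm.

```python
def parse_managed_object_instance(moi: str) -> dict:
    """Split a Managed Object Instance string into its key=value fields."""
    parts = {}
    # Fields are comma-separated and may be wrapped across several lines.
    for field in moi.replace("\n", "").split(","):
        field = field.strip()
        if not field:
            continue
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"Malformed field: {field!r}")
        parts[key.strip()] = value.strip()
    return parts


# Hypothetical example values for illustration only.
example = (
    "Region=RegionOne,\n"
    "CeeFunction=1,\n"
    "Tenant=11111111-2222-3333-4444-555555555555,\n"
    "VM=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
)
fields = parse_managed_object_instance(example)
print(fields["VM"])
```

The value printed for the VM key is the one to pass to nova show in the procedure.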

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Not applicable.

1.2.2   Tools

No tools are required.

1.2.3   Conditions

No conditions.

2   Procedure

This section describes the procedure to follow when the VM Evacuation Failed alarm is received.

2.1   Analyzing the Alarm

Determine the ha-policy of the VM by checking the Additional Text field of the alarm.

The following scenarios are possible:

Note:  
Evacuation is attempted only if the ha-policy is set to ha-offline and the VM status is active. For VMs with other statuses, an alarm is sent to inform the application owner that action is required on their part.
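The check of the Additional Text field can be sketched as a small lookup on the "reason" value. This is an illustration only: it assumes that the host name in a real alarm is delivered as a quoted JSON string, which the placeholder in Table 2 does not confirm. The function name, the scenario wording, and the host name in the usage line are hypothetical.

```python
import json
from typing import Tuple

# Reason strings as listed for the Additional Text attribute in Table 2.
SCENARIOS = {
    "Evacuation failed": "HA policy allows evacuation, but the VM could not be started on another compute host",
    "Evacuation is not allowed": "HA policy does not allow evacuation of the VM",
    "Fencing failed": "Fencing of the affected compute failed",
}


def classify_additional_text(additional_text: str) -> Tuple[str, str]:
    """Return (host, scenario description) for an Additional Text value."""
    data = json.loads(additional_text)
    # Any reason not listed in Table 2 is reported as unknown.
    scenario = SCENARIOS.get(data.get("reason"), "Unknown reason")
    return data.get("host", ""), scenario


# Hypothetical host name for illustration only.
host, scenario = classify_additional_text(
    '{"reason": "Evacuation is not allowed", "host": "compute-0-3"}'
)
print(host, "-", scenario)
```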

2.2   Actions

Evacuation of the VM is attempted as soon as the compute node on which it was running is determined to be unavailable.

Do the following:

  1. Resolve the Compute Host Failed alarm.

    For more information about the Compute Host Failed alarm, refer to Compute Host Failed.

    Note:  
    It is possible that the procedure to resolve the Compute Host Failed alarm takes a considerable amount of time, that is, more than 15 minutes.

  2. When the Compute Host Failed alarm is resolved, the system tries to restart the VM automatically, unless the ha-policy of the VM is set to unmanaged. Unmanaged VMs are not restarted automatically and must be restarted manually.

    If the alarm ceases, exit this procedure.

  3. If the alarm has not ceased, run the following command:

    nova show <vm_uuid>

    Note:  
    The <vm_uuid> value is shown in the Managed Object Instance field of the alarm, see Table 2.

  4. Check the "fault" value in the output.

    The following scenarios are possible:

    • If the "fault" value contains the text "No valid host was found", the system was not able to evacuate the VM because sufficient resources are not available on other nodes.

      In this case, the Region has several failed compute nodes.

      Resolve all Compute Host Failed alarms.

      For more information about the Compute Host Failed alarm, refer to Compute Host Failed.

      Note:  
      It is possible that the procedure to resolve the Compute Host Failed alarms takes a considerable amount of time, that is, more than 15 minutes.

      If the VM Evacuation Failed alarm persists after the Compute Host Failed alarms were resolved, proceed to Step 5.

    • The "fault" value in the output does not indicate insufficient resources.

      In this case, proceed to Step 5.

  5. Collect troubleshooting data as described in the Data Collection Guideline. For alarm-specific logs, refer to the table Data Collection for Alarms and Alerts in the Data Collection Guideline.

  6. Contact the next level of maintenance support.

    Further actions are outside the scope of this instruction.

  7. The job is completed.
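The decision in Step 4 can be sketched as a simple text check on the "fault" value. This is a minimal illustration that assumes the "fault" value has already been copied from the nova show output into a string. The function name, the returned action texts, and the sample fault strings are hypothetical.

```python
# Text that indicates insufficient resources, as quoted in Step 4.
NO_VALID_HOST = "No valid host was found"


def next_action(fault_text: str) -> str:
    """Map the "fault" value to the next action in this procedure."""
    if NO_VALID_HOST in fault_text:
        # Insufficient resources: several compute nodes have failed.
        return ("Resolve all Compute Host Failed alarms; "
                "if the alarm persists, proceed to Step 5")
    # The fault does not indicate insufficient resources.
    return "Proceed to Step 5"


# Hypothetical fault strings for illustration only.
print(next_action("No valid host was found. There are not enough hosts available."))
print(next_action("Unexpected error while running command."))
```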

3   Additional Information

The alarm ceases for a VM when the VM is restarted and its state becomes active.



Copyright

© Ericsson AB 2016. All rights reserved. No part of this document may be reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to continued progress in methodology, design and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

Trademark List
All trademarks mentioned herein are the property of their respective owners. These are shown in the document Trademark Information.
