| 1 | Introduction |
| 1.1 | Alarm Description |
| 1.2 | Prerequisites |
| 2 | Procedure |
| 2.1 | Analyzing the Alarm |
| 2.2 | Actions |
| 3 | Additional Information |
1 Introduction
This instruction concerns alarm handling.
1.1 Alarm Description
The VM Evacuation Failed alarm is issued by the Managed Object (MO) VM.
The alarm is issued for a Virtual Machine (VM) in the following situations:
- The automatic evacuation of the VM has failed.
- Or the evacuation of the VM is not allowed due to the policy settings of the VM.
- Or the evacuation of the VM is not possible, because the fencing of the host, as a prerequisite step, has failed.
- Note:
- Automatic evacuation to a different compute node is attempted only for VMs in active state, and only if the High Availability (HA) policy of the VM allows it. For VMs in any other state, an alarm is always issued to inform the application owner of the possible fault.
If the unmanaged policy is set for the VM, or no HA policy is set, the VM Evacuation Failed alarm is never sent for this VM.
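The rules above can be summarized as a small decision function. This is only an illustrative sketch, not part of the product: the function name and return keys are hypothetical, and the policy names `ha-offline`, `managed-on-host`, and `unmanaged` are taken from Section 2.1 and the paragraph above. It describes what is expected to happen to one VM when its compute node becomes unavailable.

```python
from typing import Optional


def evacuation_outcome(vm_state: str, ha_policy: Optional[str]) -> dict:
    """Summarize the expected behavior for one VM (hypothetical helper).

    alarm is one of:
      "never"      - the VM Evacuation Failed alarm is not sent for this VM
      "on_failure" - the alarm is sent only if the evacuation attempt fails
      "always"     - no evacuation is attempted and the alarm is issued
    """
    # Unmanaged policy or no HA policy: the alarm is never sent.
    if ha_policy is None or ha_policy == "unmanaged":
        return {"evacuation_attempted": False, "alarm": "never"}
    # Evacuation is attempted only for active VMs whose policy is ha-offline.
    if vm_state == "active" and ha_policy == "ha-offline":
        return {"evacuation_attempted": True, "alarm": "on_failure"}
    # Any other state, or a policy that does not allow evacuation.
    return {"evacuation_attempted": False, "alarm": "always"}
```

For example, an active VM with the `managed-on-host` policy is not evacuated and raises the alarm, while a VM without an HA policy never raises it.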
The severity of the alarm is MAJOR.
The possible alarm causes and fault locations are explained in Table 1.
| Alarm Cause | Description | Fault | Fault Location | Impact |
|---|---|---|---|---|
| The evacuation of a VM has failed. | The VM could not be evacuated to another compute node. | | | The VM becomes permanently unavailable. |
| The evacuation of a VM is not allowed due to the HA policy of the VM. | The HA policy does not allow evacuation. | | Compute node | The VM will stay unavailable. |
| The fencing of the compute node failed. | Fencing, which is a prerequisite for VM evacuation, has failed on the affected compute node. | | Controller node | The VM will stay unavailable. |
The following is the consequence for the node if the alarm is not solved:
- The VM remains unavailable.
The alarm attributes are listed in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Major Type | 193 |
| Minor Type | 2031675 |
| Managed Object Class | VM |
| Managed Object Instance | Region=<name_of_the_region>, |
| Specific Problem | VM Evacuation Failed |
| Event Type | other (1) |
| Probable Cause | m3100Unavailable(14) |
| Additional Text | The following scenarios are possible: {"reason": "Evacuation failed", "host": <name_of_the_host>}; {"reason": "Evacuation is not allowed", "host": <name_of_the_host>}; {"reason": "Fencing failed", "host": <name_of_the_host>}. In each case, <name_of_the_host> specifies the host where the VM was running. |
| Severity | MAJOR (4) |
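If alarms are processed by a script, the Additional Text field can be split into its reason and host. The sketch below is an assumption-laden illustration: it assumes the field arrives as valid JSON with the host name quoted, which the template above does not confirm, and the function name and sample host `compute-0-1` are hypothetical.

```python
import json
from typing import Tuple

# The three reason strings listed for the Additional Text attribute.
KNOWN_REASONS = {
    "Evacuation failed",
    "Evacuation is not allowed",
    "Fencing failed",
}


def parse_additional_text(text: str) -> Tuple[str, str]:
    """Return (reason, host) from an Additional Text value.

    Assumes the value is valid JSON; raises ValueError otherwise
    or when the reason is not one of the documented strings.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Additional Text is not valid JSON: {exc}") from exc
    reason = data.get("reason")
    if reason not in KNOWN_REASONS:
        raise ValueError(f"Unexpected reason: {reason!r}")
    return reason, data.get("host", "")


# Usage with a hypothetical host name:
reason, host = parse_additional_text(
    '{"reason": "Fencing failed", "host": "compute-0-1"}'
)
print(reason, host)
```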
1.2 Prerequisites
This section provides information on the documents, tools, and conditions that apply to the procedure.
1.2.1 Documents
Not applicable.
1.2.2 Tools
No tools are required.
1.2.3 Conditions
No conditions.
2 Procedure
This section describes the procedure to follow when the VM Evacuation Failed alarm is received.
2.1 Analyzing the Alarm
Determine the ha-policy of the VM by checking the Additional Text field of the alarm.
The following scenarios are possible:
- The ha-policy of the VM is set to managed-on-host.
In this case, no action is required, because the VM is not evacuated due to its HA policy.
- Or the ha-policy is set to ha-offline, but the evacuation has failed.
In this case, see Section 2.2.
- Or fencing has failed. In this case the Fencing Failed alarm must be handled.
For more information, refer to Fencing Failed.
- Note:
- Evacuation is only attempted if the ha-policy is set to ha-offline, and the VM status is active. For VMs with other statuses, an alarm is sent to inform the application owner that action is required on their part.
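The analysis above can be expressed as a lookup from the alarm reason to the next action. This is a sketch only. It assumes that the reason "Evacuation is not allowed" corresponds to the managed-on-host case and "Evacuation failed" to the ha-offline case, a pairing the text implies but does not state explicitly. The names `NEXT_ACTION` and `next_action` are hypothetical.

```python
# Reason strings come from the Additional Text attribute in Table 2.
# The pairing of each reason with a scenario is an assumption.
NEXT_ACTION = {
    "Evacuation is not allowed": "No action required (HA policy prevents evacuation).",
    "Evacuation failed": "Continue with Section 2.2.",
    "Fencing failed": "Handle the Fencing Failed alarm.",
}


def next_action(reason: str) -> str:
    """Map an alarm reason to the next action described in Section 2.1."""
    try:
        return NEXT_ACTION[reason]
    except KeyError:
        # The instruction does not cover other reasons, so fail loudly.
        raise ValueError(f"Reason not covered by this instruction: {reason!r}")
```

Raising an error for unknown reasons, rather than returning a default, keeps the sketch from suggesting an action that the instruction does not describe.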
2.2 Actions
The evacuation of the VM starts as soon as the compute node on which it was running is determined to be unavailable.
The following scenarios are possible:
- A Compute Host Failed alarm is sent. This means that the compute node has become permanently unavailable.
In this case, proceed to Step 1, and carry on with all subsequent steps.
- Or no Compute Host Failed alarm is sent.
In this case, the VM that has failed to evacuate is restarted automatically.
If this applies, no further actions are needed. Exit this procedure.
If the VM is not restarted automatically, and the VM Evacuation Failed alarm does not cease, proceed to Step 3, and carry on with all subsequent steps.
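The choice of entry point into the procedure can be sketched as follows. The function and parameter names are hypothetical, and the handling of combinations that the text does not cover (for example, the alarm has ceased although no restart was observed) is an assumption.

```python
def entry_point(compute_host_failed_alarm: bool,
                vm_restarted: bool,
                evacuation_alarm_ceased: bool) -> str:
    """Return where to continue: "Step 1", "Step 3", or "exit"."""
    # A Compute Host Failed alarm means the compute node is permanently
    # unavailable: start from Step 1 and carry on with all steps.
    if compute_host_failed_alarm:
        return "Step 1"
    # No Compute Host Failed alarm: the VM is normally restarted
    # automatically, and no further action is needed.
    if not vm_restarted and not evacuation_alarm_ceased:
        return "Step 3"
    # Assumption: any other combination is treated as resolved.
    return "exit"
```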
1. Resolve the Compute Host Failed alarm.
   For more information about the Compute Host Failed alarm, refer to Compute Host Failed.
   - Note:
   - It is possible that the procedure to resolve the Compute Host Failed alarm takes a considerable amount of time, that is, more than 15 minutes.
2. If the Compute Host Failed alarm is resolved, the system tries to restart the VM automatically.
   If the alarm ceases, exit this procedure.
3. If the alarm has not ceased, use the following command:
   nova show <vm_uuid>
   - Note:
   - The <vm_uuid> value is indicated in the Managed Object Instance field of the alarm text and Table 2.
4. Check the "fault" value in the output.
   The following scenarios are possible:
   - The "fault" value contains the text "No valid host was found". The system was not able to evacuate the VM, because there are insufficient resources available on other nodes. The following scenarios are possible:
     - The region is saturated with VMs. In this case, expand the region with additional resources. Refer to the document Region Expansion.
     - The region has several failed compute nodes. In this case, resolve all Compute Host Failed alarms.
       For more information about the Compute Host Failed alarm, refer to Compute Host Failed.
       - Note:
       - It is possible that the procedure to resolve the Compute Host Failed alarms takes a considerable amount of time, that is, more than 15 minutes.
     If the VM Evacuation Failed alarm persists after the Compute Host Failed alarms were resolved, proceed to Step 5.
   - Or the "fault" value in the output does not indicate insufficient resources.
     In this case, proceed to Step 5.
5. Collect troubleshooting data as described in the Data Collection Guideline.
6. Contact the next level of maintenance support.
   Further actions are outside the scope of this instruction.
7. The job is completed.
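The check of the "fault" value can be sketched as a simple text match. This is a minimal illustration under stated assumptions: the function name is hypothetical, the fault text is passed in as a plain string copied from the `nova show <vm_uuid>` output, and the number of open Compute Host Failed alarms is used to tell a saturated region from one with failed compute nodes, which is an assumption and not something the instruction prescribes.

```python
# Marker text quoted in the instruction for insufficient resources.
NO_VALID_HOST = "No valid host was found"


def follow_up(fault_text: str, open_compute_host_failed_alarms: int) -> str:
    """Suggest the follow-up action for the "fault" value of a VM."""
    # The fault does not indicate insufficient resources: go to Step 5.
    if NO_VALID_HOST not in fault_text:
        return "Step 5: collect troubleshooting data"
    # Insufficient resources while compute nodes are down
    # (assumption: detected through open Compute Host Failed alarms).
    if open_compute_host_failed_alarms > 0:
        return "Resolve all Compute Host Failed alarms"
    # Insufficient resources with all compute nodes available.
    return "Expand the region (see Region Expansion)"


# Usage with an illustrative fault message:
print(follow_up("No valid host was found.", 2))
```

If the VM Evacuation Failed alarm persists after the Compute Host Failed alarms are resolved, the instruction still directs the operator to Step 5, which this sketch does not model.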
3 Additional Information
The alarm is ceased for a VM when the VM is restarted, and the VM state becomes active.
