1 Introduction
This instruction concerns alarm handling.
1.1 Alarm Description
The alarm is issued by the Managed Object (MO) Service.
The Service Permanently Stopped alarm is issued in the following cases:
- A service operating at a vCIC or compute node is stopped permanently.
- The alarm can also appear as a result of maintenance activity.

The possible alarm cause and the corresponding fault reasons, fault locations, and impacts are described in Table 1. For the list of services monitored by the Service Supervision plugin, see Section 3.
The severity of the alarm is MAJOR.
| Alarm Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| The service indicated in the Service field of the Managed Object Instance attribute | The service monitoring functionality has detected that the service indicated in the Service field of the Managed Object Instance attribute stopped permanently. | | The vCIC or compute node indicated in the Node field of the Managed Object Instance attribute | For a service running in active-active mode on the vCICs (for example, nova-api), the corresponding performance is lower and the impacted functions do not operate. For a local service (for example, nova-compute), the function does not work at all on the node. |
The alarm attributes are listed in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Major Type | 193 |
| Minor Type | 2031715 |
| Managed Object Class | Service |
| Managed Object Instance | Region=<name_of_the_region>, |
| Specific Problem | Service Permanently Stopped |
| Event Type | processingErrorAlarm (4) |
| Probable Cause | softwareProgramAbnormallyTerminated (100545) |
| Additional Text | On node <hostname_of_the_node> <service_name> has been permanently stopped. |
| Severity | MAJOR (4) |
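The Additional Text attribute names both the node and the stopped service, and the procedure below needs the hostname of the node. A minimal sketch for pulling both out, assuming the text follows the exact wording in Table 2 and that neither the hostname nor the service name contains spaces; the function name parse_additional_text and the sample hostname compute-0-5 are hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: split the alarm's Additional Text into node and service.
# Assumes the wording shown in Table 2:
#   "On node <hostname_of_the_node> <service_name> has been permanently stopped."
# Prints "<hostname> <service>" on success; returns 1 if the text does not match.
parse_additional_text() {
    result=$(printf '%s\n' "$1" |
        sed -n 's/^On node \([^ ]*\) \([^ ]*\) has been permanently stopped\.$/\1 \2/p')
    # An empty result means the text had some other format: report failure
    # rather than guessing at the hostname.
    [ -n "$result" ] || return 1
    printf '%s\n' "$result"
}
```

For example, `parse_additional_text 'On node compute-0-5 nova-compute has been permanently stopped.'` prints `compute-0-5 nova-compute`.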
1.2 Prerequisites
This section provides information on the documents, tools, and conditions that apply to the procedure.
1.2.1 Documents
Not applicable.
1.2.2 Tools
No tools are required.
1.2.3 Conditions
Before starting this procedure, ensure that the alarm was not issued due to ongoing planned maintenance. If it was, no further action is required.
2 Procedure
This section describes the procedure to follow when this alarm is received.
Do the following:
1. If the affected node is not a compute node, continue with Step 3.
2. If the fault is detected at a compute node, perform the relevant action:
   - If the alarm is not issued by the nova-compute service, try to move the virtual machines (VMs) off the node by using the following command with the <hostname_of_the_node> reported in the alarm:
     for VM in $(nova list --all-tenants --host <hostname_of_the_node> | awk 'NR > 3 && $2 != "" {print $2}'); do nova forcemove "$VM"; done
   - If the alarm is issued by the nova-compute service, log on to the affected compute node as root and reboot it:
     ssh root@<hostname_of_the_node>
     reboot -f
3. Collect troubleshooting data as described in the Data Collection Guideline.
4. Consult the next level of maintenance support. Further actions are outside the scope of this instruction.
5. The job is completed.
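The branching in Steps 1 and 2 can be sketched as a dry-run shell helper that only prints the action to take and runs nothing. This is an illustration under stated assumptions, not part of the procedure itself: the function names alarm_next_action and vm_ids_from_nova_list, the node-type argument, and the assumption that nova list prints its default ASCII table (border, header, and border lines at the top, instance ID in the second field of each row) are all hypothetical.

```sh
#!/bin/sh
# Hypothetical dry-run helper: prints the action for Steps 1 and 2.
# Assumed arguments: node type ("compute" or anything else), the service
# name, and the hostname reported in the alarm.
alarm_next_action() {
    node_type=$1 service=$2 host=$3
    if [ "$node_type" != "compute" ]; then
        # Step 1: not a compute node, so go straight to data collection.
        echo "step3: collect troubleshooting data"
    elif [ "$service" = "nova-compute" ]; then
        # Step 2, nova-compute branch: the node is rebooted.
        echo "step2: ssh root@$host; reboot -f"
    else
        # Step 2, any other service: the VMs are moved off the node.
        echo "step2: move VMs off $host with nova forcemove"
    fi
}

# Hypothetical filter: read a 'nova list' table on stdin, skip the first
# three lines (border, header, border), and print the second field of each
# data row. The closing border line has no second field, so it is skipped.
vm_ids_from_nova_list() {
    awk 'NR > 3 && $2 != "" { print $2 }'
}
```

Piping the nova list output for the affected host into vm_ids_from_nova_list yields one instance ID per line, which is the per-VM value the nova forcemove loop in Step 2 iterates over.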
3 Additional Information
The Service Supervision plugin monitors the following services:
| Node | Condition | Monitored Service |
|---|---|---|
| Compute node | If arp_setup is defined in config.yaml | arpmon |
| Compute node | If the deployment is not using Software Defined Networking (SDN) | neutron-openvswitch-agent |
| Compute node | If the deployment is using SR-IOV | neutron-sriov-nic-agent |
| Compute node | Only in multi-server deployments | ceilometer-polling |
| vCIC | If the deployment is using Software Defined Networking (SDN) | |
| vCIC | If neutron_conf in config.yaml is non-Extreme | neutron-server |
| vCIC | Only in multi-server deployments | |
| Cinder | If arp_setup is defined in config.yaml | arpmon |
(1) The Zabbix web UI and Keystone are run as Web Server Gateway Interface (WSGI) services behind the Apache server.
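The compute-node rows of the table above can be expressed as a small function that lists which conditional services to expect on a given deployment. A minimal sketch, assuming one yes/no flag per condition (the function name conditional_compute_services and the flag order are hypothetical); it covers only the conditional compute-node entries shown in the table:

```sh
#!/bin/sh
# Hypothetical helper: print the conditionally supervised services on a
# compute node, one per line. Assumed flags, each "yes" or "no":
#   $1 arp_setup defined in config.yaml, $2 SDN in use,
#   $3 SR-IOV in use, $4 multi-server deployment.
conditional_compute_services() {
    arp_setup=$1 sdn=$2 sriov=$3 multi_server=$4
    [ "$arp_setup" = yes ] && echo arpmon
    [ "$sdn" = no ] && echo neutron-openvswitch-agent
    [ "$sriov" = yes ] && echo neutron-sriov-nic-agent
    [ "$multi_server" = yes ] && echo ceilometer-polling
    # The last test may be false; do not let that become the return status.
    return 0
}
```

Comparing this output with the <service_name> in the alarm's Additional Text shows whether the stopped service is one of the conditionally supervised ones for that deployment.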
