Contents
- 1 Introduction
  - 1.1 Alert Description
  - 1.2 Prerequisites
- 2 Procedure
  - 2.1 Analysis
  - 2.2 Actions
- 3 Additional Information
1 Introduction
This instruction describes how to handle the Service Stopped alert.
1.1 Alert Description
The Service Stopped alert is issued in the following cases:
- A service running on a vCIC or compute node is stopped.
- A service is stopped as a result of maintenance activity.

The possible alert cause and the corresponding fault reason, fault location, and impact are described in Table 1. For the list of services monitored by the Service Supervision plugin, see Section 3.
| Alert Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| The service indicated in the Service field of the Managed Object Instance attribute is stopped. | The service monitoring functionality has detected that the service indicated in the Service field of the Managed Object Instance attribute has stopped. | | The vCIC or compute node indicated in the Node field of the Managed Object Instance attribute | If the stopped service runs in active-active mode on the vCICs (for example, nova-api), the performance of that service is reduced. |
The alert attributes are listed in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Major Type | 193 |
| Minor Type | 2031710 |
| Managed Object Class | Service |
| Managed Object Instance | Region=<name_of_the_region>, |
| Specific Problem | Service Stopped |
| Event Type | Other (1) |
| Probable Cause | m3100Indeterminate (0) |
| Additional Text | On node <hostname_of_the_node> <service_name> has been stopped. |
| Severity | WARNING (6) |
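The Additional Text attribute carries both the node name and the service name. Those two values are what you compare when checking whether a Service Permanently Stopped alarm concerns the same service. The following is a minimal sketch for pulling them out. It assumes the text follows the Table 2 template exactly and that neither name contains spaces. The function name `parse_additional_text` is hypothetical and not part of any CEE tool.

```shell
# Hypothetical helper: prints "<node> <service>" from the Additional Text
# of a Service Stopped alert, or nothing if the text does not match the
# template "On node <hostname_of_the_node> <service_name> has been stopped."
# Assumes neither the hostname nor the service name contains spaces.
parse_additional_text() {
    printf '%s\n' "$1" |
        sed -n 's/^On node \([^ ]*\) \([^ ]*\) has been stopped\.$/\1 \2/p'
}
```

For example, `parse_additional_text 'On node compute-0-1 nova-compute has been stopped.'` prints `compute-0-1 nova-compute`. The hostname in this example is made up.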
1.2 Prerequisites
This section provides information on the documents, tools, and conditions that apply to the procedure.
1.2.1 Documents
Not applicable.
1.2.2 Tools
No tools are required.
1.2.3 Conditions
Before starting this procedure, ensure that the alert was not issued due to ongoing planned maintenance. If the alert was issued due to ongoing planned maintenance, no further actions are required.
2 Procedure
This section describes the procedure to follow when this alert is received.
2.1 Analysis
Do the following to analyze the alert:
1. Check whether the Service Permanently Stopped alarm is issued for the same service.
   - If the Service Permanently Stopped alarm is issued, refer to Service Permanently Stopped and exit this procedure.
   - If the Service Permanently Stopped alarm is not issued, continue with Step 2.
2. Count the number of alert occurrences in a 10-minute period and perform the relevant action:
   - If the alert occurs fewer than five times in 10 minutes, no action is needed. Service Supervision has recovered the service, and the job is completed.
   - If the alert occurs five or more times in 10 minutes, continue with Section 2.2.
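Step 2 of the analysis can be sketched as a sliding-window count. The sketch below makes three assumptions:
- The alert timestamps for one service on one node have already been exported from fault management as epoch seconds, sorted ascending.
- "A 10-minute period" means any 600-second window.
- The function name `check_alert_rate` is hypothetical.

```shell
# Hypothetical helper (not part of Service Supervision): reads epoch
# timestamps in seconds, ascending, one per line, of Service Stopped alerts
# for one service on one node, and prints "actions" if any 600-second
# window holds five or more alerts, otherwise "no_action".
check_alert_rate() {
    awk -v window=600 -v threshold=5 '
        { t[NR] = $1 + 0 }
        END {
            start = 1
            peak = 0
            for (i = 1; i <= NR; i++) {
                # Advance the window start until it spans less than 600 s.
                while (t[i] - t[start] >= window) start++
                count = i - start + 1
                if (count > peak) peak = count
            }
            result = (peak >= threshold) ? "actions" : "no_action"
            print result
        }'
}
```

For example, `printf '%s\n' 0 100 200 300 400 | check_alert_rate` prints `actions`, because all five alerts fall within 400 seconds. A sliding window catches bursts that straddle a fixed clock boundary. A fixed window (for example, the last 10 minutes only) would be simpler but could miss such bursts.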
2.2 Actions
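Do the following:
1. Move all VMs off the node indicated in the alert. Because nova list prints a bordered table, the instance IDs are taken from its second column before each one is passed to nova forcemove:
   for VM in $(nova list --all-tenants --host <hostname_of_the_node> | awk -F'|' 'NR>3 && NF>2 {gsub(/ /, "", $2); print $2}'); do nova forcemove "$VM"; done
2. If the alert does not recur within 10 minutes after the VMs are moved, the job is completed. Otherwise, continue with Step 3.
3. Collect troubleshooting data as described in the Data Collection Guideline.
4. Consult the next level of maintenance support. Further actions are outside the scope of this instruction.
5. The job is completed.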
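`nova list` writes a bordered table rather than bare instance IDs. The IDs therefore have to be extracted before they are handed to `nova forcemove`. The following is a minimal sketch of that extraction. It assumes the standard table layout, in which the ID is the first column and border lines start with `+`. The function name `extract_instance_ids` is hypothetical.

```shell
# Hypothetical helper: prints the instance IDs from the bordered table that
# "nova list" writes to stdout. Border lines contain no "|" and are skipped;
# the header row (first column "ID") is skipped as well.
extract_instance_ids() {
    awk -F'|' '
        NF > 2 {
            id = $2
            gsub(/ /, "", id)   # strip the padding around the cell value
            if (id != "ID") print id
        }'
}
```

For example, `nova list --host <hostname_of_the_node> | extract_instance_ids | while read -r vm; do nova forcemove "$vm"; done` moves each listed VM in turn. Reading line by line avoids passing table borders or column headers to `nova forcemove`.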
3 Additional Information
The Service Supervision plugin monitors the following services:

On compute nodes:
- If arp_setup is defined in config.yaml: arpmon
- If the deployment is not using Software Defined Networking (SDN): neutron-openvswitch-agent
- If the deployment is using SR-IOV: neutron-sriov-agent
- Only in multi-server deployments: ceilometer-polling

On vCICs:
- If neutron_conf in config.yaml is non-Extreme: neutron-server
- Only in multi-server deployments:
