Service Stopped
Cloud Execution Environment

Contents

1   Introduction
1.1   Alert Description
1.2   Prerequisites

2   Procedure
2.1   Analysis
2.2   Actions

3   Additional Information

1   Introduction

This instruction concerns alert handling.

1.1   Alert Description

The Service Stopped alert is issued in the following cases:

Table 1    Alert Causes

Alert Cause:
The service indicated in the Service field of the Managed Object Instance attribute stopped.

Description:
The service monitoring functionality has detected that the service indicated in the Service field of the Managed Object Instance attribute stopped.

Fault Reason:

  • Misconfiguration

  • Other undetermined reasons

Fault Location:
The vCIC or compute node indicated in the Node field of the Managed Object Instance attribute.

Impact:
If the service runs in active-active mode on the vCIC (for example, nova-api), the performance of that service is reduced.

The alert attributes are listed in Table 2.

Table 2    Alert Attributes

Attribute Name            Attribute Value

Major Type                193
Minor Type                2031710
Managed Object Class      Service
Managed Object Instance   Region=<name_of_the_region>, CeeFunction=1, Node=<hostname_of_the_node>, Service=<service_name>
Specific Problem          Service Stopped
Event Type                Other (1)
Probable Cause            m3100Indeterminate (0)
Additional Text           On node <hostname_of_the_node> <service_name> has been stopped.
Severity                  WARNING (6)
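
The Node and Service fields of the Managed Object Instance attribute are needed throughout the procedure. The following is a minimal sketch of how they could be extracted from the attribute value in a script. The helper name moi_field and the sample values are hypothetical, and the sketch assumes the value is a comma-separated list of key=value pairs as shown in Table 2.

```shell
# Hypothetical helper: print the value of one field of a
# Managed Object Instance string.
#   $1 = field name (for example Node or Service)
#   $2 = Managed Object Instance value
moi_field() {
  # Put each key=value pair on its own line, then print the value
  # of the pair whose key matches the requested field.
  printf '%s\n' "$2" | tr ',' '\n' | sed -n -E "s/^[[:space:]]*$1=//p"
}

# Example with made-up values:
#   moi='Region=RegionOne, CeeFunction=1, Node=compute-0-3, Service=nova-compute'
#   moi_field Node "$moi"
#   moi_field Service "$moi"
```

The extracted node name can then be used wherever the procedure refers to <hostname_of_the_node>.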

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Not applicable.

1.2.2   Tools

No tools are required.

1.2.3   Conditions

Before starting this procedure, ensure that the alert was not issued due to ongoing planned maintenance. If the alert was issued due to ongoing planned maintenance, no further actions are required.

2   Procedure

This section describes the procedure to follow when this alert is received.

2.1   Analysis

Do the following to analyze the alert:

  1. Check if the Service Permanently Stopped alarm is issued for the same service.
    • If the Service Permanently Stopped alarm is issued, refer to Service Permanently Stopped, and exit this procedure.
    • If the Service Permanently Stopped alarm is not issued, continue with Step 2.
  2. Count the number of alert occurrences in a 10-minute period and perform the relevant action:
    • If the alert occurs fewer than five times in 10 minutes, no action is needed. The job is completed because Service Supervision has recovered the service.
    • If the alert occurs five or more times in 10 minutes, continue with Section 2.2.
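
The threshold check in Step 2 can be sketched as a small script. This is an illustration only: the function names are hypothetical, and the sketch assumes that the alert timestamps have already been exported as integer epoch seconds, one per line. How the timestamps are obtained depends on the fault management system in use and is not covered here.

```shell
# Hypothetical helper: count the alert occurrences that fall within the
# 10 minutes before a reference time.
#   $1    = reference time in epoch seconds
#   stdin = alert timestamps in epoch seconds, one per line
count_recent_alerts() {
  now=$1
  window=600   # 10 minutes, in seconds
  count=0
  while read -r ts; do
    [ -n "$ts" ] || continue          # skip empty lines
    age=$((now - ts))
    if [ "$age" -ge 0 ] && [ "$age" -lt "$window" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# Apply the threshold from Step 2: five or more occurrences require action.
#   $1 = number of occurrences in the 10-minute period
analysis_result() {
  if [ "$1" -ge 5 ]; then
    echo "continue with Section 2.2"
  else
    echo "no action needed"
  fi
}

# Example:
#   analysis_result "$(count_recent_alerts "$(date +%s)" < timestamps.txt)"
```

The window boundaries chosen here (occurrences exactly 600 seconds old are excluded) are an assumption of the sketch, not something the procedure specifies.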

2.2   Actions

Do the following:

  1. Depending on the node type, perform the relevant action:
    • If the affected node is not a compute node, continue with Step 3.
    • If the affected node is a compute node, try to move the virtual machines (VMs) off the node by running the following command with the <hostname_of_the_node> reported in the alert:
      for VM in $(nova list --host <hostname_of_the_node>); do nova forcemove $VM; done
  2. If the alert does not recur within 10 minutes after the VMs are moved, the job is completed. Otherwise, continue with Step 3.
  3. Collect troubleshooting data as described in the Data Collection Guideline.
  4. Consult the next level of maintenance support. Further actions are outside the scope of this instruction.
  5. The job is completed.
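
The loop in Step 1 passes every word printed by nova list to nova forcemove. If nova list prints its usual ASCII table in the deployment, it may be more robust to extract only the VM IDs first. The following is a minimal sketch under that assumption. The helper name extract_vm_ids is hypothetical, and the sketch assumes that the ID is a UUID in the first table column.

```shell
# Hypothetical helper: read "nova list" table output on stdin and print
# only the VM IDs (UUIDs in the first column), one per line.
extract_vm_ids() {
  # Header rows and border lines do not match the pattern and are dropped.
  sed -n -E 's/^\|[[:space:]]*([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})[[:space:]]*\|.*$/\1/p'
}

# Possible use with the command from Step 1 (not executed here):
#   nova list --host <hostname_of_the_node> | extract_vm_ids |
#     while read -r VM; do nova forcemove "$VM"; done
```

Whether this filtering is needed depends on the output format of nova list in the deployment. The command given in Step 1 remains the documented reference.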

3   Additional Information

The Service Supervision plugin monitors the following services:

On compute nodes:

  • cron

  • libvirt-bin

  • ndevalarm

  • nova-compute

  • ntp

  • openvswitch-switch

  • qemu-kvm

  • rsyslog

  • ssh

If arp_setup is defined in config.yaml:

  • arpmon

If the deployment is not using Software Defined Networking (SDN):

  • neutron-openvswitch-agent

If the deployment is using SR-IOV:

  • neutron-sriov-agent

Only in multi-server deployments:

  • ceilometer-polling

On vCICs:

  • cinder-api

  • cinder-scheduler

  • cron

  • glance-api

  • glance-registry

  • ndevalarm

  • nova-api

  • nova-conductor

  • nova-consoleauth

  • nova-novncproxy

  • nova-scheduler

  • ntp

  • openvswitch-switch

  • rsyslog

  • sheriff

  • ssh

  • swift-account

  • swift-account-auditor

  • swift-account-reaper

  • swift-account-replicator

  • swift-container

  • swift-container-auditor

  • swift-container-replicator

  • swift-container-sync

  • swift-object

  • swift-object-auditor

  • swift-object-replicator

  • swift-object-updater

  • swift-proxy

If neutron_conf in config.yaml is non-Extreme:

  • neutron-server

Only in multi-server deployments:

  • aodh-notifier

  • ceilometer-agent-notification

  • ceilometer-api

  • ceilometer-collector

  • mongodb