Service Permanently Stopped
Cloud Execution Environment

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites
2   Procedure
3   Additional Information

1   Introduction

This instruction concerns alarm handling.

1.1   Alarm Description

The alarm is issued by the Managed Object (MO) Service.

The Service Permanently Stopped alarm is issued in the cases listed in Table 1.

The severity of the alarm is MAJOR.

Table 1    Alarm Causes

Alarm Cause:
The service indicated in the Service field of the Managed Object Instance attribute stopped permanently.

Description:
The service monitoring functionality has detected that the service indicated in the Service field of the Managed Object Instance attribute stopped permanently.

Fault Reason:

  • Misconfiguration

  • Other undetermined reasons

Fault Location:
The vCIC or compute node indicated in the Node field of the Managed Object Instance attribute

Impact:
In case a service is running in active-active mode (for example, nova-api) on vCIC, the corresponding performance is lower and the impacted functions do not operate.

In the case of a local service (for example, nova-compute service), the function does not work at all on the node.

The alarm attributes are listed in Table 2.

Table 2    Alarm Attributes

Attribute Name            Attribute Value

Major Type                193
Minor Type                2031715
Managed Object Class      Service
Managed Object Instance   Region=<name_of_the_region>,
                          CeeFunction=1,
                          Node=<hostname_of_the_node>,
                          Service=<service_name>
Specific Problem          Service Permanently Stopped
Event Type                processingErrorAlarm (4)
Probable Cause            softwareProgramAbnormallyTerminated (100545)
Additional Text           On node <hostname_of_the_node> <service_name> has been permanently stopped.
Severity                  MAJOR (4)
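
The Node and Service fields of the Managed Object Instance attribute identify where the fault is and which service stopped. As a minimal sketch, assuming the attribute value is available as a single comma-separated string, the fields can be extracted as shown below. The helper name moi_field and the sample values (region1, compute-0-3) are hypothetical and are not part of the product.

```shell
#!/bin/sh
# Sketch only: moi_field is a hypothetical helper, not a product command.
# Prints the value of one key from a Managed Object Instance string.
#   $1 = Managed Object Instance string (key=value pairs separated by commas)
#   $2 = key to extract (Region, CeeFunction, Node, or Service)
moi_field() {
    printf '%s\n' "$1" |
        tr ',' '\n' |                 # one key=value pair per line
        sed 's/^[ \t]*//' |           # drop indentation left by wrapped values
        awk -F= -v key="$2" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Hypothetical sample value, following the format in Table 2.
moi='Region=region1,CeeFunction=1,Node=compute-0-3,Service=nova-compute'

moi_field "$moi" Node       # prints compute-0-3
moi_field "$moi" Service    # prints nova-compute
```

The extracted node name corresponds to <hostname_of_the_node> and the service name to <service_name> in the procedure.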

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Not applicable.

1.2.2   Tools

No tools are required.

1.2.3   Conditions

Before starting this procedure, ensure that the alarm was not issued due to ongoing planned maintenance. If the alarm was issued due to ongoing planned maintenance, no further actions are required.

2   Procedure

This section describes the procedure to follow when this alarm is received.

Do the following:

  1. If the affected node is not a compute node, continue with Step 3.
  2. If the fault is detected at a compute node, perform the relevant action:
    1. If the alarm is not issued by the nova-compute service, try to move the virtual machines (VMs) by using the following command with the <hostname_of_the_node> reported in the alarm:
      for VM in $(nova list --host <hostname_of_the_node>); do nova forcemove $VM; done
    2. If the alarm is issued by the nova-compute service, log on to the affected compute node as root and reboot it:

      ssh root@<compute_node>
      reboot -f

  3. Collect troubleshooting data as described in the Data Collection Guideline.
  4. Consult the next level of maintenance support. Further actions are outside the scope of this instruction.
  5. The job is completed.
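
The branching in Steps 1 and 2 can be sketched as a small shell helper. This is an illustration under stated assumptions, not a supported tool: the function names choose_action and extract_vm_ids and the action labels are hypothetical. Because nova list prints its result as an ASCII table, the sketch also shows one way to keep only the instance IDs from that table before passing them to the nova forcemove command used in Step 2; that command is taken from this instruction and is not verified here.

```shell
#!/bin/sh
# Sketch only: function names and action labels are hypothetical.

# Map the alarm context to the step of the procedure that applies.
#   $1 = node type ("compute" or any other value, for example "vcic")
#   $2 = service name reported in the alarm
choose_action() {
    if [ "$1" != "compute" ]; then
        echo "collect-data"     # Step 1: not a compute node, continue with Step 3
    elif [ "$2" = "nova-compute" ]; then
        echo "reboot-node"      # Step 2.2: nova-compute stopped, reboot the node
    else
        echo "move-vms"         # Step 2.1: another service stopped, move the VMs
    fi
}

# Read `nova list` table output on stdin and print only the instance IDs.
# Border lines and the header row are skipped because their first column
# is not a 36-character UUID.
extract_vm_ids() {
    awk -F'|' 'NF > 2 {
        id = $2
        gsub(/[ \t]/, "", id)
        if (length(id) == 36 && id ~ /^[0-9a-f-]+$/) print id
    }'
}

choose_action compute ntp    # prints move-vms

# Possible use on a system where the nova client is configured:
#   nova list --host <hostname_of_the_node> | extract_vm_ids |
#       while read -r VM; do nova forcemove "$VM"; done
```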

3   Additional Information

The Service Supervision plugin monitors the following services:

On compute nodes:

  • cron

  • libvirt-bin

  • ndevalarm

  • nova-compute

  • ntp

  • ovs-vswitchd

  • ovsdb-server

  • rsyslog

  • ssh

If arp_setup is defined in config.yaml:

  • arpmon

If the deployment is not using Software Defined Networking (SDN):

  • neutron-openvswitch-agent

If the deployment is using SR-IOV:

  • neutron-sriov-nic-agent

Only in multi-server deployments:

  • ceilometer-polling

On vCICs:

  • apache-server(1)

  • cinder-api

  • cinder-scheduler

  • cron

  • glance-api

  • glance-registry

  • memcache

  • mongodb

  • mysql

  • neutron-dhcp-agent

  • nova-cert-server

  • nova-api

  • nova-conductor

  • nova-consoleauth

  • nova-novncproxy

  • nova-scheduler

  • ntp

  • ovs-vswitchd

  • ovsdb-server

  • pmapi-serverpprocess

  • rabbitmq-epmd

  • rabbitmq-server

  • rsyslog

  • sheriff

  • ssh

  • swift-account

  • swift-account-auditor

  • swift-account-reaper

  • swift-account-replicator

  • swift-container

  • swift-container-auditor

  • swift-container-replicator

  • swift-container-sync

  • swift-object

  • swift-object-auditor

  • swift-object-replicator

  • swift-object-updater

  • swift-proxy

If the deployment is using Software Defined Networking (SDN):

  • ntf_server

  • qbgpd

  • qthriftd

  • sdnc_service

  • wm_server

If neutron_conf in config.yaml is non-Extreme:

  • neutron-server

Only in multi-server deployments:

  • aodh-notifier

  • ceilometer-agent-notification

  • ceilometer-api

  • ceilometer-collector

  • mongodb

On Cinder:

  • cinder-volume

  • cron

  • ntp

  • ovs-vswitchd

  • rsyslog

  • ssh

If arp_setup is defined in config.yaml:

  • arpmon

(1)  The Zabbix web UI and Keystone are run as Web Server Gateway Interface (WSGI) services behind the Apache server.
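
As a rough sketch, the unconditional service lists above can be turned into a lookup that tells whether a service name reported in an alarm belongs to the always-monitored set for a node role. The function name is_monitored, the variable names, and the role labels (compute, vcic, cinder) are hypothetical. The services that depend on config.yaml settings, SDN, SR-IOV, or multi-server deployments are deliberately left out, so a negative result does not prove that a service is unmonitored.

```shell
#!/bin/sh
# Sketch only: names are hypothetical; the lists are copied from the
# unconditional lists in this section (conditional services are omitted).
COMPUTE_SERVICES="cron libvirt-bin ndevalarm nova-compute ntp ovs-vswitchd \
ovsdb-server rsyslog ssh"

VCIC_SERVICES="apache-server cinder-api cinder-scheduler cron glance-api \
glance-registry memcache mongodb mysql neutron-dhcp-agent nova-cert-server \
nova-api nova-conductor nova-consoleauth nova-novncproxy nova-scheduler ntp \
ovs-vswitchd ovsdb-server pmapi-serverpprocess rabbitmq-epmd rabbitmq-server \
rsyslog sheriff ssh swift-account swift-account-auditor swift-account-reaper \
swift-account-replicator swift-container swift-container-auditor \
swift-container-replicator swift-container-sync swift-object \
swift-object-auditor swift-object-replicator swift-object-updater swift-proxy"

CINDER_SERVICES="cinder-volume cron ntp ovs-vswitchd rsyslog ssh"

# Returns 0 if the service is in the unconditional list for the role,
# 1 if it is not, and 2 if the role is unknown.
#   $1 = role (compute, vcic, or cinder)
#   $2 = service name
is_monitored() {
    case "$1" in
        compute) list=$COMPUTE_SERVICES ;;
        vcic)    list=$VCIC_SERVICES ;;
        cinder)  list=$CINDER_SERVICES ;;
        *)       return 2 ;;
    esac
    # Pad with spaces so that only whole names match.
    case " $list " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

if is_monitored compute nova-compute; then
    echo "nova-compute is always monitored on compute nodes"
fi
```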