CM-HA Service Restarted
Cloud Execution Environment

Contents

1   Introduction
1.1   Alert Description
1.2   Prerequisites
2   Procedure
2.1   Actions

1   Introduction

This instruction concerns the handling of the CM-HA Service Restarted alert.

1.1   Alert Description

The CM-HA Service Restarted alert is issued when a periodic check finds that the memory consumption of Continuous Monitoring High Availability (CM-HA) has reached the threshold, and CM-HA is restarted. The CM-HA memory usage threshold is 4 GB, and the check period is one hour.
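The periodic check described above can be sketched as follows. This is a minimal illustration, not the CEE implementation: the constant and function names are hypothetical, 4 GB is assumed to mean 4 × 1024³ bytes, the one-hour period is assumed to be the interval between checks, and reaching the threshold (not only exceeding it) is assumed to trigger the restart.

```python
from typing import Optional

# Hypothetical sketch of the hourly CM-HA memory check; not the CEE code.
CHECK_PERIOD_SECONDS = 60 * 60          # assumed: one check per hour
MEMORY_THRESHOLD_BYTES = 4 * 1024**3    # assumed: 4 GB read as 4 GiB


def needs_restart(memory_bytes: int,
                  threshold: int = MEMORY_THRESHOLD_BYTES) -> bool:
    """Return True when the measured CM-HA memory has reached the threshold."""
    return memory_bytes >= threshold


def run_check(memory_bytes: int) -> Optional[str]:
    """Return the alert's Specific Problem text if a restart is triggered."""
    if needs_restart(memory_bytes):
        # The real service would restart CM-HA and raise the alert here.
        return "CM-HA Service Restarted"
    return None
```

For example, a reading of 5 GiB makes run_check return "CM-HA Service Restarted", while a reading of 1 GiB returns None.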

The severity of the alert is WARNING.

The possible alert causes, corresponding fault reasons, fault locations, and impacts are described in Table 1.

Table 1    Alert Causes

Alert Cause       Memory utilization of CM-HA is high, which triggers a restart of CM-HA.
Description       The alert is sent when the memory utilization of CM-HA exceeds the threshold level. CM-HA restarts to free up memory.
Fault Reason      In some cases, the Python interpreter does not properly free the memory used by CM-HA, so CM-HA must be restarted.
Fault Location    Controller nodes
Impact            System capacity can be degraded if memory utilization exceeds the threshold.

The alert attributes are listed in Table 2.

Table 2    Alert Attributes

Attribute Name            Attribute Value
Major Type                193
Minor Type                2031714
Managed Object Class      CM-HA
Managed Object Instance   Region=<name_of_the_region>,CeeFunction=1,Node=<hostname_of_the_node>,Service=CM-HA
Specific Problem          CM-HA Service Restarted
Event Type                other (1)
Probable Cause            m3100Indeterminate (0)
Additional Text           CM-HA service is restarted because of its memory consumption reached the threshold value
Severity                  WARNING (6)
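To see which controller node raised the alert, the Managed Object Instance value can be split into its components. The sketch below assumes the comma-separated key=value form shown in Table 2 and that no value contains a comma; the function name and the sample region and host names are hypothetical.

```python
from typing import Dict


def parse_moi(moi: str) -> Dict[str, str]:
    """Split a Managed Object Instance string into key/value pairs.

    Assumes the comma-separated key=value form shown in Table 2.
    """
    parts: Dict[str, str] = {}
    for item in moi.split(","):
        item = item.strip()       # tolerate spaces or line breaks after commas
        if not item:
            continue              # skip empty components, e.g. a trailing comma
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed component: {item!r}")
        parts[key] = value
    return parts


# Hypothetical example value; real region and host names differ.
example = "Region=RegionOne,CeeFunction=1,Node=cic-1,Service=CM-HA"
print(parse_moi(example)["Node"])  # prints cic-1
```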

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Not applicable.

1.2.2   Tools

No tools are required.

1.2.3   Conditions

No conditions.

2   Procedure

This section describes the procedure to follow when this alert is received.

2.1   Actions

When the CM-HA Service Restarted alert is issued, CM-HA has already recovered. Normally, no further actions are necessary.

If the alert is observed more than once a week, perform the following:

  1. Collect troubleshooting data as described in the Data Collection Guideline.
  2. Consult the next level of maintenance support.

    Further actions are outside the scope of this instruction.

  3. The job is completed.
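The escalation rule in the steps above can be checked against a list of alert timestamps. This is a minimal sketch that reads "more than once a week" as two or more alerts within any rolling seven-day window, which is an assumption since the instruction does not define the window; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable


def should_escalate(alert_times: Iterable[datetime],
                    window: timedelta = timedelta(days=7)) -> bool:
    """Return True if two alerts fall within the same rolling window.

    After sorting, the closest pair of alerts is always adjacent,
    so checking consecutive gaps is enough.
    """
    times = sorted(alert_times)
    for earlier, later in zip(times, times[1:]):
        if later - earlier <= window:
            return True
    return False
```

For example, alerts on 1 and 4 January give True, whereas alerts on 1 and 10 January give False.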