1 Introduction
This instruction concerns alarm handling for the Storage Engine, DS Cluster Down alarm.
1.1 Alarm Description
This alarm is raised when some problem in the cluster prevents it from providing service.
The possible alarm causes and the corresponding fault reasons, fault locations and impacts are described in Table 1.
| Alarm Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| The local cluster is under maintenance operation. | The local cluster is under maintenance operation. | Due to an explicit order, the cluster is under maintenance (data restoring, initializing, stopped, or restarting) and thus cannot provide service. | Cluster Supervisors on the System Controllers (SCs). | The cluster cannot provide service until the operation completes. |
| All management components of the local cluster are unreachable. | All management components of the local cluster are unreachable. | All management components of the local cluster either cannot start, or have started but cannot be accessed. | Management components on the SCs. | The cluster cannot provide service. |
| All data nodes are unreachable. | All data nodes are unreachable. | The data nodes either cannot start, or have started but do not provide service. The fault can have several causes, for example file system consistency errors due to a non-graceful shutdown, an uncontrolled crash, or infrastructure errors. | Data nodes on the payload blades or Virtual Machines (VMs) of the cluster. | The cluster cannot provide service, and data redundancy is decreased. |
Unfortunately the alarm does not state which cause triggered it.
Note: An alarm can appear as a result of a maintenance activity.
The alarm attributes are listed and explained in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Module | STORAGE-ENGINE |
| Error Code | 6 |
| Timestamp First | Date and time when the alarm was raised for the first time. |
| Repeated Counter | Number which indicates how many times the alarm was raised. |
| Timestamp Last | Date and time of the most recent alarm raised. |
| Resource ID | .1.3.6.1.4.1.193.169.1.2.6.1 |
| Timestamp | Date when the alarm was raised. |
| Model Description | Cluster down, Storage Engine. |
| Active Description | Storage Engine (DS-group #<DG>): Storage Engine is down. |
| Event Type | 4 |
| Probable Cause | 546 |
| Perceived Severity | (3) - Critical |
| Originating source IP | Node IP where the alarm was raised. |
| Sequence Number | Number which indicates the order in which the alarms are raised. |
In Table 2, the indicated variables are as follows:
- <DG>: The number of the DS group to which the alarm applies.
For further information about attribute descriptions, refer to CUDB Node Fault Management Configuration Guide.
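As an illustration of how the attributes in Table 2 could be used when filtering alarm records, the following Python sketch checks whether a record is this alarm and extracts the DS-group number from the Active Description. The dictionary layout, the key names (`module`, `error_code`, `active_description`), and the function name are assumptions made for this example and do not represent a CUDB interface. The sketch also assumes that <DG> is a decimal number.

```python
import re
from typing import Optional

# Values taken from Table 2.
MODULE = "STORAGE-ENGINE"
ERROR_CODE = 6

# Pattern for the Active Description text; assumes <DG> is a decimal number.
_ACTIVE_DESCRIPTION = re.compile(
    r"^Storage Engine \(DS-group #(\d+)\): Storage Engine is down\.$"
)


def ds_group_of_cluster_down(alarm: dict) -> Optional[int]:
    """Return the DS-group number if `alarm` is a DS Cluster Down alarm.

    `alarm` is a hypothetical dictionary; its key names are illustrative only.
    Returns None when the record does not match this alarm.
    """
    if alarm.get("module") != MODULE or alarm.get("error_code") != ERROR_CODE:
        return None
    match = _ACTIVE_DESCRIPTION.match(alarm.get("active_description", ""))
    return int(match.group(1)) if match else None


# Example with a made-up record for DS group 3.
example = {
    "module": "STORAGE-ENGINE",
    "error_code": 6,
    "active_description": "Storage Engine (DS-group #3): Storage Engine is down.",
}
print(ds_group_of_cluster_down(example))
```

Matching on Module and Error Code before parsing the description keeps unrelated alarms with similar text from being misread.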
1.2 Prerequisites
This section provides information on the documents, tools, and conditions that apply to the procedure.
1.2.1 Documents
This instruction references the following documents:
- CUDB Node Fault Management Configuration Guide
- CUDB Backup and Restore Procedures
1.2.2 Tools
Not applicable.
1.2.3 Conditions
Not applicable.
2 Procedure
This section describes the procedure to follow when this alarm is received.
2.1 Actions for the Local Cluster Is Under Maintenance Operation
If the maintenance state is not intentional, contact the next level of maintenance support.
2.2 Actions for All Management Components of the Local Cluster Are Unreachable
Contact the next level of maintenance support.
2.3 Actions for All Data Nodes Are Unreachable
Perform the following steps:
Steps
- Restore a previously created backup. For further information about the data backup and restore procedure, refer to CUDB Backup and Restore Procedures.
- If the alarm is not cleared automatically after the restore is completed, contact the next level of maintenance support. Further actions are outside the scope of this Operating Instruction.
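The three procedures in this section can be summarized as a small decision function. The Python sketch below is only an illustration of the flow. Because the alarm does not state its cause, the cause is assumed to have been determined by the operator and is passed in as input. The names `Cause`, `next_actions`, and `maintenance_intended` are hypothetical and are not part of any CUDB tool.

```python
from enum import Enum
from typing import List


class Cause(Enum):
    """Alarm causes from Table 1 (determined by the operator, not by the alarm)."""
    MAINTENANCE = "The local cluster is under maintenance operation."
    MANAGEMENT_UNREACHABLE = "All management components of the local cluster are unreachable."
    DATA_NODES_UNREACHABLE = "All data nodes are unreachable."


ESCALATE = "Contact the next level of maintenance support."


def next_actions(cause: Cause, maintenance_intended: bool = False) -> List[str]:
    """Return the actions this instruction lists for the given cause."""
    if cause is Cause.MAINTENANCE:
        # No action is specified for intended maintenance.
        return [] if maintenance_intended else [ESCALATE]
    if cause is Cause.MANAGEMENT_UNREACHABLE:
        return [ESCALATE]
    # Remaining case: all data nodes are unreachable.
    return [
        "Restore a previously created backup.",
        "If the alarm is not cleared after the restore: " + ESCALATE,
    ]


for step in next_actions(Cause.DATA_NODES_UNREACHABLE):
    print(step)
```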
