1 Introduction
This document provides an overview of the Storage Engine, DS Cluster Node Down alarm.
1.1 Alarm Description
This alarm is raised when one of the nodes of the cluster database is down or unreachable.
The alarm is issued in the following situations:
The possible alarm causes and the corresponding fault reasons, fault locations and impacts are described in Table 1.
| Alarm Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| A management node of the database is down or unreachable. | One of the two management node processes cannot start up or is unreachable. | | Blade or Virtual Machine (VM). | No impact, as each cluster database has two management nodes. |
| A data node of the database cluster is down or unreachable. | The data node process cannot start up due to file system consistency errors, or is unreachable. | | Blade or VM. | Lower database cluster performance while the data node is down. |
| A replication node of the database cluster is down or unreachable. | One of the replication node processes cannot start up or is unreachable. | | Blade or VM. | No impact, as each cluster database has two replication servers per replication type (master and slave). |
| An access node of the database cluster is down or unreachable. | One of the access node processes cannot start up or is unreachable. | | Blade or VM. | No impact, as each cluster database has two access servers. |

Note: An alarm can appear as a result of maintenance activity.
The following are the consequences for the node if the alarm is not solved:
The alarm attributes are listed and explained in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Auto Cease | Yes |
| Module | STORAGE-ENGINE |
| Error Code | 2 |
| Timestamp First | Date and time when the alarm was raised for the first time. |
| Repeated Counter | Number which indicates how many times the alarm was raised. |
| Timestamp Last | Date and time of the most recent alarm raised. |
| Resource ID | .1.3.6.1.4.1.193.169.1.2.2.<ND>.<DG>.<IP> |
| Alarm Model Description | Cluster node down, Storage Engine. |
| Alarm Active Description | Storage Engine (DS-group #<DG>): <NT> node #<ND> down @ <IP>, uuid: <uuid> |
| ITU Alarm Event Type | processingErrorAlarm (4) |
| ITU Alarm Probable Cause | softwareProgramError (546) |
| ITU Alarm Perceived Severity | (4) – Major |
| Originating source IP | Node IP where the alarm was raised. |
| Sequence Number | Number which indicates the order in which the alarms are raised. |
In Table 2, the indicated variables are as follows:
For further information about attribute descriptions, refer to CUDB Node Fault Management Configuration Guide.
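The Resource ID and Alarm Active Description templates in Table 2 are regular enough to be parsed mechanically, for example when correlating alarms in a monitoring script. The following Python sketch is illustrative only. The function names and sample values are hypothetical. The reading of <DG> as the DS-group number, <NT> as the node type, and <ND> as the node number is inferred from the wording of the templates. The Resource ID parser additionally assumes that <IP> is appended as four dotted IPv4 components.

```python
import re

# Fixed OID prefix taken from the Resource ID row of Table 2.
RESOURCE_ID_PREFIX = ".1.3.6.1.4.1.193.169.1.2.2."

# Pattern mirroring the Alarm Active Description template in Table 2.
ACTIVE_DESCRIPTION = re.compile(
    r"Storage Engine \(DS-group #(?P<dg>[^)]+)\): "
    r"(?P<nt>.+?) node #(?P<nd>\S+) down @ (?P<ip>[^,\s]+), "
    r"uuid: (?P<uuid>\S+)"
)


def parse_active_description(text):
    """Return the template variables as a dict, or None if the text does not match."""
    match = ACTIVE_DESCRIPTION.search(text)
    return match.groupdict() if match else None


def parse_resource_id(resource_id):
    """Split a Resource ID into ND, DG and IP (assumes a dotted IPv4 suffix)."""
    if not resource_id.startswith(RESOURCE_ID_PREFIX):
        return None
    parts = resource_id[len(RESOURCE_ID_PREFIX):].split(".")
    if len(parts) != 6:  # ND + DG + four IPv4 octets (assumption)
        return None
    return {"nd": parts[0], "dg": parts[1], "ip": ".".join(parts[2:])}


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real system.
    sample = "Storage Engine (DS-group #1): data node #3 down @ 10.0.0.5, uuid: 1234-abcd"
    print(parse_active_description(sample))
    print(parse_resource_id(".1.3.6.1.4.1.193.169.1.2.2.3.1.10.0.0.5"))
```

Both functions return None for input that does not follow the template, so a caller can skip alarms raised by other modules without special handling.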
The possible causes are as follows:
1.2 Prerequisites
This section lists the prerequisites required for the procedure described in Section 2, Procedure.
1.2.1 Documents
This instruction references the following documents:
1.2.2 Tools
Not applicable.
1.2.3 Conditions
Not applicable.
2 Procedure
This section describes the procedure to follow when this alarm is received.
2.1 Actions for BSP Alarm or Alert Related to Hardware Identified by IP Address
If the alarm is not cleared automatically after a short period of time, perform the following steps:
Steps
After This Task
If the faulty node is a data node (NDB), determine whether the failed node in the cluster database belongs to the master replica of its DSG by following the instructions in CUDB System Administrator Guide. If it does, CUDB might not be able to process the nominal amount of traffic for that DSG. If the nominal traffic-processing capacity is likely to be needed before the corrective actions are finished, consider moving the mastership of the affected DSG to a healthy replica by following the master change procedure in CUDB System Administrator Guide.
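The decision described above can be summarized as a small function. This is a sketch only: the function name, its parameters, and the returned strings are hypothetical, and the three inputs are facts the operator determines manually by following CUDB System Administrator Guide, not values obtained from any CUDB interface.

```python
def follow_up_action(is_data_node, in_master_replica, capacity_needed_soon):
    """Suggest a follow-up for a node-down alarm based on three manual checks.

    All arguments are booleans established by the operator.
    """
    if not (is_data_node and in_master_replica):
        # The text raises a capacity concern only for a data node
        # that belongs to the master replica of its DSG.
        return "no mastership action"
    if capacity_needed_soon:
        # Nominal capacity is likely to be needed before the fault is fixed.
        return "consider moving DSG mastership to a healthy replica"
    return "reduced capacity possible; continue corrective actions"


if __name__ == "__main__":
    print(follow_up_action(True, True, True))
    print(follow_up_action(True, False, True))
```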
