1 Introduction
This document describes the Storage Engine, Unable to Synchronize Cluster in DS, Warning alarm and the troubleshooting steps to take when it is raised.
1.1 Alarm Description
The alarm is raised when the Automatic Handling of Network Isolation process starts attempting to repair a Data Store (DS) cluster inconsistency between the former and current master replica servers, or when the Self-Ordered Backup and Restore process starts restoring replication on a DS slave that cannot synchronize with its master replica.
The alarm is issued in the following situations:
- Automatic Handling of Network Isolation process started.
- Self-Ordered Backup and Restore process started.
If the CUDB system enters a state in which no master replica can be reached from the current node for this DSG, then this alarm is cleared automatically, and the Storage Engine, No Available Master Replica for DS alarm (Reference [1]) is raised.
The possible alarm causes and the corresponding fault reasons, fault locations, and impacts are described in Table 1.
| Alarm Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| Automatic Handling of Network Isolation process started. | Automatic Handling of Network Isolation process is starting the Selective Replica Check and the Data Repair tasks. | There is non-replicated data on the former master replica (missing on the new master replica), and the Automatic Handling of Network Isolation process is starting rescuing tasks. | Both replica servers. | Not applicable. This alarm is part of the Automatic Handling of Network Isolation process. |
| Self-Ordered Backup and Restore process started. | The Self-Ordered Backup and Restore process is starting the replication restoration. | The DS slave database cluster cannot synchronize with its corresponding master replica, and the Self-Ordered Backup and Restore process is starting for the affected slave replica. | Slave replica servers. | Not applicable. This alarm is part of the Self-Ordered Backup and Restore process. |
Note: An alarm can appear as a result of a maintenance activity.
The following are the consequences for the node if the alarm is not resolved:
- Not applicable. This alarm is part of the Automatic Handling of Network Isolation process.
- Not applicable. This alarm is part of the Self-Ordered Backup and Restore process.
The alarm attributes are listed and explained in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Auto Cease | Yes |
| Module | STORAGE-ENGINE |
| Error Code | 28 |
| Timestamp First | Date and time when the alarm was raised for the first time. |
| Repeated Counter | Number which indicates how many times the alarm was raised. |
| Timestamp Last | Date and time of the most recent alarm raised. |
| Resource ID | .1.3.6.1.4.1.193.169.1.2.1.<DG> |
| Alarm Model Description | Unable to synchronize cluster, Storage Engine. |
| Alarm Active Description | Storage Engine (DS-group #<DG>): Synchronization to current master impossible. <add_info> (task <taskid>, time <Timestamp> - <DateTime>). |
| ITU Alarm Event Type | qualityOfServiceAlarm (3) |
| ITU Alarm Probable Cause | equipmentMalfunction (514) |
| ITU Alarm Perceived Severity | Warning (6) |
| Originating Source IP | Node ID where the alarm was raised. |
| Sequence Number | Number which indicates the order in which alarms were raised. |
In Table 2, the indicated variables are as follows:
- <add_info> is a mandatory additional description field shown when the Automatic Handling of Network Isolation or the Self-Ordered Backup and Restore process starts. Its value is either "Automatic Handling of Network Isolation process started" or "Self-Ordered Backup and Restore process started".
- <DG> is the Data Store Unit Group (DSG) this cluster belongs to.
- <taskid> is a Selective Replica Check or Data Repair task identifier based on the Automatic Handling of Network Isolation activity start time.
- <Timestamp> is the Unix time representing the time of the incident, that is, the timestamp used to determine which events from the operational logs of the former master must be analyzed.
- <DateTime> is a human-readable representation of <Timestamp>, shown as local time of the node raising the alarm.
Note: <taskid>, <Timestamp>, and <DateTime> are not shown in the case of Self-Ordered Backup and Restore.
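For operators collecting these alarms programmatically, the Alarm Active Description format above can be parsed to recover <DG>, <add_info>, <taskid>, and <Timestamp>, and the Resource ID OID can be rebuilt from <DG>. The following is a minimal sketch, not part of any CUDB tooling; the alarm text, DSG number, task identifier, and timestamp values are invented for illustration:

```python
import re
from datetime import datetime, timezone

# Hypothetical alarm text following the "Alarm Active Description"
# format from Table 2; all concrete values are made up.
alarm_text = (
    "Storage Engine (DS-group #3): Synchronization to current master impossible. "
    "Automatic Handling of Network Isolation process started "
    "(task 20240101T120000, time 1704110400 - 2024-01-01 12:00:00)."
)

# One capture group per variable listed above.
pattern = re.compile(
    r"Storage Engine \(DS-group #(?P<dg>\d+)\): "
    r"Synchronization to current master impossible\. "
    r"(?P<add_info>.+?) "
    r"\(task (?P<taskid>\S+), time (?P<ts>\d+) - (?P<dt>[^)]+)\)\."
)

m = pattern.match(alarm_text)
assert m is not None

# Resource ID OID as listed in Table 2, with <DG> substituted.
resource_id = f".1.3.6.1.4.1.193.169.1.2.1.{m.group('dg')}"

# <Timestamp> is Unix time; <DateTime> is its human-readable form
# (local time on the node raising the alarm; UTC is used here for
# reproducibility of the example).
incident = datetime.fromtimestamp(int(m.group("ts")), tz=timezone.utc)

print(resource_id)               # .1.3.6.1.4.1.193.169.1.2.1.3
print(m.group("add_info"))       # Automatic Handling of Network Isolation process started
print(incident.isoformat())      # 2024-01-01T12:00:00+00:00
```

Note that for a Self-Ordered Backup and Restore alarm, the `(task ..., time ...)` suffix is absent, so a real parser would need to treat that portion of the pattern as optional.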
For more information about attribute descriptions, refer to CUDB Node Fault Management Configuration Guide, Reference [2].
1.2 Prerequisites
This section provides information on the documents, tools, and conditions that apply to the procedure.
1.2.1 Documents
Before starting this procedure, ensure that you have read the following documents:
- CUDB Node Fault Management Configuration Guide, Reference [2], regarding alarm configuration.
- System Safety Information, Reference [4].
- Personal Health and Safety Information, Reference [5].
1.2.2 Tools
Not applicable.
1.2.3 Conditions
Not applicable.
2 Procedure
Not applicable. Further actions are part of the Automatic Handling of Network Isolation or Self-Ordered Backup and Restore process.
Glossary
For the terms, definitions, acronyms, and abbreviations used in this document, refer to CUDB Glossary of Terms and Acronyms, Reference [3].
Reference List
| CUDB Documents |
|---|
| [1] Storage Engine, No Available Master Replica for DS. |
| [2] CUDB Node Fault Management Configuration Guide. |
| [3] CUDB Glossary of Terms and Acronyms. |
| Other Ericsson Documents |
|---|
| [4] System Safety Information. |
| [5] Personal Health and Safety Information. |