1 Introduction
This document describes the Storage Engine, Replication Stopped Working in DS alarm and the troubleshooting steps to take when it is raised.
1.1 Alarm Description
This alarm is raised when replication stops working in a Data Store (DS) Storage Engine. The alarm is raised as a result of the periodic execution of the cudbCheckReplication command. For further information, refer to CUDB Node Commands and Parameters, Reference [1].
The alarm is issued in the following situations:
- The reallocation process is ongoing.
- The replication delay exceeds the time limit.
- Mastership change during cudbCheckReplication execution.
- Replication malfunction.
The possible alarm causes and the corresponding fault reasons, fault locations, and impacts are described in Table 1.
| Alarm Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| The reallocation process is ongoing. | Reallocation is in progress, and the replication lag exceeds the time limit set for cudbCheckReplication. | Due to the reallocation process, the data replication time exceeds the number of seconds set for cudbCheckReplication. | Temporary replication delay. No fault. | No impact. |
| The replication delay exceeds the time limit. | No reallocation was executed, but the replication delay exceeds the time limit set for cudbCheckReplication. | High write rate or load on the DSG. | Temporary replication delay. No fault. | No impact. |
| | | Slow network link between master and slave. | | |
| Mastership change during cudbCheckReplication execution. | A mastership change occurred while cudbCheckReplication was running, preventing the script from working properly. | A mastership change occurred while cudbCheckReplication was running, preventing the script from working properly. | No fault. | No impact. |
| Replication malfunction. | The active replication channel between the local slave replica and the master one is not working properly. | The slave replica has problems connecting to the master DSG. | Affected DSG cluster. | If the slave replica becomes the master replica, there might be a service impact for the subscribers affected by the data inconsistency. |
| | | Replication down inconsistencies on both replication channels. | | |
| | | Network issues, unstable link between master and slave. | | |
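As an operator-side aid, the conditions in Table 1 can be sketched as a small lookup. This is a minimal sketch and not a description of how cudbCheckReplication works internally: the function name, its parameters, and the order in which the conditions are checked are assumptions made for illustration. Only the returned cause names come from Table 1.

```python
from typing import Optional


def likely_cause(
    lag_seconds: float,
    limit_seconds: float,
    reallocation_ongoing: bool,
    mastership_changed: bool,
    channel_working: bool,
) -> Optional[str]:
    """Map operator observations to an alarm cause name from Table 1.

    The precedence of the checks is an assumption, not taken from the source.
    """
    if not channel_working:
        # The active replication channel is not working properly.
        return "Replication malfunction."
    if mastership_changed:
        # A mastership change occurred while the check was running.
        return "Mastership change during cudbCheckReplication execution."
    if lag_seconds > limit_seconds:
        # The lag exceeds the limit; reallocation decides which cause applies.
        if reallocation_ongoing:
            return "The reallocation process is ongoing."
        return "The replication delay exceeds the time limit."
    # None of the conditions listed in Table 1 is met.
    return None
```

For example, a lag of 120 seconds against a limit of 60 seconds with no reallocation in progress maps to the cause "The replication delay exceeds the time limit."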
The alarm attributes are listed and explained in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Auto Cease | NO |
| Application Id | STORAGE-ENGINE |
| Error Code | 18 |
| Timestamp First | Date and time when the alarm was raised for the first time. |
| Repeated Counter | Number which indicates how many times the alarm was raised. |
| Timestamp Last | Date and time of the most recent alarm raise. |
| Model Description | Replication stopped working, Storage Engine. |
| Active Resource Id | 1.3.6.1.4.1.193.169.1.2.18.<DG> |
| Active Description | Storage Engine (DS-group #<DG>): Replication stopped working. |
| Alarm Event Type | communicationsAlarm (2) |
| Probable Cause | communicationsSubsystemFailure (505) |
| Severity | major (4) |
| Originating source IP | Node IP where the alarm was raised. |
| Sequence Number | Number which indicates the order in which the alarms are raised. |
In Table 2, the indicated variables are as follows:
- <DG>: The number of the affected DS group (DSG).
For further information about attribute descriptions, refer to CUDB Node Fault Management Configuration Guide, Reference [2].
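When alarms are processed by a script, the DSG number can be recovered from the Active Resource Id. The following is a minimal sketch, assuming that <DG> is a plain decimal number appended to the prefix shown in Table 2. The function names are hypothetical and not part of any CUDB tool.

```python
import re

# Prefix taken from the "Active Resource Id" row of Table 2.
ALARM_OID_PREFIX = "1.3.6.1.4.1.193.169.1.2.18."


def dsg_from_resource_id(resource_id: str) -> int:
    """Return the <DG> suffix of this alarm's Active Resource Id.

    Assumption: <DG> is a decimal integer with no further sub-identifiers.
    """
    if not resource_id.startswith(ALARM_OID_PREFIX):
        raise ValueError(f"not an Active Resource Id of this alarm: {resource_id!r}")
    suffix = resource_id[len(ALARM_OID_PREFIX):]
    if not re.fullmatch(r"[0-9]+", suffix):
        raise ValueError(f"unexpected <DG> value: {suffix!r}")
    return int(suffix)


def active_description(dsg: int) -> str:
    """Build the Active Description text listed in Table 2 for a DSG number."""
    return f"Storage Engine (DS-group #{dsg}): Replication stopped working."
```

For example, `dsg_from_resource_id("1.3.6.1.4.1.193.169.1.2.18.3")` returns 3, which identifies the DSG to check in the procedures of Section 2.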
1.2 Prerequisites
This section provides information on the documents, tools, and conditions that apply to the procedure.
1.2.1 Documents
Before starting this procedure, ensure that you have read the following documents:
- CUDB Node Fault Management Configuration Guide, Reference [2], regarding alarm configuration.
- The section on the cudbCheckReplication command in CUDB Node Commands and Parameters, Reference [1].
- CUDB Subscription Reallocation, Reference [3], regarding the reallocation feature.
- Storage Engine, Replication Channels Down in DS, Reference [4], regarding related alarm information.
- Storage Engine, Unable to Synchronize Cluster in DS, Major, Reference [5], regarding related alarm information.
- System Safety Information, Reference [7].
- Personal Health and Safety Information, Reference [8].
1.2.2 Tools
Not applicable.
1.2.3 Conditions
Not applicable.
2 Procedure
This section describes the procedure to follow when this alarm is received.
2.1 Actions for the Reallocation Process is Ongoing
Do the following:
- Run the cudbCheckReplication command. For details, refer to CUDB Node Commands and Parameters, Reference [1].
- If it reports that the replication is working properly in DSG # <DG> on the CUDB node where the alarm was raised, then clear the alarm manually as described in CUDB Node Fault Management Configuration Guide, Reference [2].
2.2 Actions for the Replication Delay Exceeds the Time Limit
Do the following:
- Check network connections.
- Run the cudbCheckReplication command. For details, refer to CUDB Node Commands and Parameters, Reference [1].
- If it reports that the replication is working properly in DSG # <DG> on the CUDB node where the alarm was raised, then clear the alarm manually as described in CUDB Node Fault Management Configuration Guide, Reference [2].
2.3 Actions for Mastership Change During cudbCheckReplication Execution
Do the following:
- Run the cudbCheckReplication command. For details, refer to CUDB Node Commands and Parameters, Reference [1].
- If it reports that the replication is working properly in DSG # <DG> on the CUDB node where the alarm was raised, then clear the alarm manually as described in CUDB Node Fault Management Configuration Guide, Reference [2].
2.4 Actions for Replication Malfunction
Do the following:
- Check network connections.
- Check if the following alarms are raised:
  - Storage Engine, Replication Channels Down in DS, Reference [4].
  - Storage Engine, Unable to Synchronize Cluster in DS, Major, Reference [5].
  If yes, follow the procedures in the corresponding documents above.
- Run the cudbCheckReplication command. For details, refer to CUDB Node Commands and Parameters, Reference [1].
- If it reports that the replication is working properly in DSG # <DG> on the CUDB node where the alarm was raised, then clear the alarm manually as described in CUDB Node Fault Management Configuration Guide, Reference [2].
- If the problem still exists, consult the next level of maintenance support. Further actions are outside the scope of this operating instruction.
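The four procedures in this section share a common shape, which can be sketched as a checklist builder. This is a minimal sketch under stated assumptions: the `Cause` enumeration, the `next_steps` function, and its parameters are hypothetical names introduced for illustration. The code does not run cudbCheckReplication. The operator supplies the result of the command as `replication_ok`.

```python
from enum import Enum
from typing import List


class Cause(Enum):
    """Alarm causes, valued by the section that describes their actions."""
    REALLOCATION_ONGOING = "2.1"
    REPLICATION_DELAY = "2.2"
    MASTERSHIP_CHANGE = "2.3"
    REPLICATION_MALFUNCTION = "2.4"


def next_steps(cause: Cause, replication_ok: bool,
               related_alarms_raised: bool = False) -> List[str]:
    """List the actions from Sections 2.1 to 2.4 that apply to one alarm."""
    steps: List[str] = []
    # Sections 2.2 and 2.4 start with a network check.
    if cause in (Cause.REPLICATION_DELAY, Cause.REPLICATION_MALFUNCTION):
        steps.append("Check network connections.")
    # Section 2.4 also looks for the related alarms first.
    if cause is Cause.REPLICATION_MALFUNCTION and related_alarms_raised:
        steps.append("Follow the procedures for the related alarms "
                     "(References [4] and [5]).")
    # Every procedure runs the replication check.
    steps.append("Run cudbCheckReplication.")
    if replication_ok:
        steps.append("Clear the alarm manually (Reference [2]).")
    elif cause is Cause.REPLICATION_MALFUNCTION:
        # Only Section 2.4 defines an escalation step; for the other
        # causes the source does not say what to do if the check fails.
        steps.append("Consult the next level of maintenance support.")
    return steps
```

Note that the alarm has Auto Cease set to NO, so the manual clearing step is always required once replication is reported as working.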
Glossary
For the terms, definitions, acronyms and abbreviations used in this document, refer to CUDB Glossary of Terms and Acronyms, Reference [6].
Reference List
| Other Ericsson Documents |
|---|
| [7] System Safety Information. |
| [8] Personal Health and Safety Information. |
