Storage Engine, Replication Stopped Working in DS
Ericsson Centralized User Database

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure
2.1   Actions for the Reallocation Process is Ongoing
2.2   Actions for the Replication Delay Exceeds the Time Limit
2.3   Actions for Mastership Change During cudbCheckReplication Execution
2.4   Actions for Replication Malfunction

Glossary

Reference List

1   Introduction

This document describes the Storage Engine, Replication Stopped Working in DS alarm and the troubleshooting steps to take when it is raised.

1.1   Alarm Description

This alarm is raised when replication stops working in a Data Store (DS) Storage Engine. It is raised as a result of the periodic execution of the cudbCheckReplication command. For further information, refer to CUDB Node Commands and Parameters, Reference [1].

The alarm is issued in the following situation:

The possible alarm causes and the corresponding fault reasons, fault locations, and impacts are described in Table 1.

Table 1    Alarm Causes

| Alarm Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| The reallocation process is ongoing. | Reallocation is in progress, and the replication lag exceeds the time limit set for cudbCheckReplication. | Due to the reallocation process, the data replication time exceeds the number of seconds set for cudbCheckReplication. | Temporary replication delay. No fault. | No impact. |
| The replication delay exceeds the time limit. | No reallocation was executed, but the replication delay exceeds the time limit set for cudbCheckReplication. | High write rate/load on DSG. Slow network link between master and slave. | Temporary replication delay. No fault. | No impact. |
| Mastership change during cudbCheckReplication execution. | A mastership change occurred while cudbCheckReplication was running, preventing the script from working properly. | A mastership change occurred while cudbCheckReplication was running, preventing the script from working properly. | No fault. | No impact. |
| Replication malfunction. | The active replication channel between the local slave replica and the master replica is not working properly. | The slave replica has problems connecting to the master DSG. Replication down inconsistencies on both replication channels. Network issues, unstable link between master and slave. | Affected DSG cluster. | If the slave replica becomes the master replica, there might be a service impact for the subscribers affected by the data inconsistency. |
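The rows of Table 1 can be read as a small decision procedure. The Python sketch below is illustrative only: the function name, its three boolean inputs, and the order in which they are tested are assumptions made for this example, not part of CUDB or of cudbCheckReplication.

```python
from enum import Enum


class AlarmCause(Enum):
    """The four alarm causes listed in Table 1."""
    REALLOCATION_ONGOING = "The reallocation process is ongoing."
    DELAY_EXCEEDS_LIMIT = "The replication delay exceeds the time limit."
    MASTERSHIP_CHANGE = "Mastership change during cudbCheckReplication execution."
    REPLICATION_MALFUNCTION = "Replication malfunction."


def classify_cause(channel_working: bool,
                   mastership_changed_during_check: bool,
                   reallocation_ongoing: bool) -> AlarmCause:
    """Map operator observations to a row of Table 1 (hypothetical helper).

    The test order is an assumption: a mastership change is checked first
    because it prevents the check script from working properly.
    """
    if mastership_changed_during_check:
        return AlarmCause.MASTERSHIP_CHANGE
    if not channel_working:
        return AlarmCause.REPLICATION_MALFUNCTION
    if reallocation_ongoing:
        return AlarmCause.REALLOCATION_ONGOING
    return AlarmCause.DELAY_EXCEEDS_LIMIT


def possible_service_impact(cause: AlarmCause) -> bool:
    """Per Table 1, only a replication malfunction may affect subscribers."""
    return cause is AlarmCause.REPLICATION_MALFUNCTION
```

For example, a working channel with no mastership change and no reallocation maps to the "replication delay exceeds the time limit" row, which Table 1 lists as having no impact.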

The alarm attributes are listed and explained in Table 2.

Table 2    Alarm Attributes

| Attribute Name | Attribute Value |
|---|---|
| Auto Cease | NO |
| Application Id | STORAGE-ENGINE |
| Error Code | 18 |
| Timestamp First | Date and time when the alarm was raised for the first time. |
| Repeated Counter | Number which indicates how many times the alarm was raised. |
| Timestamp Last | Date and time of the most recent alarm raise. |
| Model Description | Replication stopped working, Storage Engine. |
| Active Resource Id | 1.3.6.1.4.1.193.169.1.2.18.<DG> |
| Active Description | Storage Engine (DS-group #<DG>): Replication stopped working. |
| Alarm Event Type | communicationsAlarm (2) |
| Probable Cause | communicationsSubsystemFailure (505) |
| Severity | major (4) |
| Originating source IP | Node IP where the alarm was raised. |
| Sequence Number | Number which indicates the order in which the alarms are raised. |

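In Table 2, the indicated variables are as follows:

<DG>: Number of the DS group (DSG) for which the alarm is raised, as shown in the Active Description attribute.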

For further information about attribute descriptions, refer to CUDB Node Fault Management Configuration Guide, Reference [2].
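When alarms are post-processed outside the node, the affected DS group can be recovered from either the Active Resource Id or the Active Description shown in Table 2. The following Python sketch assumes that <DG> is a plain decimal number; the function names and the sample values are hypothetical.

```python
import re

# Prefix of the Active Resource Id from Table 2; <DG> follows the last dot.
ALARM_OID_PREFIX = "1.3.6.1.4.1.193.169.1.2.18."

# Pattern of the Active Description from Table 2.
DESCRIPTION_RE = re.compile(
    r"^Storage Engine \(DS-group #([0-9]+)\): Replication stopped working\.$"
)


def dsg_from_resource_id(resource_id: str) -> int:
    """Return <DG> from an Active Resource Id (assumes a decimal <DG>)."""
    if not resource_id.startswith(ALARM_OID_PREFIX):
        raise ValueError(f"not a replication-stopped alarm OID: {resource_id!r}")
    suffix = resource_id[len(ALARM_OID_PREFIX):]
    if not re.fullmatch(r"[0-9]+", suffix):
        raise ValueError(f"unexpected <DG> component: {suffix!r}")
    return int(suffix)


def dsg_from_description(description: str) -> int:
    """Return <DG> from an Active Description (assumes a decimal <DG>)."""
    match = DESCRIPTION_RE.match(description.strip())
    if match is None:
        raise ValueError(f"unexpected Active Description: {description!r}")
    return int(match.group(1))
```

Both helpers raise ValueError on input that does not match the Table 2 format, so an alarm of a different type is not silently mapped to a DS group.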

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Before starting this procedure, ensure that you have read the following documents:

1.2.2   Tools

Not applicable.

1.2.3   Conditions

Not applicable.

2   Procedure

This section describes the procedure to follow when this alarm is received.

2.1   Actions for the Reallocation Process is Ongoing

Do the following:

  1. Run the cudbCheckReplication command. For details, refer to CUDB Node Commands and Parameters, Reference [1].
  2. If the command reports that replication is working properly in DSG #<DG> on the CUDB node where the alarm was raised, clear the alarm manually as described in CUDB Node Fault Management Configuration Guide, Reference [2].
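The two steps above can be sketched as a small wrapper. This is a hedged illustration, not a supported tool: invoking cudbCheckReplication without options is an assumption, and because the command output format is not described in this document, the caller supplies the is_working predicate. Refer to Reference [1] for the real syntax and output.

```python
import subprocess
from typing import Callable


def run_check_replication() -> str:
    """Run cudbCheckReplication and return its combined output.

    Assumption: the command can be invoked without options on the node.
    """
    result = subprocess.run(
        ["cudbCheckReplication"], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr


def may_clear_alarm(dsg: int,
                    is_working: Callable[[str, int], bool],
                    run: Callable[[], str] = run_check_replication) -> bool:
    """Return True if the check reports working replication for the DSG.

    is_working(output, dsg) is a caller-supplied (hypothetical) predicate
    that interprets the command output for the given DS group.
    """
    output = run()
    return bool(is_working(output, dsg))
```

Because Auto Cease is NO for this alarm, a True result only indicates that the operator can proceed to clear the alarm manually as described in Reference [2]; the sketch does not clear anything itself.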

2.2   Actions for the Replication Delay Exceeds the Time Limit

Do the following:

  1. Check the network connections between the master and slave replicas.
  2. Run the cudbCheckReplication command. For details, refer to CUDB Node Commands and Parameters, Reference [1].
  3. If the command reports that replication is working properly in DSG #<DG> on the CUDB node where the alarm was raised, clear the alarm manually as described in CUDB Node Fault Management Configuration Guide, Reference [2].

2.3   Actions for Mastership Change During cudbCheckReplication Execution

Do the following:

  1. Run the cudbCheckReplication command. For details, refer to CUDB Node Commands and Parameters, Reference [1].
  2. If the command reports that replication is working properly in DSG #<DG> on the CUDB node where the alarm was raised, clear the alarm manually as described in CUDB Node Fault Management Configuration Guide, Reference [2].

2.4   Actions for Replication Malfunction

Do the following:

  1. Check the network connections between the master and slave replicas.
  2. Check if the following alarms are raised:

    - Storage Engine, Replication Channels Down in DS, Reference [4]
    - Storage Engine, Unable to Synchronize Cluster in DS, Major, Reference [5]

    If yes, follow the procedures in the corresponding documents above.

  3. Run the cudbCheckReplication command. For details, refer to CUDB Node Commands and Parameters, Reference [1].
  4. If the command reports that replication is working properly in DSG #<DG> on the CUDB node where the alarm was raised, clear the alarm manually as described in CUDB Node Fault Management Configuration Guide, Reference [2].
  5. If the problem still exists, consult the next level of maintenance support. Further actions are outside the scope of this operating instruction.
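The branching in the steps above can be summarized as an ordered action list. The Python sketch below is only a reading aid under stated assumptions: the function name and its two boolean inputs are hypothetical, and the operator determines their values by following the steps manually.

```python
from typing import List


def malfunction_actions(related_alarms_raised: bool,
                        replication_working: bool) -> List[str]:
    """Return the ordered actions for a replication malfunction.

    related_alarms_raised: outcome of the related-alarm check (step 2).
    replication_working: what cudbCheckReplication reports (steps 3 and 4).
    """
    actions = ["Check network connections."]
    if related_alarms_raised:
        actions.append(
            "Follow the procedure documented for each related alarm that is raised."
        )
    actions.append("Run cudbCheckReplication (Reference [1]).")
    if replication_working:
        actions.append("Clear the alarm manually (Reference [2]).")
    else:
        actions.append("Consult the next level of maintenance support.")
    return actions
```

The last entry of the returned list is always either the manual clear or the escalation, which mirrors the fact that the procedure ends in one of those two outcomes.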

Glossary

For the terms, definitions, acronyms and abbreviations used in this document, refer to CUDB Glossary of Terms and Acronyms, Reference [6].


Reference List

Ericsson Documents
[1] CUDB Node Commands and Parameters.
[2] CUDB Node Fault Management Configuration Guide.
[3] CUDB Subscription Reallocation.
[4] Storage Engine, Replication Channels Down in DS.
[5] Storage Engine, Unable to Synchronize Cluster in DS, Major.
[6] CUDB Glossary of Terms and Acronyms.
Other Ericsson Documents
[7] System Safety Information.
[8] Personal Health and Safety Information.