Operating Instructions 46/1543-CNH 160 6539/10 Uen B

Storage Engine, Unrepaired Data Inconsistency between Replicas, DS
Ericsson Centralized User Database

1 Introduction

This instruction concerns alarm handling for the Storage Engine, Unrepaired Data Inconsistency Between Replicas, DS alarm.

1.1 Alarm Description

This alarm is raised when the Data Repair procedure has been invoked for the current Data Store Unit Group (DSG) master replica, but some of the inconsistencies between the current and the former master replicas have not been successfully repaired.

The alarm is issued in the following situation:

  • A Data Repair task has been completed, and there are unrepaired entries recorded in the output log.

The possible alarm causes and the corresponding fault reasons, fault locations, and impacts are described in Table 1.

Table 1   Alarm Causes

Alarm Cause: Some of the detected data inconsistencies between the current and former DSG master replicas could not be repaired.

Description: Data Repair was executed on the identified LDAP entries that might not have been correctly replicated between the former and current master replicas, and it failed to repair some of these entries. These LDAP entries are recorded in the unrepaired log.

Fault Reason: The repair of an entry can fail for several reasons, which are stated in the unrepaired log. Data Repair was not able to repair at least one LDAP entry stored in the DSG for one of the following reasons:

  • The timestamp of the entry on the current master is later than the incident timestamp.

  • The entry has no timestamp.

  • The entry could not be fetched from the former master due to an error.

  • The entry does not exist on the current master, and inserts are not performed.

  • Concurrent write access was detected between repair and provisioning or traffic through the CDC mechanism.

  • The repair process was interrupted.

  • An LDAP error occurred during repair (either while fetching the entry from the current master or while updating it).

Fault Location: Current and former DSG master replicas.

Impact: Some provisioning and traffic data updates may be missing in CUDB.

Incident timestamp refers to the time when the network incident, for example a network split or a DSG mastership change, happened in the CUDB system. For more information, refer to CUDB Data Storage Handling.

CDC means Collision Detection Counter, refer to CUDB LDAP Interwork Description for more information.
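To illustrate how the fault reasons in Table 1 relate to the incident timestamp and the CDC value, the following minimal Python sketch reproduces the eligibility check that those fault reasons imply. It is an illustration only, not the actual Data Repair implementation; the Entry type, the field names, and the can_repair function are hypothetical.

    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass
    class Entry:                     # hypothetical LDAP entry representation
        dn: str
        timestamp: Optional[int]     # modification time, seconds since the Unix epoch

    def can_repair(current: Optional[Entry],
                   former: Optional[Entry],
                   incident_ts: int,
                   cdc_before: int,          # CDC value read before the repair attempt
                   cdc_after: int,           # CDC value read after the repair attempt
                   perform_inserts: bool = False) -> Tuple[bool, str]:
        """Decide whether an entry qualifies for repair; mirrors the fault
        reasons of Table 1 in a simplified form."""
        if former is None:
            return False, "entry could not be fetched from the former master"
        if current is None:
            if not perform_inserts:
                return False, "entry missing on current master, inserts not performed"
            return True, "repairable (insert from former master)"
        if current.timestamp is None:
            return False, "entry has no timestamp"
        if current.timestamp > incident_ts:
            return False, "entry on current master is newer than the incident timestamp"
        if cdc_after != cdc_before:
            # The CDC changed while repair was running: provisioning or traffic
            # wrote the entry concurrently, so the repair result cannot be trusted.
            return False, "concurrent write access detected (CDC mechanism)"
        return True, "repairable"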

The following are the consequences for the node if the alarm is not acted upon:

  • Traffic updates or provisioning data from the former master (non-replicated transactions due to mastership change or a network incident) may be lost in CUDB, which may have a service impact for certain subscribers.

The alarm attributes are listed and explained in Table 2.

Table 2   Alarm Attributes

Auto Cease: No

Module: STORAGE-ENGINE

Error Code: 26

Time: Date when the alarm was raised.

Resource ID: .1.3.6.1.4.1.193.169.1.2.25.<DG>.<TIMESTAMP>

Alarm Model Description: Unrepaired Data Inconsistency Between Replicas, Storage Engine

Alarm Active Description: Storage Engine (DS-Group #DG): Unrepaired data inconsistency between replicas, major (task <TASKID>, blade <BLADE>)

ITU Alarm Event Type: processingErrorAlarm (4)

ITU Alarm Probable Cause: databaseInconsistency (160)

ITU Alarm Perceived Severity: Major (4)

Originating Source IP: Node IP where the alarm was raised.

In Table 2, the indicated variables are as follows:

  • <DG> is the identifier of the DSG that the DS cluster belongs to.

  • <TIMESTAMP> is an integer representing the seconds since the Unix epoch when the Data Repair task was started.

  • <BLADE> is the identifier of the CUDB blade or Virtual Machine (VM) where the replica is located.

  • <TASKID> is the identifier of the repair task.
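The Resource ID can be decomposed programmatically to recover <DG> and <TIMESTAMP>, for example when correlating alarms with repair tasks. The following Python sketch shows one way to do this; the OID prefix is taken from Table 2, while the function name and the example values are hypothetical.

    from datetime import datetime, timezone

    # OID prefix from Table 2; the two trailing components carry <DG> and <TIMESTAMP>.
    RESOURCE_ID_PREFIX = ".1.3.6.1.4.1.193.169.1.2.25."

    def parse_resource_id(resource_id):
        """Return the DSG identifier and the Data Repair start time (UTC)."""
        if not resource_id.startswith(RESOURCE_ID_PREFIX):
            raise ValueError("unexpected Resource ID: " + resource_id)
        dg, ts = resource_id[len(RESOURCE_ID_PREFIX):].split(".")
        return int(dg), datetime.fromtimestamp(int(ts), tz=timezone.utc)

    # Hypothetical example:
    # parse_resource_id(".1.3.6.1.4.1.193.169.1.2.25.3.1700000000")
    # returns (3, 2023-11-14 22:13:20 UTC)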

For further information about attribute descriptions, refer to CUDB Node Fault Management Configuration Guide.

The alarm must be cleared manually.

For the interpretation of the unrepaired logs, refer to CUDB Automatic Handling of Network Isolation Output Description.

1.2 Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1 Documents

The following documents are referred to in this instruction:

  • CUDB Data Storage Handling

  • CUDB LDAP Interwork Description

  • CUDB Node Fault Management Configuration Guide

  • CUDB Automatic Handling of Network Isolation Output Description

1.2.2 Tools

Not applicable.

1.2.3 Conditions

Not applicable.

2 Procedure

Do the following:

  • Locate and identify the unrepaired log based on the <BLADE> and <TASKID> parameters in the alarm as follows:

    • Log in to the CUDB node that originated the alarm.

    • Search for the file(s) /local2/cudb/ahsi/replica_repair/datarepair_<TASKID>_unrepaired_*.ldif.gz on the blade or VM <BLADE>.

    If further analysis is needed, copy these files from the node to an external machine. A sketch for locating the files and listing the affected entries is given after this procedure.

    Note: The files must be transferred and stored appropriately, as they may contain confidential subscriber data.
  • Unrepaired data inconsistencies may cause issues for the application Front Ends (FEs) that own the affected data. Consult the appropriate application FE troubleshooting documentation for handling the effects of "Network Isolation in CUDB".
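The following Python sketch shows one way to locate the unrepaired log files of a given task and print the Distinguished Name (DN) of each affected entry. It assumes the standard LDIF layout, in which each record starts with a dn: line; for the exact file content, refer to CUDB Automatic Handling of Network Isolation Output Description. The function name is hypothetical, and the sketch must be run on the blade or VM named in the alarm.

    import glob
    import gzip

    def list_unrepaired_dns(task_id):
        """Print the DN of every entry recorded in the unrepaired logs of
        the given Data Repair task (<TASKID> from the alarm text)."""
        pattern = ("/local2/cudb/ahsi/replica_repair/"
                   "datarepair_%s_unrepaired_*.ldif.gz" % task_id)
        for path in sorted(glob.glob(pattern)):
            print("== %s ==" % path)
            with gzip.open(path, "rt", encoding="utf-8") as ldif:
                for line in ldif:
                    # LDIF records start with a "dn:" line.
                    if line.startswith("dn:"):
                        print(line.rstrip())

    # Example: list_unrepaired_dns("42")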