1 Introduction
This document provides Fault Management (FM) information for the Ericsson Centralized User Database (CUDB).
1.1 Document Purpose and Scope
The purpose of this document is to provide a list of application alarms, and to describe the alarm management and the application alarm model of the CUDB system. The infrastructure alarms of the system are not in the scope of this document, but are briefly summarized in Section 2.2.
1.2 Revision Information
| Rev. A | This document is based on 3/1553-CSH 109 067/9 with the following changes: |
1.3 Typographic Conventions
Typographic conventions can be found in the following document:
2 Alarms in the CUDB System
An alarm in the CUDB system is a message sent through the CUDB SNMP interface that informs the operator about a problem in the node which requires attention. The CUDB system can raise two types of alarms:
- Application alarms, raised by CUDB application components, including the software components encapsulated by CUDB.
- Infrastructure alarms, raised independently of the CUDB application.
Note: If the CUDB system is deployed on a cloud infrastructure, infrastructure alarms are not considered CUDB alarms.
For more information on the management and alarm model of the application alarms, see Section 2.1. For a brief summary of the management of infrastructure alarms, see Section 2.2.
2.1 Application Alarms
This section describes the management and alarm model of the CUDB application alarms.
CUDB application components (including the encapsulated software components, that is, the operating system and Core Middleware) send their alarms through ESA. The alarms sent to ESA are formatted according to the ERICSSON-SNF-ALARM-MIB and are forwarded to the Network Management System (NMS). For more information, refer to ESA Fault Management, Reference [90].
2.1.1 Alarm Format and Description
An alarm model is a logical description of the CUDB system, structured as a tree. The alarm model in Figure 1 illustrates the hierarchy of the CUDB application components which are able to raise alarms.
The alarm format used by the CUDB application components is defined by the ERICSSON-SNF-ALARM-MIB. For more information, refer to ESA Fault Management, Reference [90]. The standard location for this file, as well as for other MIB files used by the CUDB FM interface, is defined by Ericsson SNMP Agent (ESA) in ESA Setup and Configuration, Reference [91].
Table 1 provides relevant information about the alarms. The Severity, Alarm Event Type and Probable Cause values follow the X.733 recommendation of the International Telecommunication Union (ITU), refer to Information Technology - Open Systems Interconnection - Systems Management Alarm Reporting Function ITU-T X.733, Reference [92].
| Attribute Name | Attribute Value |
|---|---|
| Auto Cease | No if the alarm is not auto ceased; Yes otherwise. |
| Module | The CUDB application component that raises the alarm. See Figure 1 under cudb(169). |
| Error Code | Assigned number identifying the alarm within a certain module (application component). |
| Timestamp First | Date and time when the alarm was raised for the first time. |
| Repeated Counter | Number which indicates how many times the alarm was raised. |
| Timestamp Last | Date and time of the most recent alarm raise. |
| Resource ID | An identifier of the alarming resource. The Object Identifier (OID) derived from the alarm model is used as the base for this identifier. |
| Alarm Model Description | A short description of the event. |
| Alarm Active Description | A dynamic text with a detailed description of the event. |
| ITU Alarm Event Type | A text that describes the type of the selected event, for example Communications Alarm, Processing Error Alarm or Operational Violation. For more information, refer to Information Technology - Open Systems Interconnection - Systems Management Alarm Reporting Function ITU-T X.733, Reference [92]. |
| ITU Alarm Probable Cause | A text that describes the probable cause of the event. For more information, refer to Information Technology - Open Systems Interconnection - Systems Management Alarm Reporting Function ITU-T X.733, Reference [92]. |
| ITU Alarm Perceived Severity | The status of the event. One of the following: (1) Cleared, (2) Indeterminate, (3) Critical, (4) Major, (5) Minor, (6) Warning. For more information, refer to Information Technology - Open Systems Interconnection - Systems Management Alarm Reporting Function ITU-T X.733, Reference [92]. |
| Originating Source IP | Node IP where the alarm was raised. |
| Sequence Number | Number which indicates the order in which alarms are raised. |
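As an illustration only, the attributes in Table 1 can be pictured as a simple record type. The following Python sketch is not part of CUDB or ESA: the class and field names (PerceivedSeverity, AlarmRecord, key) are hypothetical, and only the severity codes and the sample values are taken from this document.

```python
from dataclasses import dataclass
from enum import IntEnum


class PerceivedSeverity(IntEnum):
    """Numeric severity codes as listed in Table 1."""
    CLEARED = 1
    INDETERMINATE = 2
    CRITICAL = 3
    MAJOR = 4
    MINOR = 5
    WARNING = 6


@dataclass
class AlarmRecord:
    """Hypothetical container for a subset of the Table 1 attributes."""
    module: str
    error_code: int
    resource_id: str
    severity: PerceivedSeverity
    originating_source_ip: str
    auto_cease: bool = False
    repeated_counter: int = 1

    def key(self) -> tuple:
        """Return the fields that the clear command in Section 2.1.2.1 takes."""
        return (self.module, self.error_code, self.resource_id,
                self.originating_source_ip)


# Sample values copied from Example 1 in Section 2.1.2.1.
alarm = AlarmRecord(
    module="STORAGE-ENGINE",
    error_code=8,
    resource_id=".1.3.6.1.4.1.193.169.1.2.8.100",
    severity=PerceivedSeverity.WARNING,
    originating_source_ip="10.143.56.132",
)
print(alarm.key())
```

Note that the numeric severity codes are identifiers rather than a ranking: among the codes above, Critical (3) is more severe than Warning (6), so comparing the integers directly does not order alarms by severity.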
2.1.2 Alarm Management
The CUDB application does not provide specific management procedures for the mentioned alarms apart from the manual alarm clearing procedure provided by ESA, as described in Section 2.1.2.1.
2.1.2.1 Clearing Alarms
To clear an alarm, use the fmsendmessage command with the clear parameter:
# fmsendmessage -c <module> <errorcode> <resourceid> [<Alarm Active Description>] <originatingsourceip>
where:
- <module> = Application ID of the alarm.
- <errorcode> = Error Code of the alarm.
- <resourceid> = Active Resource ID of the alarm, prefixed with a dot (".").
- <Alarm Active Description> = Optional parameter that sets the content of the Alarm Active Description. In case it is used, the default Alarm Active Description is replaced with the value of this parameter. For example, it can be used to indicate that the alarm was cleared manually by entering "Manually cleared by User".
- <originatingsourceip> = Originating Source IP of the alarm.
Example 1 Clearing an Alarm
To clear the following non-autocease alarm:
---------------------------------------------------------------
Module : STORAGE-ENGINE
Error Code : 8
Resource ID : .1.3.6.1.4.1.193.169.1.2.8.100
Timestamp First : Thu Sep 24 13:41:48 CEST 2015
Repeated Counter : 1
Timestamp Last : Thu Sep 24 13:41:48 CEST 2015
Alarm Model Description : Memory usage at Warning level, Storage Engine.
Alarm Active Description : Storage Engine (DS-group #100): memory usage at Warning level.
ITU Alarm Event Type : 4
ITU Alarm Probable Cause : 151
ITU Alarm Perceived Severity : warning
Originating source IP : 10.143.56.132
Sequence Number : 554
---------------------------------------------------------------
this command is used:
# fmsendmessage -c STORAGE-ENGINE 8 .1.3.6.1.4.1.193.169.1.2.8.100 "Manually cleared by User" 10.143.56.132
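The mapping from an alarm printout to the clear command can be sketched in Python as follows. This is a minimal sketch, not an Ericsson tool: the helper names parse_alarm_dump and build_clear_command are hypothetical, and the printout is assumed to consist of "Name : Value" lines as in Example 1. The script only assembles the argument list in the order given by the syntax above and does not run fmsendmessage.

```python
import shlex
from typing import Optional


def parse_alarm_dump(text: str) -> dict:
    """Parse 'Name : Value' lines of an alarm printout into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first " : " only; separator lines have none and are skipped.
        name, sep, value = line.partition(" : ")
        if sep:
            fields[name.strip().lower()] = value.strip()
    return fields


def build_clear_command(fields: dict, description: Optional[str] = None) -> list:
    """Return the argument list for 'fmsendmessage -c' for the given alarm."""
    argv = ["fmsendmessage", "-c",
            fields["module"], fields["error code"], fields["resource id"]]
    if description is not None:
        # Optional text that replaces the default Alarm Active Description.
        argv.append(description)
    # The originating source IP is always passed explicitly.
    argv.append(fields["originating source ip"])
    return argv


dump = """\
Module : STORAGE-ENGINE
Error Code : 8
Resource ID : .1.3.6.1.4.1.193.169.1.2.8.100
Timestamp First : Thu Sep 24 13:41:48 CEST 2015
Originating source IP : 10.143.56.132
"""
argv = build_clear_command(parse_alarm_dump(dump), "Manually cleared by User")
print(shlex.join(argv))
```

Building the command as an argument list and quoting it only for display avoids problems with the spaces in the description text.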
Even though the help message of fmsendmessage suggests that the <sourceIP> parameter is optional, this is not the case in CUDB. If the parameter is not specified, the default value (IP address of the blade or virtual machine (VM) on which the command is executed) is used and the existing alarm is not cleared.
For instance, if the fmsendmessage -c STORAGE-ENGINE 8 .1.3.6.1.4.1.193.169.1.2.8.100 command is executed (where the <sourceIP> parameter is missing), the alarm shown in Example 1 is not cleared.
2.1.3 Alarm List
Alarms are grouped by application component, as shown in Figure 1. They are described in detail in the following subsections. The alarms in the tables are in alphabetical order.
2.1.3.1 Storage Engine
Storage Engine alarms are related to the Database Cluster. The alarm model for PLDB alarms is shown in Figure 2.
Figure 2 Alarm Model for PLDB Storage Engine Alarms
The alarm model for DS and general alarms is shown in Figure 3.
Figure 3 Alarm Model for DS and General Storage Engine Alarms
Table 2 shows the list of alarms related to Storage Engine.
2.1.3.2 Lightweight Directory Access Protocol Front End
The alarm model for Lightweight Directory Access Protocol (LDAP) Front End (FE)-related alarms is shown in Figure 4.
Table 3 shows the list of alarms related to LDAP FE.
| Alarm | Operating Instruction |
|---|---|
| LDAP Front End, High Load in LDAP Processing Layer | Refer to LDAP Front End, High Load in LDAP Processing Layer, Reference [53] |
| LDAP Front End, Processing Capacity Below Minimum | Refer to LDAP Front End, Processing Capacity Below Minimum, Reference [54] |
| LDAP Front End, Processing Redundancy Lost | Refer to LDAP Front End, Processing Redundancy Lost, Reference [55] |
| LDAP Front End, Server Down | Refer to LDAP Front End, Server Down, Reference [56] |
2.1.3.3 Server Platform
The alarm model for Server Platform-related alarms is shown in Figure 5.
Table 4 shows the list of alarms related to Server Platform.
| Alarm | Operating Instruction |
|---|---|
| Server Platform, Storage Performance Degradation Detected | Refer to Server Platform, Storage Performance Degradation Detected, Reference [52] |
2.1.3.4 Operating System
The alarm model for Operating System-related alarms is shown in Figure 6.
Table 5 shows the list of alarms related to the Operating System.
| Alarm | Operating Instruction |
|---|---|
| Operating System, Disk Usage Too High | Refer to Operating System, Disk Usage Too High, Reference [57] |
| Operating System, Server Configuration Backup Fault | Refer to Operating System, Server Configuration Backup Fault, Reference [58] |
2.1.3.5 Control
The alarm model for alarms related to node visibility and global system status is shown in Figure 7.
Table 6 shows the list of alarms related to Control.
| Alarm | Operating Instruction |
|---|---|
| Control, Automatic Master Election Locked Down | Refer to Control, Automatic Master Election Locked Down, Reference [59] |
| Control, Blackboard Coordination Cluster Down | Refer to Control, Blackboard Coordination Cluster Down, Reference [60] |
| Control, Blackboard Coordination Server Down | Refer to Control, Blackboard Coordination Server Down, Reference [61] |
| Control, Potential Split Brain Detected | Refer to Control, Potential Split Brain Detected, Reference [62] |
| Control, Remote Node Unreachable | |
| Control, Remote Site Unreachable | |
2.1.3.6 Application Counters
The alarm model for Application Counters-related alarms is shown in Figure 8.
Table 7 shows the list of alarms related to Application Counters.
| Alarm | Operating Instruction |
|---|---|
| Application Counters, Fault In Subscriber Statistic Application | Refer to Application Counters, Fault In Subscriber Statistic Application, Reference [65] |
2.1.3.7 Service Availability Forum
The alarm model for Service Availability Forum (SAF)-related alarms is shown in Figure 9.
Table 8 shows the list of alarms related to SAF.
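| Alarm | Operating Instruction |
|---|---|
| SAF, AMF Component Instantiation Failed | Refer to SAF, AMF Component Instantiation Failed, Reference [67] |
| SAF, AMF SI Unassigned | Refer to SAF, AMF SI Unassigned, Reference [68] |
| SAF, LOTC Disk Replication Communication Failed | Refer to SAF, LOTC Disk Replication Communication Failed, Reference [70] |
| SAF, LOTC Disk Replication Consistency Failed | Refer to SAF, LOTC Disk Replication Consistency Failed, Reference [71] |
| SAF, LOTC Memory Usage Failed | Refer to SAF, LOTC Memory Usage Failed, Reference [73] |
| SAF, LOTC Time Synchronization Failed | Refer to SAF, LOTC Time Synchronization Failed, Reference [74] |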
2.1.3.8 Security
The alarm model for Security-related alarms is shown in Figure 10.
Table 9 shows the list of alarms related to Security.
| Alarm | Operating Instruction |
|---|---|
| Security, OAM User Exceeded Number Of Failed Logins | Refer to Security, OAM User Exceeded Number Of Failed Logins, Reference [75] |
| Security, OAM User Gaining Privilege Failed | Refer to Security, OAM User Gaining Privilege Failed, Reference [76] |
| Security, OAM User Privilege Raise To Root Failed | Refer to Security, OAM User Privilege Raise To Root Failed, Reference [77] |
| Security, Root Login Failed | Refer to Security, Root Login Failed, Reference [78] |
2.1.3.9 Preventive Maintenance
The alarm model for Preventive Maintenance-related alarms is shown in Figure 11.
Table 10 shows the list of alarms related to Preventive Maintenance.
| Alarm | Operating Instruction |
|---|---|
| Preventive Maintenance, Logchecker Found Error(s) | Refer to Preventive Maintenance, Logchecker Found Error(s), Reference [79]. |
2.1.3.10 Licensing
The alarm model for Licensing-related alarms is shown in Figure 12.
Table 11 shows the list of alarms related to Licensing.
| Alarm | Operating Instruction |
|---|---|
| Licensing, Autonomous Mode Activated | Refer to Licensing, Autonomous Mode Activated, Reference [80]. |
| Licensing, Capacity Usage Threshold Reached, Major | Refer to Licensing, Capacity Usage Threshold Reached, Major, Reference [81]. |
| Licensing, Capacity Usage Threshold Reached, Warning | Refer to Licensing, Capacity Usage Threshold Reached, Warning, Reference [82]. |
| Licensing, Emergency Unlock Reset Key Required | Refer to Licensing, Emergency Unlock Reset Key Required, Reference [83]. |
| Licensing, Key File Fault | Refer to Licensing, Key File Fault, Reference [84]. |
| Licensing, License Key Not Available, Major | Refer to Licensing, License Key Not Available, Major, Reference [85]. |
| Licensing, License Key Not Available, Minor | Refer to Licensing, License Key Not Available, Minor, Reference [86]. |
| Licensing, License Manager Not Available | Refer to Licensing, License Manager Not Available, Reference [87]. |
2.1.4 Alarm Relationships
The following alarm relationships are present in the system:
- Storage Engine alarms:
  - If memory usage gets to warning level, the Storage Engine, Memory Usage Too High In PLDB, Warning, Reference [23] alarm is raised. If memory usage then gets to major level, the Storage Engine, Memory Usage Too High In PLDB, Major, Reference [22] alarm is raised, and the previous alarm is cleared. If memory usage continues growing, the Storage Engine, Out Of Memory In PLDB, Reference [27] alarm is raised, and the previous one is maintained.
    The same principle applies to similar alarms related to DS.
  - If the BLOB storage space usage reaches the warning level, the Storage Engine, Tablespace Usage Too High In PLDB, Warning, Reference [44] alarm is raised. If the BLOB storage space usage continues growing until there is no more space left, the Storage Engine, Out Of Tablespace In PLDB, Reference [29] alarm is raised and the previous one is cleared.
    The same principle applies to similar alarms related to DS.
  - If the Consistency Check (refer to CUDB Consistency Check, Reference [88] for more information) finds any inconsistencies between two replicas, the Storage Engine, Data Inconsistency between Replicas Found in PLDB, Minor, Reference [9] or Storage Engine, Data Inconsistency between Replicas Found in PLDB, Major, Reference [8] alarm is raised, depending on the type and amount of inconsistencies. If any of these alarms was raised before running the Consistency Check, the earlier alarms are cleared, and a new one is raised, if necessary, with the proper severity. These two alarms cannot be raised simultaneously.
    The same principle applies to similar alarms related to DS.
  - If the Automatic Handling of Network Isolation function is enabled, when a former PLDB master replica rejoins the system as a slave replica (that is, recovery from a system split situation or unexpected mastership change) and it is not able to get in sync with the current master replica, then:
    - The Storage Engine, Unable to Synchronize Cluster in PLDB, Warning, Reference [49] alarm is raised and the Automatic Handling of Network Isolation task is started.
    - After the execution of the first part of the Automatic Handling of Network Isolation task, the Selective Replica Check subtask:
      - The Storage Engine, Execution of Selective Replica Check Failed, PLDB, Major, Reference [17] alarm is raised if some entries were impossible to retrieve.
      - The Storage Engine, Automatic Handling of Network Isolation not Completed for PLDB, Reference [2] alarm is raised if the Selective Replica Check subtask cannot be completed.
      - The Storage Engine, Unable to Synchronize Cluster in PLDB, Warning, Reference [49] alarm is cleared. The alarm that follows depends on whether the Self-Ordered Backup and Restore function is enabled, irrespective of whether the Selective Replica Check subtask could be completed and irrespective of its outcome.
        If the Self-Ordered Backup and Restore function is disabled, the Storage Engine, Unable to Synchronize Cluster in PLDB, Major, Reference [48] alarm is raised. Otherwise, a new Storage Engine, Unable to Synchronize Cluster in PLDB, Warning, Reference [49] alarm is raised, which indicates the start of the Self-Ordered Backup and Restore process.
    - After the execution of the second part of the Automatic Handling of Network Isolation task, the Data Repair subtask:
      - The Storage Engine, Data Inconsistency between Replicas Repaired, PLDB, Reference [11] alarm is raised if any LDAP entry has been repaired.
      - The Storage Engine, Unrepaired Data Inconsistency between Replicas, PLDB, Reference [50] alarm is raised if some LDAP entries were impossible to repair.
      - The Storage Engine, Automatic Handling of Network Isolation not Completed for PLDB, Reference [2] alarm is raised if the Data Repair subtask could not be completed.
      Note: The second part of the Automatic Handling of Network Isolation task, the Data Repair subtask, is started only if the execution of the Selective Replica Check subtask was completed.
    All previously raised alarms, except for the Storage Engine, Unable to Synchronize Cluster in PLDB, Warning, Reference [49] alarm, are maintained.
  - If the Self-Ordered Backup and Restore function is enabled when a slave PLDB replica is unable to get in sync with the current master replica, then:
    - The Storage Engine, Unable to Synchronize Cluster in PLDB, Warning, Reference [49] alarm is raised and the Self-Ordered Backup and Restore task is started.
      If the Automatic Handling of Network Isolation function is enabled, raising this alarm is postponed until the Storage Engine, Unable to Synchronize Cluster in PLDB, Warning, Reference [49] alarm related to the execution of the Selective Replica Check subtask of Automatic Handling of Network Isolation is cleared.
    - After the execution of the Self-Ordered Backup and Restore task:
      - The Storage Engine, Unable to Synchronize Cluster in PLDB, Warning, Reference [49] alarm related to the Self-Ordered Backup and Restore task is cleared, irrespective of whether the Self-Ordered Backup and Restore task could be completed and irrespective of its outcome.
      - The Storage Engine, Unable to Synchronize Cluster in PLDB, Major, Reference [48] alarm is raised if the Self-Ordered Backup and Restore task failed to restore the replication.
      Note: Previously raised alarms, except for the Storage Engine, Unable to Synchronize Cluster in PLDB, Warning, Reference [49] alarm, are maintained.
    Alarms related to the execution of the Data Repair subtask of the Automatic Handling of Network Isolation function may be raised in parallel with alarms related to the Self-Ordered Backup and Restore function.
- LDAP FE alarms:
  - The LDAP Front End, Server Down, Reference [56] alarm is linked to each LDAP FE component in the CUDB node. In addition, a redundancy level is defined, corresponding to the maximum number of redundant LDAP FE elements, that is, the number of LDAP FEs that can be down without the CUDB node losing its required level of performance.
  - When the number of LDAP FEs that are down equals the redundancy level, the LDAP Front End, Processing Redundancy Lost, Reference [55] alarm is raised, while the LDAP Front End, Server Down, Reference [56] alarms related to those LDAP FEs are maintained.
  - If one or more additional LDAP FEs go down, the LDAP Front End, Processing Capacity Below Minimum, Reference [54] alarm is raised, and all previous alarms are maintained.
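Two of the relationships above can be sketched as small state rules. The following Python code is an illustrative model only, not CUDB logic: the function names, the level labels and the shortened alarm names are hypothetical, and the LDAP FE rule assumes that Processing Capacity Below Minimum is raised once more LDAP FEs are down than the redundancy level allows.

```python
def memory_alarms(levels_reached):
    """Replay memory usage levels in order and return the alarms left active."""
    active = set()
    for level in levels_reached:
        if level == "warning":
            active.add("Memory Usage Too High, Warning")
        elif level == "major":
            # Raising the Major alarm clears the earlier Warning alarm.
            active.discard("Memory Usage Too High, Warning")
            active.add("Memory Usage Too High, Major")
        elif level == "out_of_memory":
            # Out Of Memory is raised while the previous alarm is maintained.
            active.add("Out Of Memory")
        else:
            raise ValueError(f"unknown level: {level}")
    return active


def ldap_fe_group_alarms(fes_down: int, redundancy_level: int) -> set:
    """Return the node-level LDAP FE alarms for a given number of FEs down.

    The per-FE Server Down alarms are not modelled here; one such alarm
    stays active for every LDAP FE that is down.
    """
    if fes_down < 0 or redundancy_level < 1:
        raise ValueError("expected fes_down >= 0 and redundancy_level >= 1")
    alarms = set()
    if fes_down >= redundancy_level:
        alarms.add("Processing Redundancy Lost")
    if fes_down > redundancy_level:
        # Assumption: raised only when FEs beyond the redundancy level are down.
        alarms.add("Processing Capacity Below Minimum")
    return alarms


print(sorted(memory_alarms(["warning", "major", "out_of_memory"])))
print(sorted(ldap_fe_group_alarms(fes_down=3, redundancy_level=2)))
```

Returning the set of active alarms, rather than a single state, reflects that some alarms replace their predecessor while others are raised alongside it.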
2.2 Infrastructure Alarms
Consider the following regarding infrastructure alarms:
- In case the CUDB system is deployed on native BSP 8100 hardware, the CUDB infrastructure alarms are raised by the underlying BSP 8100 management software. Unlike application ones, these alarms are not sent through ESA agents which are part of the CUDB application. More information on the infrastructure alarms and SNMP configuration can be found in the "BSP Fault Management" and "BSP System Notifications" documents in the BSP 8100 CPI.
- In case the CUDB system is deployed on a cloud infrastructure, refer to the alarm-related documentation of the infrastructure for more information on the infrastructure alarms and SNMP configuration.
3 Configuration
Some alarms are raised when the value of a CUDB parameter exceeds a configured threshold. See the Operating Instructions (OPIs) of the specific alarms for information on the applicable parameters and thresholds, and on how to configure them. The IP address of the NMS to which alarm traps are sent can also be configured. SNMP version 3 is used.
For more details about how to configure SNMP for CUDB application components, refer to ESA Setup and Configuration, Reference [91]. For more information on configuring SNMP for infrastructure components, see Section 2.2.
Glossary
For the terms, definitions, acronyms and abbreviations used in this document, refer to CUDB Glossary of Terms and Acronyms, Reference [89].
Reference List
| Other Ericsson Documents |
|---|
| [90] ESA Fault Management. |
| [91] ESA Setup and Configuration. |
| Other Documents and Online References |
|---|
| [92] Information Technology - Open Systems Interconnection - Systems Management Alarm Reporting Function ITU-T X.733. CCITT Rec. X.733 (1992 E) http://www.itu.int/rec/T-REC-X.733/. |
