Control, Blackboard Coordination Server Down
Ericsson Centralized User Database

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure

Glossary

Reference List

1   Introduction

This instruction concerns alarm handling for the Control, Blackboard Coordination Server Down alarm.

1.1   Alarm Description

The alarm is issued when a Blackboard Coordination (BC) server is down.

The possible alarm causes and the corresponding fault reasons, fault locations, and impacts are described in Table 1.

Table 1    Alarm Causes

| Alarm Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| The blade or virtual machine hosting a BC server is down. | The blade or virtual machine hosting the BC server instance is down. | The blade or virtual machine is rebooting or shut down, and cannot provide any service. | The blade or virtual machine holding the BC server (that is, the System Controllers (SCs), or PL_2_5). | BC server redundancy is decreased, since the system is running with one less BC server instance. |
| A BC server goes down, or becomes unreachable. | The BC server process is not running. | The process has been stopped or killed, and cannot be started. | The BC server process running in the SCs or PL_2_5. | BC server redundancy is decreased, since the system is running with one less BC server instance. |
| A BC server does not provide any service. | The BC server process is running, but is unable to provide any service. | The BC server process is running, but in an unhealthy state. | The BC server process running in the SCs or PL_2_5. | BC server redundancy is decreased, since the system is running with one less BC server instance. |
| The files on a BC server are corrupted because of inconsistent information in the data directory. | The information stored in the files of the BC server is corrupted, or inconsistent. | Problem in the /local file system in the blade or virtual machine running the BC server, or wrong information in the BC server files. | The files in the /local/cudb/BCServer folder on the SCs, or PL_2_5. | BC server redundancy is decreased, since the system is running with one less BC server instance. |

The alarm attributes are listed and explained in Table 2.

Table 2    Alarm Attributes

| Attribute Name | Attribute Value |
|---|---|
| Auto Cease | Yes |
| Module | CONTROL |
| Error Code | 4 |
| Timestamp First | Date and time when the alarm was raised for the first time. |
| Repeated Counter | Number which indicates how many times the alarm was raised. |
| Timestamp Last | Date and time of the most recent alarm raise. |
| Resource ID | 1.3.6.1.4.1.193.169.7.4.CN.BC |
| Alarm Model Description | Blackboard Coordination Server Down, Control. |
| Alarm Active Description | Control: Blackboard Coordination Server <IP Address>:<port> down, uuid: <uuid> |
| ITU Alarm Event Type | processingErrorAlarm (4) |
| ITU Alarm Probable Cause | softwareProgramError (546) |
| ITU Alarm Perceived Severity | (4) - Major |
| Originating source IP | Node IP where the alarm was raised. |
| Sequence Number | Number which indicates the order in which the alarms are raised. |

In Table 2, the indicated variables are as follows:

For further information about attribute descriptions, refer to CUDB Node Fault Management Configuration Guide, Reference [3].
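
As an illustration of how the variables in the Alarm Active Description could be picked out when alarms are processed by a script, the following shell sketch parses the text template shown in Table 2. It is only a sketch under stated assumptions: the function name parse_bc_alarm and the sample address, port, and UUID used below are invented for this example and are not part of CUDB.

```shell
# Hypothetical helper: split the Alarm Active Description into its variables.
# The pattern mirrors the template from Table 2:
#   Control: Blackboard Coordination Server <IP Address>:<port> down, uuid: <uuid>
parse_bc_alarm() {
    # sed prints a line only when the whole text matches the template.
    parsed=$(printf '%s\n' "$1" | sed -n -E \
        's/^Control: Blackboard Coordination Server (.+):([0-9]+) down, uuid: (.+)$/ip=\1 port=\2 uuid=\3/p')
    # Report failure when the text is not a BC Server Down description.
    [ -n "$parsed" ] || return 1
    printf '%s\n' "$parsed"
}
```

With the made-up input "Control: Blackboard Coordination Server 10.0.0.1:2181 down, uuid: abc-123", the function prints "ip=10.0.0.1 port=2181 uuid=abc-123". For any text that does not follow the template, it prints nothing and returns a non-zero status.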

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Before starting this procedure, ensure that you have read the following documents:

1.2.2   Tools

Not applicable.

1.2.3   Conditions

Not applicable.

2   Procedure

If the alarm is raised, then do the following:

  1. Wait a short time for the alarm to clear. If the alarm clears, no further action is needed. If it does not clear, continue with the next step.
  2. Try to restart the process manually with the following command:

    /opt/ericsson/cudb/OAM/bin/cudbManageBCServer -restart

  3. Check the log file of the failing BC Server on the blade or virtual machine holding the BC Server, and look for an IOException raised while loading the database. The log file is located at the following path:

    /var/log/bc_server.err

    For further details, check the "Zookeeper" section of CUDB Node Logging Events, Reference [5].

  4. If the BC Server is unable to read its database, and fails to start because of file corruption in the transaction logs, then do the following:
    1. Use the following command to verify that all the other BC Servers in the BC Cluster are up and running:

      cudbSystemStatus -b

    2. If all the other BC Servers of the BC Cluster are up, then clean the database of the corrupt BC Server with the following command:

      rm -rf /local/cudb/BCServer/version-2

    3. Try to restart the process manually with the following command:

      /opt/ericsson/cudb/OAM/bin/cudbManageBCServer -restart

    4. Wait for a short time for the alarm to clear.
  5. If the problem is not identified, or the alarm does not cease with the measures taken, consult the next level of maintenance support. Further actions are outside the scope of this instruction.
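
Steps 1 and 3 of the procedure above can be partly scripted. The following is a minimal shell sketch, not an Ericsson-provided tool: wait_until and count_io_exceptions are hypothetical helper names, and the number of attempts and the delay are assumptions, since this instruction does not define how long "a short time" is.

```shell
# Hypothetical helper for step 1: run a caller-supplied check command
# repeatedly until it succeeds or the attempts are used up.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # "$@" is the check command; exit status 0 means the condition holds.
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical helper for step 3: print how many lines of a log file
# mention IOException.
count_io_exceptions() {
    grep -c 'IOException' "$1"
}
```

For example, count_io_exceptions /var/log/bc_server.err prints the number of matching lines in the log file named in step 3. The check command passed to wait_until must be supplied by the operator, because this instruction does not define a command for testing whether the alarm has ceased.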

Glossary

For the terms, definitions, acronyms and abbreviations used in this document, refer to CUDB Glossary of Terms and Acronyms, Reference [6].


Reference List

CUDB Documents
[1] CUDB Node Configuration Data Model Description.
[2] CUDB High Availability.
[3] CUDB Node Fault Management Configuration Guide.
[4] CUDB Node Commands and Parameters.
[5] CUDB Node Logging Events.
[6] CUDB Glossary of Terms and Acronyms.
Other Ericsson Documents
[7] System Safety Information.
[8] Personal Health and Safety Information.


Copyright

© Ericsson AB 2016. All rights reserved. No part of this document may be reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to continued progress in methodology, design and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

Trademark List
All trademarks mentioned herein are the property of their respective owners. These are shown in the document Trademark Information.
