1 Introduction
This instruction concerns alarm handling for the Server Platform, Storage Performance Degradation Detected alarm.
1.1 Alarm Description
The alarm is issued when an Ericsson Centralized User Data Base (CUDB) application detects that application components are impacted by a degradation of storage performance.
The alarm is issued in the following situations:
- A monitored I/O-heavy process is stuck in uninterruptible sleep because the storage system does not respond.
- The storage system responds to a file system probe request with an I/O error or a timeout.
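The second situation can be illustrated with a minimal sketch of a file system write probe with a timeout. This is not the CUDB implementation: the function name `probe_write`, the 5-second default timeout, and the three result strings are assumptions made for illustration only.

```python
import os
import tempfile
import threading


def probe_write(directory, timeout_s=5.0):
    """Try a small synced write in `directory`.

    Returns 'ok', 'io_error', or 'timeout' (hypothetical result names).
    """
    result = {}

    def worker():
        try:
            fd, path = tempfile.mkstemp(prefix=".probe-", dir=directory)
            try:
                os.write(fd, b"probe")
                os.fsync(fd)  # force the data out to the storage system
            finally:
                os.close(fd)
                os.unlink(path)
            result["status"] = "ok"
        except OSError:
            # Covers I/O errors as well as a missing or read-only directory.
            result["status"] = "io_error"

    # Run the write in a daemon thread so that a write blocked in the
    # kernel cannot block the caller beyond the timeout.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    # If the worker is still blocked, no status has been recorded yet.
    return result.get("status", "timeout")
```

A caller would run such a probe periodically against each monitored partition and treat a repeated `'io_error'` or `'timeout'` result as a sign of degraded storage.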
The possible alarm causes and the corresponding fault reasons, fault locations, and impacts are described in Table 1.
| Alarm Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| File system probe detected an error. | Monitored partition could not be written for a longer period of time (preset timeout) due to I/O error. | Most probably faulty infrastructure. | Storage system. | Performance degradation in the CUDB system. |
| Lightweight process state check detected an error. | Monitored I/O heavy process got stuck in uninterruptible sleep ("disk sleep"). | Most probably faulty infrastructure. | Storage system. | Performance degradation in the CUDB system. |
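The "disk sleep" state named in Table 1 is the state that Linux reports as `D` in `/proc/<pid>/stat`. The following sketch shows one way such a process state check could look. It is an illustration only, not the CUDB check itself: the function names and the rule of three consecutive samples are assumptions.

```python
def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is enclosed in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def read_state(pid):
    """Read the current state letter of a process (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return parse_state(stat_file.read())


def is_stuck(states, min_consecutive=3):
    """True if the most recent `min_consecutive` samples are all 'D'.

    A single 'D' sample is normal during I/O; only a process that stays
    in 'D' across several samples is treated as stuck (assumed rule).
    """
    recent = states[-min_consecutive:]
    return len(recent) == min_consecutive and all(s == "D" for s in recent)
```

For example, sampling `read_state(pid)` at a fixed interval and passing the collected letters to `is_stuck` separates a process that is briefly waiting for I/O from one that never returns from it.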
The following are the consequences for the node if the alarm is not resolved:
- In case the alarm is raised in a payload blade or Virtual Machine (VM):
- In case the alarm is raised in a System Controller (SC):
- Service degradation in controlling processes running on the impacted SC.
- Possible node reboots.
- Unplanned mastership changes which can cause data durability issues.
The alarm attributes are listed and explained in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Auto Cease | No |
| Module | SERVER-PLATFORM |
| Error Code | 2 |
| Timestamp First | Date and time when the alarm was raised for the first time. |
| Repeated Counter | Number which indicates how many times the alarm was raised. |
| Timestamp Last | Date and time of the most recent alarm raised. |
| Resource ID | .1.3.6.1.4.1.193.169.4.2.<Blade ID> |
| Alarm Model Description | Storage performance degradation detected, Server Platform |
| Alarm Active Description | Server Platform: Storage performance degradation detected on host <Blade>. <Additional info> |
| ITU Alarm Event Type | equipmentAlarm (5) |
| ITU Alarm Probable Cause | replaceableUnitProblem (69) |
| ITU Alarm Perceived Severity | Major (4) |
| Originating Source IP | Node IP where the alarm was raised. |
| Sequence Number | Number which indicates order in which alarms are raised. |
In Table 2, the indicated variables are as follows:
- <Blade ID> is the LDE or LOTC node ID for the blade or VM.
- <Blade> is the LDE or LOTC hostname for the blade or VM.
- <Additional info> is different depending on the CUDB system deployment and blade type:
The possible cause is a failure in the storage system.
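When alarms are processed by a script, the variables in the Resource ID and Alarm Active Description attributes can be extracted as sketched below. The function names are hypothetical, and the regular expression assumes that the hostname contains no whitespace and that the description follows the Table 2 pattern exactly.

```python
import re

# OID prefix of the Resource ID attribute, as listed in Table 2.
RESOURCE_ID_PREFIX = ".1.3.6.1.4.1.193.169.4.2."

# Pattern of the Alarm Active Description attribute, as listed in Table 2.
ACTIVE_DESCRIPTION = re.compile(
    r"^Server Platform: Storage performance degradation detected on host "
    r"(?P<blade>\S+)\. ?(?P<additional_info>.*)$"
)


def blade_id_from_resource_id(resource_id):
    """Return the <Blade ID> suffix as a string, or None if it does not match."""
    if not resource_id.startswith(RESOURCE_ID_PREFIX):
        return None
    return resource_id[len(RESOURCE_ID_PREFIX):] or None


def parse_active_description(text):
    """Return (<Blade>, <Additional info>), or None if the text does not match."""
    match = ACTIVE_DESCRIPTION.match(text)
    if match is None:
        return None
    return match.group("blade"), match.group("additional_info")
```

For example, with the made-up hostname `PL_2_5`, `parse_active_description("Server Platform: Storage performance degradation detected on host PL_2_5. example text")` returns `("PL_2_5", "example text")`.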
1.2 Prerequisites
This section provides information on the documents, tools, and conditions that apply to the procedure.
1.2.1 Documents
Before starting this procedure, ensure that you have read the following documents:
- System Safety Information, Reference [4].
- Personal Health and Safety Information, Reference [5].
1.2.2 Tools
Not applicable.
1.2.3 Conditions
Not applicable.
2 Procedure
This section describes the procedure to follow when this alarm is received.
2.1 Procedure for CUDB Systems Deployed on Native BSP 8100
- Only in case of a payload blade, perform the following procedure:
- Lock the blade. For the blade position, refer to the "Identifying the Faulty Blade" section of Server Platform, Blade Replacement, Reference [1], and for the blade lock, refer to the "Manage Blade" document in the BSP 8100 CPI.
- Perform the procedure described in the "Preparing Payload Blade (PLDB or DSG Blade) Replacement" section of Server Platform, Blade Replacement, Reference [1].
Warning! The steps above must be performed immediately, even if the blade replacement (the next step) can be performed later.
- Perform the blade replacement (refer to the "Replacing a Blade" section of Server Platform, Blade Replacement, Reference [1]) or contact the next level of maintenance support.
2.2 Procedure for CUDB Systems Deployed on a Cloud Infrastructure
If the Storage Performance Degradation Detected alarm is raised, check the following in the cloud infrastructure:
- Check if there is any ongoing maintenance activity (for example, maintenance of the file systems used by the cloud infrastructure).
- Check if there is a problem with the cloud infrastructure software.
- Check if the cloud infrastructure hardware is hosting a faulty VM.
If everything is working correctly, manually delete the alarm.
If problems are identified in the cloud infrastructure, make sure that they are fixed according to Virtualized CUDB Virtual Machine Recovery, Reference [2].
Glossary
For the terms, definitions, acronyms, and abbreviations used in this document, refer to CUDB Glossary of Terms and Acronyms, Reference [3].
Reference List
| CUDB Documents |
|---|
| [1] Server Platform, Blade Replacement. |
| [2] Virtualized CUDB Virtual Machine Recovery. |
| [3] CUDB Glossary of Terms and Acronyms. |
| Other Ericsson Documents |
|---|
| [4] System Safety Information. |
| [5] Personal Health and Safety Information. |
