| 1 | Introduction |
| 1.1 | Prerequisite |
| 2 | Recovery Procedures |
| 3 | Problem Reporting |
| 3.1 | Problem Solved |
| 3.2 | Consult Next Level of Support |
1 Introduction
This document describes the emergency recovery procedure for the virtual MTAS node.
Scope
Only MTAS-related recovery actions are included. The root cause can sometimes lie outside MTAS, but troubleshooting outside MTAS is out of scope for this procedure.
The recovery actions are intended for critical situations with an apparent disturbance in traffic handling. They also apply when an important redundant function is lost, so that the system is no longer protected against a single point of failure. This document is a recovery instruction: the affected systems are assumed to have been fully working before the problems started. Troubleshooting steps related to faulty configuration, or to wrong software or hardware versions, are not covered.
Often, the critical situation is caused by manual activity ongoing at the site, typically an upgrade or another maintenance task. In such situations, first try the fallback or recovery procedures belonging to that particular activity (often delivered together with the upgrade). If that does not help, this recovery procedure can be tried.
If the system has raised alarms, the procedures for resolving the cause of those alarms are also to be tried before this recovery procedure is consulted. That documentation is part of the Customer Product Information (CPI) stored in the Active Library Explorer. The MTAS Troubleshooting Guideline is also to be consulted before this recovery procedure is considered.
This document provides an emergency recovery procedure for conditions where it is required to restore MTAS.
1.1 Prerequisite
This section states the prerequisites for performing the emergency recovery procedure.
1.1.1 Hardware and Software
The following hardware and software are required:
- A working Ericsson Cloud Solution environment (or equivalent)
- A working OpenStack environment (or equivalent)
- Heat Orchestration Template (HOT) files: main, scaling, environment (filled with instance or site-specific information) to deploy MTAS
- A recent backup of MTAS, if available
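Before starting, it can help to confirm that the HOT files are actually at hand. The following is a minimal sketch, not part of the MTAS delivery: the file names `main.yaml`, `scaling.yaml`, and `environment.yaml` are hypothetical placeholders for the main, scaling, and environment files, and the check only tests that each file exists and is not empty. It does not validate the template contents.

```python
from pathlib import Path

# Hypothetical file names; substitute the names used at your site.
REQUIRED_HOT_FILES = ("main.yaml", "scaling.yaml", "environment.yaml")


def missing_hot_files(directory, required=REQUIRED_HOT_FILES):
    """Return the required file names that are absent or empty in directory."""
    base = Path(directory)
    missing = []
    for name in required:
        candidate = base / name
        # An empty file cannot hold instance or site-specific parameters.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty result from `missing_hot_files("/path/to/hot")` means all three files were found in that directory; any names returned need to be located before deployment is attempted.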
1.1.2 Documents
This instruction references the following documents:
- Data Collection Guideline for MTAS
- MTAS Health Check
- MTAS Scaling Management
- MTAS SW Installation
- Restore Backup
- View Progress Report
1.1.3 Conditions
Before starting this procedure, ensure that the underlying hardware and environment (for example, Ericsson Cloud Solution and an OpenStack or equivalent environment) have already been restored.
2 Recovery Procedures
It is assumed that MTAS must be reinstalled for some reason, for example, as a result of a failure in the underlying cloud infrastructure. This, in turn, could be because of a major hardware failure, an extensive power loss, flood, or fire.
Note: Do not perform the following activities in the event of a system failure, unless otherwise stated:
- Altering databases.
- Modifying anything other than configuration. Before modifying the configuration, always make a backup copy of the original configuration file.
- Introducing any additional changes (deltas).
- Powering off or rebooting.
- Altering the network level.
- Changing any system passwords.
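The requirement to keep a backup copy of the original configuration file can be sketched as follows. This is an illustrative helper, not an MTAS tool: the function name `backup_config` and the `.orig` naming scheme are assumptions made for this example. The copy is placed next to the original with a UTC timestamp in its name, and an existing backup is never overwritten.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_config(config_path, now=None):
    """Copy config_path to a timestamped sibling file and return the new path."""
    source = Path(config_path)
    # 'now' is expected to be a UTC datetime; it defaults to the current time.
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    target = source.with_name(f"{source.name}.{stamp}.orig")
    if target.exists():
        raise FileExistsError(f"Backup already exists: {target}")
    # copy2 also preserves timestamps and permission bits where possible.
    shutil.copy2(source, target)
    return target
```

Calling the helper before each edit leaves one untouched copy per modification, which also documents when the configuration was changed.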
To restore MTAS:
- Deploy MTAS, see MTAS SW Installation.
- Restore the latest known working backup, if possible; see Restore Backup and View Progress Report.
  Note: This step assumes that a backup has been created and exported from the last working configuration. See Create Backup and Export Backup.
- Restore MTAS to its original size, see MTAS Scaling Management.
- Perform node health check, see MTAS Health Check.
- Is the problem solved?
- Yes: Proceed with Section 3.1 Problem Solved.
- No: Proceed with Section 3.2 Consult Next Level of Support.
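The order of the steps above, and the decision at the end, can be sketched as a small runbook driver. This is only an illustration under stated assumptions: the step callables are placeholders that an operator or a site-specific tool would supply, and treating a failed step as "problem not solved" is an assumption of this sketch, not something the procedure states.

```python
from typing import Callable, Sequence, Tuple

# Each step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]

SOLVED = "3.1 Problem Solved"
ESCALATE = "3.2 Consult Next Level of Support"


def run_recovery(steps: Sequence[Step], health_check: Callable[[], bool]) -> str:
    """Run the steps in order and return the section to continue with."""
    for name, action in steps:
        print(f"Step: {name}")
        if not action():
            # Assumption: a failed step means the problem is not solved.
            return ESCALATE
    # The node health check decides between the two outcomes.
    return SOLVED if health_check() else ESCALATE
```

In use, `steps` would list the deployment, the backup restore (when a backup exists), and the scaling step in that order, with `health_check` standing in for the outcome of the MTAS Health Check.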
3 Problem Reporting
All recovery situations must be treated as abnormal and must be reported to the next level of support, or according to another documented procedure. This applies even if the recovery was successful. Often, a Customer Service Request (CSR) is written to the responsible support organization.
If the situation has affected the In-Service Performance (ISP), it must be reported as such according to documented procedure.
It is often required to perform a Root Cause Analysis (RCA) later to determine the source of the problem. It is therefore important to document the problematic situation and all the recovery steps that have been taken. Several log files in the system must be saved or copied to prevent them from being overwritten with newer information. It is important that these logs are available for any future RCA.
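One way to document the recovery steps as they are taken is an append-only journal. The sketch below is a suggestion only: the function name `record_step`, the field names, and the JSON Lines format are assumptions made for this example, not an MTAS or CPI convention. Each call appends one timestamped entry, so earlier entries are never rewritten.

```python
import json
from datetime import datetime, timezone


def record_step(journal_path, action, outcome, now=None):
    """Append one timestamped recovery step to a JSON Lines journal."""
    entry = {
        # ISO 8601 timestamp in UTC unless the caller passes another time.
        "time": (now or datetime.now(timezone.utc)).isoformat(),
        "action": action,
        "outcome": outcome,
    }
    # Mode "a" appends, so existing entries are left untouched.
    with open(journal_path, "a", encoding="utf-8") as journal:
        journal.write(json.dumps(entry) + "\n")
    return entry
```

The resulting file can be attached to the report alongside the saved system logs, giving a later RCA the sequence and timing of the actions taken.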
3.1 Problem Solved
Keep the site and the affected functions under extra observation for a while to ensure that the fault does not recur. Record the incident according to local procedures, using a log book or similar.
3.2 Consult Next Level of Support
Provide the receiving support organization with the following information:
- Site name
- Location
- Operator
- Contact information
- MTAS node name, version loaded, including any upgrade packages and correction packages
- MTAS configuration, for example, the HOT files
- Problem description
- Procedures followed, including document number
- Information about activities done before, during, and after the incident
- Logs
For information on how to collect data and log files, see Data Collection Guideline for MTAS.
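The list of information for the support organization can be treated as a checklist. The following sketch is illustrative: the class and field names are assumptions chosen to mirror the items above, and the check only reports which items are still empty. It does not judge whether the content is sufficient.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class SupportReport:
    """One field per item requested by the receiving support organization."""
    site_name: str = ""
    location: str = ""
    operator: str = ""
    contact_information: str = ""
    node_name_and_version: str = ""  # include upgrade and correction packages
    configuration: str = ""          # for example, a reference to the HOT files
    problem_description: str = ""
    procedures_followed: str = ""    # include document numbers
    activities: str = ""             # before, during, and after the incident
    logs: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """Return the names of items that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

A report is ready to send when `missing_items()` returns an empty list.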
