1 Introduction
This document describes the recovery procedure in case of an Ericsson Centralized User Database (CUDB) site failure or IP network loss.
1.1 Purpose and Scope
The purpose of this document is to provide system administrators with clear operating instructions for recovering from a site failure or IP network loss, and for keeping traffic or provisioning available until the CUDB site or IP network becomes operational again.
Actions to fix networking issues or IP Backbone failure are out of the scope of this document.
1.2 Revision Information
Other than editorial changes, this document has been revised as follows:
Description, Conditions, Enabling Provisioning, Enabling Traffic, and Recovery Procedure: Updated description.
1.3 Target Groups
This document is intended for CUDB system administrators.
1.4 Prerequisites
1.5 Typographic Conventions
Typographic conventions can be found in the following document:
2 Overview
This document describes the Partial Recovery procedure that can be used as an emergency workaround for CUDB system deployments in the case of a system split.
As a result of this procedure, the PLDB and DSG masterships are moved to a surviving CUDB partition, which has been selected as the service partition providing traffic and provisioning.
The partial recovery procedure consists of two procedures:
An Emergency Procedure, which must be applied when a symmetrical split or minority situation occurs.
A Recovery Procedure, which must be applied when the split situation ends and all sites are operational again.
The Selective Replica Check and Data Repair processes are part of the Automatic Handling of Network Isolation function. The aim of this automatic process in CUDB is to attempt to handle and repair data loss caused by a network split or an unexpected PLDB or DSG mastership change. For more information on the Selective Replica Check and Data Repair processes, refer to CUDB Data Storage Handling.
2.1 Description
2.2 Conditions
Do not execute the procedures in this document in a majority situation.
The procedure can be applied in the following situations:
To detect the situations detailed above, execute the following command on one node of each live partition:
cudbSetPartitionStatus --printPartitionStatus
For more information, refer to CUDB Node Commands and Parameters.
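The check above can be scripted so that each live partition is queried in one pass. The following is a minimal sketch, assuming SSH access from an administration host; the node hostnames are placeholders only and must be replaced with one reachable node per surviving partition in your deployment:

```shell
#!/bin/sh
# Query the partition status on one node of each live partition.
# REMOTE_SHELL defaults to ssh and can be overridden, for example
# when rehearsing the check against a stub.
: "${REMOTE_SHELL:=ssh}"

print_partition_status() {
    # Each argument is one node per surviving partition
    # (placeholder hostnames, deployment specific).
    for node in "$@"; do
        echo "=== Partition status reported by ${node} ==="
        "${REMOTE_SHELL}" "${node}" cudbSetPartitionStatus --printPartitionStatus
    done
}

# Example invocation with placeholder hostnames:
# print_partition_status partition1-node1 partition2-node1
```

The interpretation of the printed status (symmetrical split, minority, or majority) follows the situations listed above; the command output format itself is described in CUDB Node Commands and Parameters.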
Also check if any of the following alarms is present in the CUDB system:
For more information, refer to Control, Remote Site Unreachable and Control, Potential Split Brain Detected.
In any other situation, the system is in a majority situation, and the Partial Recovery procedure cannot be applied.
3 Emergency Procedure
This section describes the emergency procedure for provisioning recovery and traffic recovery. To enable provisioning, execute the procedure in Enabling Provisioning; to enable traffic, execute the procedure in Enabling Traffic.
3.1 Enabling Provisioning
If the isolation is caused by the failure of multiple sites, there is only one surviving partition where provisioning can be enabled. In this case, choose that partition as the service partition to enable provisioning.
If the isolation is caused by an IP Backbone failure, possibly combined with a site failure, there can be multiple surviving partitions. Choose only one surviving partition as the service partition.
If Selective Replica Check and Data Repair are enabled, then after the backbone problem is fixed and the sites rejoin, Automatic Handling of Network Isolation fixes data inconsistencies caused by the split and re-establishes replication in the slave replicas. For more information on Automatic Handling of Network Isolation, refer to CUDB Automatic Handling of Network Isolation Output Description.
To check if Selective Replica Check and Data Repair are enabled, refer to CUDB Node Configuration Data Model Description.
If Selective Replica Check and Data Repair are disabled, disable the rest of the sites so that they do not join the system during the execution of the procedure. Executing this procedure on the remote nodes leads to data inconsistency and data loss without any rollback possibility. In this case, choose only one surviving partition to enable provisioning.
Steps
3.2 Enabling Traffic
This procedure is used to allow traffic in the DSGs, and it can only be applied to partitions in a minority situation.
Traffic in the DSGs can be allowed in multiple surviving partitions.
Execution of this procedure does not prevent choosing this partition later as the service partition to recover provisioning. For more information, see Enabling Provisioning.
If the automaticServiceContinuity parameter is enabled, this procedure is performed automatically. For more information, refer to CUDB High Availability.
Steps
4 Recovery Procedure
If the CUDB system split is caused by an IP Backbone failure, the network issues must be fixed before running the recovery procedure. How to fix network issues is out of the scope of this document. Request support from Ericsson personnel for troubleshooting, if needed.
When the IP Backbone or failed sites are working again, follow the steps below to restore the original redundancy configuration:
Note: If Selective Replica Check and Data Repair are enabled, skip to Step 5.
Steps
Reference List
- CUDB High Availability
- Control, Remote Site Unreachable
- Control, Potential Split Brain Detected
- CUDB Node Commands and Parameters
- CUDB Data Storage Handling
- CUDB Automatic Handling of Network Isolation Output Description
- CUDB Node Configuration Data Model Description
- Storage Engine, Unable to Synchronize Cluster in DS, Major
- Storage Engine, Unable to Synchronize Cluster in PLDB, Major
- CUDB Backup and Restore Procedures
- CUDB Glossary of Terms and Acronyms
