Service Migration in the Production Center After Fault Recovery

If data or applications in the production center become unavailable due to disasters or faults, you must quickly recover the data or applications from the DR center. Before a fault recovery, at least one recovery plan test must be executed successfully. This operation can be performed only on eReplication in the DR center.

Prerequisites

Context

If data or applications become unavailable due to disasters or faults in the production center, services will be quickly switched to and started in the DR center. After the production center recovers, the services will be switched back to it by performing planned migration.

Procedure

  1. Perform pre-recovery configurations.

    • When the type of protected objects is FusionCompute VMs, perform the following configurations:
      • Without configuration, the VM IP address for fault recovery is the same as that in the production center. You can change it for the planned VM migration on the Protected Object tab page of the recovery policy. For details, see Self-defining Startup Parameters for a Protected Object.
      • After adding or removing disks for a protected VM, promptly refresh the VM information and manually enable DR for the protected group where the VM resides.
    • When the type of protected objects is VMware VMs, perform the following configurations:
      • Without configuration, the VM IP address for fault recovery is the same as that in the production center. You can configure one for fault recovery on the Protected Object tab page of the recovery plan. For details, see Self-defining Startup Parameters for a Protected Object.
      • When the asynchronous replication (NAS) DR solution is deployed, you need to create a share and configure permissions on DeviceManager of the storage array at the DR site. Permissions must be the same as those in the production center.

        If the share is not created and its permissions are not configured, fault recovery cannot be completed.
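The requirement that share permissions at the DR site match those in the production center can be checked before recovery. The following is a minimal sketch, assuming the permissions of each share have already been exported into a plain mapping of client to access level. The function name, the mapping layout, and the access-level strings are hypothetical and are not part of DeviceManager or eReplication.

```python
def diff_share_permissions(production, dr):
    """Return {client: (production_access, dr_access)} for every mismatch.

    Both arguments are hypothetical mappings of client (host or user)
    to access level, for example {"10.0.0.5": "read-write"}.
    A client missing on one side is reported with None for that side.
    """
    mismatches = {}
    for client in sorted(set(production) | set(dr)):
        prod_access = production.get(client)
        dr_access = dr.get(client)
        if prod_access != dr_access:
            mismatches[client] = (prod_access, dr_access)
    return mismatches


if __name__ == "__main__":
    production = {"10.0.0.5": "read-write", "10.0.0.6": "read-only"}
    dr = {"10.0.0.5": "read-write"}
    for client, (prod_access, dr_access) in diff_share_permissions(production, dr).items():
        print(f"{client}: production={prod_access}, DR={dr_access}")
```

An empty result means the two permission sets are identical under this simplified model.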

  2. Perform a fault recovery.

    If Huawei UltraPath has been installed on the Linux-based DR host, ensure that I/O suspension time is not 0 and all virtual devices generated by UltraPath have corresponding physical devices. For details, see the OceanStor UltraPath for Linux xxx User Guide.

    1. On the menu bar, select Utilization > Data Restore.
    2. Select the recovery plan to be executed and click More > Fault Recovery in the Operation list.
    3. Perform fault recovery based on protected object types.
      • If the type of protected objects is LUN, Local File System, Oracle, IBM DB2, Microsoft SQL Server, SAP HANA, or Microsoft Exchange Server, perform the following operations:
        1. Select DR Site.
        2. Select Host (Group) > Available DR Hosts or Host Groups (This operation is optional when the protected object type is LUN).
          • If the storage array used at the DR site is T series V2 or later, the to-be-recovered host selected by a user can belong to only one host group on the storage array, and the host group can belong to only one mapping view. Moreover, the storage LUN used by protected applications and its corresponding secondary remote replication LUNs must belong to one LUN group, and the LUN group must reside in the same mapping view as the host group. If the storage array version is T series V2R2, deselect Enable Inband Command to change the mapping view attribute after the mapping view is created.
          • If the storage array is T series V2R2 or later, or 18000 series, automatic host adding and storage mapping are provided. Ensure that the storage is connected to hosts' initiators properly. In this manner, the system can automatically create hosts, host groups, LUN groups, and mapping views on the storage. The creation principles are as follows:

          • If no DR host or DR host group is selected, you need to manually map DR LUNs to the DR host when the type of protected objects is LUN.
        3. Click Fault Recovery.
        4. In the Warning dialog box that is displayed, read the content of the dialog box carefully and select I have read and understood the consequences associated with performing this operation.
        5. Click OK.
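The mapping constraints stated for T series V2 or later storage arrays (one host group per host, one mapping view per host group, and all protected LUNs in one LUN group of that same mapping view) can be verified before starting the recovery. The following is a minimal sketch, assuming the array's mapping information has been collected into plain dictionaries. The function name and the data layout are hypothetical and do not represent a storage array or eReplication interface.

```python
def check_mapping(host, luns, host_groups, lun_groups, mapping_views):
    """Return a list of constraint violations (empty if none are found).

    Hypothetical inputs:
      host_groups:   {host_group_name: set of host names}
      lun_groups:    {lun_group_name: set of LUN names}
      mapping_views: {view_name: (host_group_name, lun_group_name)}
      luns:          protected LUNs plus their secondary remote replication LUNs
    """
    problems = []

    # The host may belong to only one host group.
    owning = [name for name, hosts in host_groups.items() if host in hosts]
    if len(owning) != 1:
        problems.append(f"host {host} belongs to {len(owning)} host groups, expected 1")
        return problems
    host_group = owning[0]

    # The host group may belong to only one mapping view.
    views = [view for view, (hg, _lg) in mapping_views.items() if hg == host_group]
    if len(views) != 1:
        problems.append(f"host group {host_group} is in {len(views)} mapping views, expected 1")
        return problems
    view_lun_group = mapping_views[views[0]][1]

    # All LUNs must sit in one LUN group, and that group must be in the same view.
    covering = [name for name, members in lun_groups.items() if set(luns) <= set(members)]
    if not covering:
        problems.append("the LUNs do not all belong to one LUN group")
    elif view_lun_group not in covering:
        problems.append("the LUN group is not in the same mapping view as the host group")
    return problems
```

An empty list means that none of the modeled constraints are violated; it does not replace the checks performed by the system itself.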
      • If the type of protected objects is VMware VM, perform the following steps:
        1. Select a recovery cluster.

          VMs will be recovered to the cluster. Select DR Site, DR vCenter, and DR Cluster.

          Upon the first network recovery, you need to set the cluster information.

        2. Select a recovery network.

          The network is used to access recovered VMs.

          • If Production Resource and DR Resource are not paired, select Production Resource and DR Resource, and click Add to the mapping view to pair them.
          • If Keep the mac unchange is selected, the system checks whether the MAC addresses of production VMs conflict with those of all VMs in the DR vCenter. If the MAC addresses do not conflict, the system retains the MAC addresses of the VMs in the DR vCenter. Otherwise, the recovery task fails.
          • If Keep the mac unchange is not selected and the mounted VM is stopped, the MAC address of the VM mounted to the vCenter remains unchanged. After the VM is started, vCenter automatically assigns a MAC address to the VM.
        3. Set Logical Port IP Address so that hosts in the recovery cluster can access DR file systems over the logical port.

          In scenarios where the asynchronous replication (NAS) DR solution is deployed, you need to set Access Settings.

        4. Stop non-critical VMs when executing recovery.

          In the Available VMs list, select non-critical VMs to stop them to release computing resources.

        5. Click Fault Recovery.
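Because the recovery task fails when Keep the mac unchange is selected and a MAC address conflict exists, it can be useful to look for conflicts in advance. The following is a minimal sketch of such a pre-check, assuming the MAC addresses of the production VMs and of all VMs in the DR vCenter have already been exported as lists of strings. The function names are hypothetical, and the sketch illustrates the idea only, not how eReplication implements its check.

```python
import re


def normalize_mac(mac):
    """Convert a MAC address to lowercase colon-separated form."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def find_mac_conflicts(production_macs, dr_vcenter_macs):
    """Return the sorted MAC addresses that appear on both sides."""
    dr = {normalize_mac(mac) for mac in dr_vcenter_macs}
    production = {normalize_mac(mac) for mac in production_macs}
    return sorted(production & dr)


if __name__ == "__main__":
    conflicts = find_mac_conflicts(
        ["00:50:56:AA:BB:01"],
        ["00-50-56-aa-bb-01", "00:50:56:aa:bb:02"],
    )
    print(conflicts or "no conflicts found")
```

Normalizing both lists first avoids missing a conflict only because the two sources use different separators or letter case.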
      • If the type of protected objects is FusionCompute VM or NAS File System, perform the following operations: In the Warning dialog box that is displayed, read the content of the dialog box carefully and select I have read and understood the consequences associated with performing this operation. Then click OK.
    • Latest data: The system uses service backup data backed up to the DR center before the disaster for recovery and starts services in the DR center.
    • Latest snapshots: Before a manual or automatic protected group execution, the system automatically creates a snapshot for placeholder VMs of the DR site. If you choose this mode for fault recovery, the system will use the latest snapshot of the placeholder VMs to register and start VMs.

  3. In the DR center, check the application startup status.

    After the fault recovery is complete, check whether the applications and data are normal. If an application or data encounters an exception, contact Huawei technical support.

    Note the following when checking the startup status of applications:

    • If the protection policies are based on applications, check whether the applications are started successfully and data can be read and written correctly.
    • If the protection policies are based on LUNs, log in to the application host in the DR center, scan for disks, and start applications. Then check whether the applications are started successfully and data can be read and written correctly.

      You can use self-developed scripts to scan for disks, start applications, and test applications.
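A self-developed script of this kind might look like the following minimal sketch for a Linux DR host. It assumes that disks are rediscovered by writing "- - -" to each adapter's scan file under /sys/class/scsi_host (which requires root privileges), that the application can be started with a single command, and that a simple write-then-read test on the recovered file system is an acceptable data check. The function names, the start command, and the mount point are hypothetical and must be adapted to the actual environment.

```python
import glob
import os
import subprocess
import tempfile


def rescan_scsi_hosts(sysfs_root="/sys/class/scsi_host"):
    """Trigger a rescan on every SCSI host adapter; return the scan files written."""
    written = []
    for scan_file in sorted(glob.glob(os.path.join(sysfs_root, "host*", "scan"))):
        with open(scan_file, "w") as handle:
            handle.write("- - -\n")  # wildcard for channel, target, and LUN
        written.append(scan_file)
    return written


def start_application(command):
    """Run the (site-specific) start command; return True if it exits with 0."""
    return subprocess.run(command, check=False).returncode == 0


def check_read_write(directory):
    """Write random data to a temporary file in directory and read it back."""
    payload = os.urandom(32)
    fd, path = tempfile.mkstemp(dir=directory, prefix="dr_rw_check_")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())
        with open(path, "rb") as handle:
            return handle.read() == payload
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("rescanned:", rescan_scsi_hosts())
    # Hypothetical start command and mount point; replace with real values.
    print("started:", start_application(["systemctl", "start", "example-app"]))
    print("read/write OK:", check_read_write("/mnt/example_dr_volume"))
```

The read/write test only confirms that the file system accepts I/O. Application-level checks, such as a test query against a database, still need to be added per application.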

  4. Check the environment before starting reprotection.

    For Huawei Distributed Block Storage, you need to repair the production end. After it is repaired, manually unmap the LUNs on the production end.

    • Databases:

      Before reprotection, ensure that underlying storage links, remote replication pairs, and consistency groups have been recovered.

    • FusionCompute VMs:
      • Ensure that the FusionCompute storage status at the original production site is normal and the communication between the original DR site and the original production site is normal.
      • After adding or removing disks for a protected VM, promptly refresh the VM information and manually enable DR for the protected group where the VM resides.
    • VMware VMs:

      Ensure that underlying storage links, remote replication pairs, and consistency groups have been recovered.

  5. Perform reprotection to protect services switched to the DR center.

    After the fault recovery is complete, the application system is working in the DR center and protected groups become Invalid. You must perform reprotection to recover the replication status and synchronize the data from the DR center to the production center. Then, the original DR center becomes the new production center.

    To ensure that protected groups and recovery plans run properly after reprotection, the system automatically clears protection and recovery configurations, including startup configurations of protection policies and recovery plans, self-defined execution scripts, and self-defined execution steps. Reconfiguring protection and recovery policies afterward is recommended to ensure the continuity of DR services.

    1. On the menu bar, select Utilization > Data Restore.
    2. Select the recovery plan and click More > Reprotection on the Operation list.

      If the protected objects are VMware VMs and services are recovered from site A to site B, perform the following steps to clear redundant and incorrect data in the virtualization environment before and after the reprotection:

      1. Log in to the vCenter server at site A using vSphere Client.
      2. Shut down and migrate all VMs that were recovered to site B through recovery plan registration.
      3. On ESXi hosts in the cluster from which VMs are migrated, unmount the datastores used by the VMs one by one.
      4. Detach the LUNs that were used by the unmounted datastores.
      5. In the Storage list, right-click each storage device in turn and select Rescan All from the drop-down list to ensure that no datastore remains on the ESXi hosts.
      6. Return to eReplication, and refresh vCenter servers and storage resources on both sites to obtain the latest VM environment information.
    3. Carefully read the content of the Confirm dialog box that is displayed and click OK to confirm the information.

    If Save user configuration data is selected, self-defined protection policies and recovery settings, such as self-defined recovery steps, will be retained. Ensure that the configuration data has no adverse impact on service running after reprotection.


Copyright © Huawei Technologies Co., Ltd.