Service Migration in the Production Center After Fault Recovery

If data or applications in the production center become unavailable due to disasters or faults, you must quickly recover the data or applications from the DR center. Before a fault recovery, at least one recovery plan test must be executed successfully.

Prerequisites

Context

If data or applications become unavailable due to disasters or faults in the production center, services will be quickly switched to and started in the DR center. After the production center recovers, the services will be switched back to it by performing planned migration.

You are advised to enable archive log protection. If it is not enabled, fault recovery may fail. To enable it, choose Protection in the navigation tree, click a protected group, click the Protected Object tab, click Modify Protection Settings, and select Archive Log Protection.

Procedure

  1. Perform a fault recovery.

    If Huawei UltraPath has been installed on the Linux-based DR host, ensure that the I/O suspension time is not 0 and that every virtual device generated by UltraPath has a corresponding physical device. For details, see the OceanStor UltraPath for Linux xxx User Guide.

    1. On the menu bar, select Utilization > Data Restore.
    2. Click the recovery plan that you want to execute and add the step of executing the user-defined script before reprotection. The operations are as follows:
      1. Click the Procedure tab, and then click Edit.
      2. Select Reprotection, click any step before the Checking the ADG environment step, and click Add Step.

        The step to be added must be performed before the step of Checking the ADG environment.

      3. Set Step Name and Script Name. The script name is the name of the user-defined script to be imported. To import a user-defined script, perform the following steps:
        1. Log in to the Linux service host where the protected object resides, obtain the script template, and customize the execution script based on the template.

          The name of a customized script must contain 4 to 32 characters, consist only of letters, digits, underscores (_), and hyphens (-), and start with a letter, digit, or underscore (_). The file name extension must be .sh.

          The script template is stored in /xxxx/Agentless/custom/sample. The name of the script template for restoring the ADG environment is oracle_adg_recovery.sh. xxxx indicates the user-defined Agentless installation directory.
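          The naming rule above can be checked before the script is copied into place. The following is a minimal sketch, not part of the product: is_valid_script_name is a hypothetical helper, and it assumes that the 4 to 32 character limit applies to the name without the .sh extension, which the rule does not state explicitly.

```shell
#!/bin/bash
# Hypothetical helper: checks a file name against the naming rule above.
# Assumption: the 4-32 character limit excludes the .sh extension.
is_valid_script_name() {
    local file="$1"
    # The extension must be .sh.
    [[ "$file" == *.sh ]] || return 1
    local base="${file%.sh}"
    # First character: letter, digit, or underscore.
    # Remaining 3 to 31 characters: letters, digits, underscores, or hyphens.
    local re='^[A-Za-z0-9_][A-Za-z0-9_-]{3,31}$'
    [[ "$base" =~ $re ]]
}
```

          For example, is_valid_script_name oracle_adg_recovery.sh succeeds, whereas a name that starts with a hyphen or has fewer than four characters before the extension is rejected.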

        2. Place the customized execution script in the specified path to ensure that the script can be queried in the BCManager system.

          The script is stored in /xxxx/Agentless/custom, where xxxx indicates the user-defined Agentless installation directory.

        3. Set the owner and execute permission of the user-defined execution script.

          Run the chown root:root oracle_adg_recovery.sh command to set the script owner to root:root. Run the chmod 550 oracle_adg_recovery.sh command to set the script permission to 550.

          xxx indicates the user-defined Agentless installation user.

          • If you do not set the owner and execute permission of the customized script, the script for restoring the ADG environment cannot be executed.
          • By default, the ADG environment restoration script can restore OPEN_MODE to READ ONLY WITH APPLY. To restore OPEN_MODE to READ ONLY, you need to modify the customized script.
          • If a customized script needs to be used to restore the ADG environment in the reprotection process, the customized script must be configured for all nodes in the DR center.
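          After running chown and chmod, the result can be verified before the recovery plan is executed. The sketch below is an illustration only: check_script_perms is a hypothetical helper, and it relies on GNU stat with the -c option, which is assumed to be available on the Linux host.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if FILE has the expected owner:group and mode.
# Assumes GNU stat (-c), as found on typical Linux hosts.
check_script_perms() {
    local file="$1" want_owner="$2" want_mode="$3"
    local owner mode
    owner=$(stat -c '%U:%G' "$file") || return 1   # for example root:root
    mode=$(stat -c '%a' "$file") || return 1       # for example 550
    if [ "$owner" != "$want_owner" ] || [ "$mode" != "$want_mode" ]; then
        echo "unexpected owner or mode on $file: $owner $mode" >&2
        return 1
    fi
    return 0
}
```

          With the values from this step, the call would be check_script_perms oracle_adg_recovery.sh root:root 550, run from the directory that contains the script.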
        4. Run the following command to add a sudoers entry for the ADG environment restoration script:

          echo 'xxx1 ALL=(root) NOPASSWD:xxx2' >> /etc/sudoers.d/CUSTOM

          xxx1 indicates the customized Agentless installation user, and xxx2 indicates the absolute path of the ADG environment restoration script.
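          Because xxx1 and xxx2 are substituted by hand, it can help to build the sudoers line from variables and reject a relative script path. This is a hedged sketch: make_sudoers_entry is a hypothetical helper that only prints the line in the format shown above and does not modify any file.

```shell
#!/bin/bash
# Hypothetical helper: prints a sudoers line in the format used in this step.
# $1 = Agentless installation user, $2 = absolute path of the restoration script.
make_sudoers_entry() {
    local user="$1" script="$2"
    # Reject anything that is not an absolute path.
    case "$script" in
        /*) ;;
        *) echo "script path must be absolute: $script" >&2; return 1 ;;
    esac
    printf '%s ALL=(root) NOPASSWD:%s\n' "$user" "$script"
}
```

          The printed line can then be appended to /etc/sudoers.d/CUSTOM as in the command above. Running visudo -c afterwards checks the sudoers syntax.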

          Table 1 Customized script parameters

          Parameter     Description
          ------------  -----------------------------------------------------------
          ORACLE_TYPE   Oracle deployment mode. Set this parameter to SINGLE for a single-node system and RAC for a cluster.
          ORACLE_PATH   Oracle installation directory, which is used to store temporary files.
          RAC21         Hostname of node 1 in the standby cluster. If node 1 is deployed in a single-node system, you do not need to set RAC22.
          RAC22         Hostname of node 2 in the standby cluster.
          SYS_PASS      Password of database user sys.
          Primary_TNS   TNS name of the current active cluster.
          Standby_TNS   TNS name of the current standby cluster.
          IS_SKIP       For an RAC cluster, set this parameter to 0 for one node and to 1 for the other node.
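
          The parameters in Table 1 can be sanity-checked before the script is used. The following is a minimal sketch under stated assumptions: it assumes that the parameters are plain shell variables in the customized script (the actual template layout may differ), and check_adg_params is a hypothetical helper. Treating RAC22 and IS_SKIP as required only when ORACLE_TYPE is RAC is an interpretation of the table, not a documented rule.

```shell
#!/bin/bash
# Hypothetical check of the Table 1 parameters.
# Assumption: the parameters are shell variables set by the customized script.
check_adg_params() {
    case "${ORACLE_TYPE:-}" in
        SINGLE|RAC) ;;
        *) echo "ORACLE_TYPE must be SINGLE or RAC" >&2; return 1 ;;
    esac
    local name
    # Parameters assumed to be needed in both deployment modes.
    for name in ORACLE_PATH RAC21 SYS_PASS Primary_TNS Standby_TNS; do
        if [ -z "${!name:-}" ]; then
            echo "$name is not set" >&2
            return 1
        fi
    done
    if [ "$ORACLE_TYPE" = "RAC" ]; then
        # Node 2 and IS_SKIP only matter for a cluster.
        if [ -z "${RAC22:-}" ]; then
            echo "RAC22 is not set" >&2
            return 1
        fi
        case "${IS_SKIP:-}" in
            0|1) ;;
            *) echo "IS_SKIP must be 0 or 1" >&2; return 1 ;;
        esac
    fi
    return 0
}
```

          The function returns a nonzero status and prints the first problem it finds, so it can be called near the top of the customized script before any recovery command runs.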

      4. Set Step Execution Policy and Step Location, and click OK.
        • Step Execution Policy is described as follows:
          • Continue running after failure: The recovery plan execution continues for DR after this step fails.
          • Stop process after failure: The recovery plan execution stops for DR after this step fails.
        • You can select After the selected step or Before the selected step from the Step Location drop-down list box to set the execution location of the step to be added. No step can be added before the first step, and no step can be added after the last step.
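
          The difference between the two step execution policies can be illustrated with a small shell sketch. This is not how BCManager is implemented: run_steps is a hypothetical runner written only to show the behavior, and the "policy:command" argument format is an assumption made for this example.

```shell
#!/bin/bash
# Illustration only: run_steps is a hypothetical runner, not BCManager code.
# Each argument is "policy:command", where policy is "continue" or "stop".
run_steps() {
    local spec policy cmd
    for spec in "$@"; do
        policy="${spec%%:*}"   # text before the first colon
        cmd="${spec#*:}"       # text after the first colon
        if bash -c "$cmd"; then
            echo "OK: $cmd"
        else
            echo "FAILED: $cmd"
            if [ "$policy" = "stop" ]; then
                # Stop process after failure: later steps are not run.
                echo "Stopping after failed step"
                return 1
            fi
            # Continue running after failure: move on to the next step.
        fi
    done
    return 0
}
```

          For example, run_steps "continue:false" "stop:true" reports the first step as failed and still runs the second, whereas run_steps "stop:false" "continue:true" stops after the first step.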
    3. In the Operation area, choose More > Fault recovery.

      Before clicking Fault recovery, you can click the Procedure tab, click Edit, select Fault recovery, and enable or disable configurable steps as required. The configurable step is Test database connection (Configurable), which is enabled by default.

    4. Perform fault recovery based on protected object types.
      • If the protected object type is Oracle, perform the following steps:
        1. Select DR Site.
        2. Select Host (Group) > Available DR Hosts or Host Groups.
          • If the storage array used at the DR site is flash storage 6.1.6 or later, the to-be-recovered host selected by a user can belong to only one host group on the storage array, and the host group can belong to only one mapping view. In addition, the remote replication secondary LUN corresponding to the storage LUN used by the protected application and the LUN to which the redo logs and archive logs of the DR cluster belong must belong to the same LUN group, and the LUN group and the host group must belong to the same mapping view. If the storage array version is flash storage 6.1.6 or later, deselect Enable Inband Command to change the mapping view attribute after the mapping view is created.
          • If the storage array is flash storage 6.1.6 or later, automatic host adding and storage mapping are provided. Ensure that the storage is connected to hosts' initiators properly. In this manner, the system can automatically create hosts, host groups, LUN groups, and mapping views on the storage. The creation principles are as follows:

        3. Click Fault Recovery.
        4. In the Warning dialog box that is displayed, read the content of the dialog box carefully and select I have read and understood the consequences associated with performing this operation.
        5. Click OK.

          If the fault recovery fails, you can retry the fault recovery from the failure point.

  2. In the production center, check the application startup status.

    After the fault recovery is complete, check whether the applications and data are normal. If an application or data encounters an exception, contact Huawei technical support.

    Note the following when checking the startup status of applications:
    • If the protection policies are based on applications, check whether the applications are started successfully and data can be read and written correctly.

  3. Perform reprotection to protect services switched to the DR center.

    After the fault recovery is complete, the application system is working in the DR center and protected groups become Invalid. You must perform reprotection to recover the replication status and synchronize the data from the DR center to the production center. Then, the original DR center becomes the new production center.

    • To ensure that the protection and restoration configurations before reprotection do not affect the running of the protected group and recovery plan after reprotection, the system automatically clears the protection and restoration configurations after reprotection. After performing reprotection, reconfigure the protection and recovery policies to ensure that DR services are running properly.
    • If the reprotection fails, you can retry the reprotection from the failure point.
    1. On the menu bar, select Utilization > Data Restore.
    2. Select the recovery plan and choose More > Reprotection in the Operation area.
    3. Carefully read the content of the Confirm dialog box that is displayed and click OK to confirm the information.


Copyright © Huawei Technologies Co., Ltd.