CEE 6.6 SW Update and Rollback
Cloud Execution Environment 6

Contents

1   Introduction
1.1   Scope
1.2   Target Group
1.3   Prerequisites
1.4   Limitations

2   Overview
2.1   Component Update Descriptions
2.2   Procedure Durations

3   Update Orchestration Configuration File

4   Procedure
4.1   Mandatory Preparation Stage
4.2   Update Stage
4.3   Common Concluding Stage

5   Error Handling
5.1   Error Handling for Failed CSC Update
5.2   Error Handling for Failed Orchestrated Update
5.3   Error Handling for Failed Atlas Upgrade

6   Additional Operations
6.1   Checking Update State
6.2   Checking VM State
6.3   Listing Nodes
6.4   Identifying the Active and Cold Standby Fuel Hosts
6.5   Insert Forwarding Rule on vCICs
6.6   Generating New SSH Key for Compute Host Hosting vCIC or vFuel

Appendix

7   update_groups.yaml Examples

8   NIC Firmware Version Check and Upgrade

Reference List

1   Introduction

This document is used for performing a Cloud Execution Environment (CEE) software update and rollback between the following CEE 6 versions:

1.1   Scope

This document describes the following:

Note:  
Update of CEE is only supported to the component versions and on the update paths described in the Product Revision Information for Cloud Execution Environment document for the specific CEE release, Reference [7].

This document describes the procedures for the update and rollback of the following components:

Although they are included in the flow description, the update and rollback procedures for the following components are described in separate documents:

1.2   Target Group

This document is aimed at skilled professionals from the following groups:

1.3   Prerequisites

1.3.1   Tools and Equipment

This section describes the tools needed for some or all of the procedures described in this document.

1.3.1.1   User Access

root access to vFuel is required. The procedures below can only be executed as root.

1.3.1.2   Hardware and Software

The procedures in the document have the following hardware prerequisites:

The procedures in the document have the following firmware prerequisites:

Before starting the update, make sure that the following software is available:

1.3.1.3   Remote FTPS Server for Storing Backups

For rollback purposes, the vCIC and vFuel images and additional files must be backed up on a remote server. The remote server must fulfill the following requirements:

1.3.2   Data

The following information must be available:

1.3.3   Conditions

The following conditions apply to all procedures described in this document:

The following conditions apply to the different phases of the update procedures:

Phase

Conditions

Update

There must be no active alarms in the system when starting the update process.

Rollback

The rollback procedure requires a backed up copy of the following:


  • The vCIC VM images with the respective XML files and XML templates

  • The active vFuel VM image, and dump XML file generated from the vFuel VM image by the user


For more information, see Section 4.1.

The individual procedures can have additional conditions. See the relevant subsections of Section 2.1 for any additional conditions of the individual procedures.

1.4   Limitations

CEE SW update and rollback is only verified on the following hardware platforms:

Note:  
Update and rollback with SDN TI is supported with limitations. Refer to the document Limitations and Workarounds for Cloud Execution Environment (CEE) 6.6, Reference [5].

The following limitations apply to all procedures described in this document:

Table 1 shows the limitations that apply to the individual procedures:

Table 1    Limitations for Update and Rollback Procedures

Procedure

Limitation

Update

  • When the CEE software update is running, the OpenStack API service is unavailable for about one minute during the reboot of each vCIC.

  • CM-HA fencing is automatically turned off during CEE software update to prevent unnecessary invocation. If fencing was initially enabled and serial update method is used, fencing is turned off each time a compute host is updated, and turned back on once the update of that host is successfully finished. If fencing was initially enabled and parallel update method is used, fencing is turned off when starting the parallel update, and turned back on once the update of all hosts is successfully finished.

  • Reconfiguration of the CSS CPU reservation mode is not allowed during the update procedure. Changing the css_mode parameter in the config.yaml can cause severe system malfunction. For example, changing normal-perf to high-perf CSS mode can result in insufficient CPU resource reservation for the tenant VMs on the affected compute hosts, making them unable to boot up after reconfiguration.

  • If the different-host scheduler hint is used, the VM forcemove logic does not operate as configured. (1)

  • In SDN TI configuration, vCICs and compute hosts hosting a vCIC must be updated in separate runs, with a separate update_groups.yaml file containing information about the respective vCICs and vCIC computes only.

  • During the execution of the update_orchestrator.sh script, before the update of a compute host, the VMs located on the compute host are acted upon according to the defined High Availability (HA) policy. If multiple compute hosts are updated in one session, VMs can be migrated and restarted multiple times during the procedure, depending on the defined migration policy. If compute hosts are updated one by one, the number of VM migrations can be reduced. For unmanaged VMs, manual migration can be necessary. For more information on HA policies, refer to OpenStack Compute API in CEE.


If VMs are migrated during the update procedure, rollback is not possible.

Rollback

  • Rollback is only possible if no VMs are migrated during update. For more information, see Section 5.2.1.

  • Due to the transfer of the backup files between the external server and the CEE region, bandwidth is affected, and transfer time must be taken into consideration.

  • Rollback of the compute hosts hosting vCIC or vFuel is not possible. If update of any of these compute hosts fails, redeployment of CEE can be necessary.

  • Rollback of compute hosts does not preserve data stored on the ephemeral disks of the VMs hosted on the rolled back compute host.

(1)  The CEE update orchestrator attempts to forcemove VMs from the compute hosts to be updated to compute hosts not affected by the update. If a VM has the different_host=<other_vm> scheduler hint specified, then the hint is ignored, and the two VMs can be moved to the same compute host, violating the requested behavior. Therefore, manual VM migration can be necessary.


2   Overview

The CEE Update framework has the following use cases:

Update of any component of CEE is only supported to the component versions described in the Product Revision Information document, Reference [7].

The update is orchestrated using the update_groups.yaml file, and is executed using the update_orchestrator.sh script, unless otherwise stated in the procedure description. All nodes are restarted during the procedure; however, it is possible to perform the update in multiple sessions, by preparing multiple versions of the update_groups.yaml and executing the update script multiple times. For more information on update orchestration configuration, see Section 3.

The update process of CEE is performed according to the flow described in Figure 1.

Figure 1   Flow Overview

Note:  
Also consider the conditions to the procedures, see Section 1.3.3.

Update consists of the following phases:

Mandatory Preparation Stage

This phase consists of the following:

Procedures are described in Section 4.1.

Component Update Stage

This phase consists of the following:

For more information on component update, see Section 2.1.

Update must strictly adhere to the following update order:

  1. Cloud SDN Controller (CSC) upgrade, if the system is using tightly integrated SDN
    1. SDNc Fuel plugin
    2. L2GW Fuel plugin
    3. BGPVPN Fuel plugin
  2. vFuel update
  3. Update of the ScaleIO server cluster, if the system is using managed ScaleIO
  4. vCIC update
  5. Health check
  6. Update of compute hosts not hosting vFuel or vCIC
  7. Update of compute hosts hosting vFuel and the vCICs

Mandatory Concluding Stage

This phase consists of the following:

The procedures are described in Section 4.3.

2.1   Component Update Descriptions

2.1.1   CSC Update

Note:  
This procedure is only applicable to CEE regions using SDN TI.

Update of CSC includes the following:

  1. SDNc Fuel plugin
  2. L2GW Fuel plugin
  3. BGPVPN Fuel plugin

The update of the CSC Fuel plugins is a manual procedure, not orchestrated by the update orchestrator script. For more information, refer to the CSC document Cloud SDN Upgrade and Rollback, Reference [3].

Affected Nodes

Limitations

For any limitations, refer to the CSC document Cloud SDN Upgrade and Rollback, Reference [3].

2.1.2   vFuel Update

vFuel is updated automatically by the update orchestrator script.

Affected Nodes

Orchestration Options

If the update is to be interrupted after updating vFuel, the update_orchestrator.sh must be run with the --exit-after-fuel-update switch.

The required steps for the update of the component are described in Section 4.2.4.

2.1.3   ScaleIO Update

Note:  
This procedure is only applicable to CEE regions using managed ScaleIO.

The ScaleIO servers are updated automatically by the update orchestration script.

Affected Nodes

Orchestration Options

If only the Fuel plugins are updated, run the orchestrator script using the --plugin-update option. This option skips the vFuel update step.

For updating the ScaleIO nodes, serial update mode must be used.

The required steps for the update of the component are described in Section 4.2.5.

2.1.4   vCIC Update

Affected Nodes

Orchestration Options

When updating vCICs, serial update mode must be used.

In case of a system with SDN TI, each vCIC must be updated in a separate run with an update_groups.yaml containing information about the selected vCIC only.

The required steps for the update of the component are described in Section 4.2.6.

2.1.5   Compute Host Update

Compute hosts are updated automatically by the orchestrator script.

Compute host update includes the update of the integrated Cloud SDN Switch (CSS) component.

Compute host update includes the update of the integrated HDS Agent, if the system is based on HDS.

In case of a system with SDN TI, each compute host hosting a vCIC must be updated in a separate run with an update_groups.yaml containing information about the selected compute host only.

Affected Nodes

Orchestration Options

When updating the compute hosts hosting vCICs and the compute host hosting vFuel, serial update mode must be used.

The required steps for the update of the component are described in Section 4.2.8.

2.1.6   Atlas Update

Affected Nodes

The update and rollback procedures for Atlas are described in the respective Operating Instructions.

2.2   Procedure Durations

The complete time required for the update and rollback procedures can be estimated using the following approximate durations:

Update

Note:  
Procedure times for compute hosts include the update time of the included CSS or HDS Agent components.

Rollback

3   Update Orchestration Configuration File

In the update procedure, the update_groups.yaml specifies the nodes to be updated, the update order, and the update method (serial or parallel) to be used.

This section describes the preparation of the update_groups.yaml before the update procedure.

The update_groups.yaml follows the YAML Specification, Reference [8].

The update_groups.yaml can be used to perform update procedures on all nodes of the region, a subset of nodes, or individual nodes. If update is performed in multiple sessions, the overall update order must strictly follow the update order described in Figure 1, and the update_groups.yaml must be changed before each execution of the update_orchestrator.sh to only contain the nodes that are involved in the particular session.

Note:  
If no update_groups.yaml file is present in the /mnt/cee_config directory, all nodes in the CEE region are updated in serial mode.

The CEE software tarball contains the CEE_RELEASE/update_groups.yaml.template file, which can be used as a template when creating the /mnt/cee_config/update_groups.yaml update configuration file. The template file contains predefined sections for the node types to be updated. The template file also contains commented instructions on preparing the update_groups.yaml file.

The update_groups.yaml consists of sections. Each section defines an update phase, that is, a subset of nodes to be updated together. A section must be defined, even for a single-node update phase or session.

Note:  
If the update stops and needs to be restarted, the already updated nodes must be removed from the update_groups.yaml file. To check the update progress, see Section 6.1.

Each section must have the following structure:

- type: <mode>
  nodes:
    - <node_1_name>
    - <node_2_name>
    - <node_3_name>
...

type:

The type key defines the update mode. <mode> can have the following values:

If parallel update mode is used for compute hosts, the number of hosts that can be updated at the same time must be defined. Depending on HA policies, some or all running VMs must be migrated from the nodes updated concurrently. Therefore, the size of the group is determined by the size of the region and the available free resources on the remaining compute hosts.

For example, if the free capacity is enough to host all VMs currently located on two compute hosts, the maximum size for parallel update is two.

nodes:

The nodes list contains the nodes to be updated in each phase. For the value of the <node_name> variable, see the name column in the printout of the fuel node command.
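As an illustration only, the name column can be extracted from the printout with awk. The sample printout below is simplified and hypothetical; the real fuel node table has more columns, so the field numbers ($3 for name, $6 here for roles) must be adjusted to the actual output:

```shell
#!/bin/sh
# Illustrative sketch: pick the node names of all compute nodes from a
# sample pipe-separated printout. Field numbers match this sample only.
printout='1 | ready | cic-0001 | 1 | 10.20.0.3 | controller
4 | ready | compute-0004 | 1 | 10.20.0.6 | compute'
names=$(printf '%s\n' "$printout" | awk -F '|' '$6 ~ /compute/ {gsub(/ /, "", $3); print $3}')
echo "$names"
```

The extracted names can then be pasted into the nodes list of the relevant section.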

Examples

For examples for configuration files for different update procedures, see Section 7.

Editing YAML Files in Windows

If the configuration file is edited in Windows, it is likely that the file contains CRLF characters. To remove CR characters (Linux only uses LF), run the following command after transferring the file to vFuel:

$> sed -i.bak -e 's/\r//g' <FILE.NAME>

A backup of the original file with the name <FILE.NAME>.bak is also created.
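The effect of the command can be demonstrated on a throwaway file; this sketch counts the CR bytes before and after running the same sed expression:

```shell
#!/bin/sh
# Sketch: strip Windows CR characters from a file, as in the step above.
f=$(mktemp)
printf 'a\r\nb\r\n' > "$f"             # simulate a file edited in Windows (CRLF line ends)
crs_before=$(tr -cd '\r' < "$f" | wc -c)
sed -i.bak -e 's/\r//g' "$f"           # same command as above; .bak keeps the original
crs_after=$(tr -cd '\r' < "$f" | wc -c)
echo "CR bytes before: $crs_before, after: $crs_after"
```

A file that already uses Linux line endings is left unchanged by the command.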

4   Procedure

The update procedures have the same preparation and concluding steps whether one, multiple, or all components are updated. Do the following:

  1. Perform the steps described in Section 4.1.
  2. Perform the required procedures from Section 4.2.
  3. Perform the steps described in Section 4.3.

4.1   Mandatory Preparation Stage

Before performing any of the update procedures, do the following:

  1. Perform CEE health check as described in the document Health Check Procedure.
  2. Synchronize the active and the cold standby Fuel VMs as described in Fuel Synchronization.
  3. Create the following backups and store them in a persistent storage outside of the CEE region:
    • CIC domain data backup as described in CIC Domain Data Backup
    • Atlas backup as described in Atlas Backup, if Atlas is used in the system
    • Manual backup of the three CIC VM images and the Fuel VM image and configuration files, as described in Backup and Restore Overview
      Note:  
      The CIC and Fuel VM backup is a prerequisite of rollback, and must be executed before update.

    • In case of a configuration with SDN TI, the CCM routes are missing after vFuel is shut down and started. Add the CCM routes by executing the following script:

      [root@fuel ecs-fuel-utils]# ./add_route_for_ccm.sh /mnt/cee_config/config.yaml detect /etc/fuel/astute.yaml

      Example:

      [root@fuel ecs-fuel-utils]# ./add_route_for_ccm.sh /mnt/cee_config/config.yaml detect /etc/fuel/astute.yaml

      add_route_for_ccm.sh.info: Adding host route to CCM API at 10.33.216.4 via Kickstart server

      add_route_for_ccm.sh.info: Verifying connectivity to CCM API at 10.33.216.4

      add_route_for_ccm.sh.info: Verified connectivity to 10.33.216.4 (0)

  4. Log on to vFuel as root using SSH. For more information, refer to the CEE Connectivity User Guide.
    Note:  
    Connectivity to the vCICs will be lost during the update.

  5. Copy all relevant plugins to the /var/www/nailgun/ericsson/fuel-plugins/ directory on vFuel:
    1. Move any old plugin files to a backup directory using the following commands:

      mkdir -p /var/www/nailgun/ericsson/fuel-plugins/backup
      mv /var/www/nailgun/ericsson/fuel-plugins/<plugin_file> /var/www/nailgun/ericsson/fuel-plugins/backup

      where <plugin_file> corresponds to the following values:

      Component

      Value

      CSS Fuel Plugin

      ericsson_css*

      ScaleIO Fuel Plugin

      scaleio-2*rpm

      HDS Agent Fuel Plugin

      ericsson_hds_agent-*rpm

    2. Transfer the new plugin files to /var/www/nailgun/ericsson/fuel-plugins/ on vFuel.
    3. If the plugin file is packaged in a .tar file, unpack the file:

      tar -xvf <plugin_file_name>.tar

    4. If applicable, validate the integrity of the plugin .rpm file by comparing the outputs of the following commands with the contents of the respective .md5 or .sha1 file:

      md5sum <plugin>.rpm

      sha1sum <plugin>.rpm

      If the checksums do not match, contact the next level of maintenance support.

    5. Make sure that the plugin rpm file used for the update is the last item listed in the printout of the ls <plugin_name> command:

      ls /var/www/nailgun/ericsson/fuel-plugins/<plugin_name> |tail -1

      where <plugin_name> corresponds to the following:

      Component

      Value

      CSS Fuel Plugin

      ericsson_css-*

      ScaleIO Fuel Plugin

      scaleio-*

      HDS Agent Fuel Plugin

      ericsson_hds_agent-*

  6. Transfer the CEE tarball to the /var/tmp directory on vFuel.
  7. Extract the tarball:

    tar -xvf <tarball_name>

  8. Copy the update_orchestrator.sh file from the CEE software tarball to the /root directory on vFuel:

    cp /<update_orchestrator_path>/update_orchestrator.sh /root

  9. Verify that there is sufficient disk space in the root directory for bootstrap image preparation. The minimum disk space required for bootstrap image preparation in the root directory on vFuel is 2 GiB.

    Check the amount of free space in /:

    df -h /

    If there is not enough free space in the / directory, free some space up before starting the update.

  10. To ensure that the update process is not interrupted, start a screen session and run the commands in it:

    cd ~; screen -r update -R -L

    Later during the update, if a node is rebooted and the connection towards vFuel is lost, log back in to vFuel with the steps above, and reattach the screen session with the command:

    screen -r update

    Note:  
    The screen session can only be reattached after the node has rebooted and is back online.

    After exiting the screen session, the screen log file is available in ~/screenlog.0.


  11. Copy the prepared update_groups.yaml to the /mnt/cee_config directory on vFuel:

    cp /<update_groups_path>/update_groups.yaml /mnt/cee_config
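The checksum comparison from step 5.4 can also be scripted. The following is a sketch only, demonstrated on a temporary file instead of a real plugin .rpm; it relies on the standard md5sum -c option, which reads a checksum file and reports whether the listed file still matches:

```shell
#!/bin/sh
# Sketch of the step 5.4 integrity check, run on a temporary file.
# On vFuel, point it at the real <plugin>.rpm and its shipped .md5 file.
f=$(mktemp)
echo 'dummy plugin payload' > "$f"
md5sum "$f" > "$f.md5"        # stands in for the .md5 file delivered with the plugin
if md5sum -c "$f.md5" >/dev/null 2>&1; then
  result=OK
else
  result=MISMATCH             # checksums differ: contact next level of support
fi
echo "$result"
```

The same pattern applies to .sha1 files with sha1sum -c.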

4.2   Update Stage

The relevant procedures described in this section must be executed in the order they are presented. Only perform the procedures that correspond to the relevant flow in Section 2.

Depending on the combination of the components to be updated, skip individual procedures as applicable. For example, if compute hosts are updated, CSS and HDS Agent are updated automatically, and the respective procedures are not required.

4.2.1   Update CSC

Note:  
This procedure is only applicable to CEE regions using SDN TI.

The update of the CSC Fuel plugins is a manual procedure not orchestrated by the CEE update orchestrator script. For the update and rollback procedures of the CSC Fuel plugins, refer to the SDN document Cloud SDN Upgrade and Rollback, Reference [3].

If the update of the component fails, continue with Section 5.1.

4.2.2   Preparation for the Orchestrated Update

Do the following:

  1. Create backups of the configuration YAML files:
    mkdir -p /mnt/cee_config/backup-<date>
    cp /mnt/cee_config/*.yaml /mnt/cee_config/backup-<date>

  2. Remove all servers in State: discover from the Fuel database. Do the following:
    1. Check the state of all servers according to the procedure described in Section 6.3.
    2. Lock the servers in State: discover:

      setadminstate <shelf-id> locked --blade <blade-id>

      An example of the command is the following:

      setadminstate 0 locked --blade 2
    3. Remove the servers from the Fuel database:

      fuel node --node-id <node-id> --delete-from-db --force

      An example of the command is the following:

      fuel node --node-id 8 --delete-from-db --force
    4. In config.yaml, comment out the definitions related to the servers. The following is an example:
      #        -
      #          id: 2
      #          nic_assignment: *BSP_GEP5_nic_assignment
      #          reservedHugepages: *BSP_GEP5_reservedHugepages
      #          reservedCPUs: *auto_reservedCPUs
      
    5. Remove the entries related to the servers from /mnt/cee_config/update_groups.yaml.
  3. If there is no active screen session, start a screen session and run the commands in it:

    cd ~; screen -r update -R -L

    Later during the update, if a node is rebooted and the connection towards vFuel is lost, log back in to vFuel with the steps above, and reattach the screen session with the command:

    screen -r update

    Note:  
    The screen session can only be reattached after the node has rebooted and is back online.

    After exiting the screen session, the screen log file is available in ~/screenlog.0.


  4. Check if there are any changes in the config.yaml between the CEE releases. If necessary, update the config.yaml using the new templates bundled with the ISO image.
    Note:  
    Only update the config.yaml according to configuration changes between the CEE releases. Reconfiguration of the system during update (for example, reallocation of vCPUs) is not possible in CEE.

    Reconfiguration of CSS CPU reservation mode by changing the css_mode parameter in the config.yaml is not allowed during the update procedure. For more information, see Table 1.

    If the NeLS server connection and certificates are not configured on the system before the update, licensing must be configured only after update, with the respective post-installation step.

    If the NeLS server connection settings and certificates are configured already before the update, the configuration in the config.yaml must correspond to the actual configuration, and the certificate files must be in place. If the configuration and the values in config.yaml are not correct, the update fails. For more information, refer to the Configuration File Guide.

    Verify that the configuration of mandatory Fuel plugins corresponds to the Fuel Plugin Configuration Guide.

    Verify that the password of the anonymous bind user ("anon") in the LDAP section of the config.yaml is the same as in the base CEE release. If the password is not defined in the base CEE release, ignore it in the new config.yaml as well and proceed with the update procedure. The following is an example of the LDAP section in the config.yaml:

      idam:
        ldap:
          basedn: dc=cee,dc=ericsson,dc=com
          rootdn: cn=admin
          rootpw: ''
          anonymous_binddn: cn=anon
          anonymous_bindpwd: 'Xuy@a41EDi@a87u'
    


  5. Make sure that the /mnt/cee_config/update_groups.yaml file is available and specifies the nodes to be updated, and strictly follows the correct update order, see Figure 1.
    Note:  
    If /mnt/cee_config/update_groups.yaml does not exist, CEE update will be executed on all hosts of the CEE region, in serial mode.

4.2.3   Execute the Orchestrated Update Script

  1. Start the update script by executing the following command:

    /<path_to_update_orchestrator>/update_orchestrator.sh <path_to_cee_iso>

    An example of the command with the locations described in this procedure is the following:

    /root/update_orchestrator.sh /var/tmp/<cee_iso>

    Note:  
    If the update process is required to stop after vFuel update, execute the script using the --exit-after-fuel-update option.

    If update is performed in multiple sessions, the update_groups.yaml must be changed before each execution of the update_orchestrator.sh to only contain the nodes that are updated in the particular session. The update order described in this section must be strictly followed also if update is performed in multiple sessions.

    Perform the health check procedure as described in the document Health Check Procedure before starting the update of the compute hosts, as the rollback options for the compute hosts are limited.


  2. If the update orchestrator script stops, see Section 5 for the error handling procedures.

    If all nodes have been updated, continue with Section 4.3.

4.2.4   Update vFuel

The CEE update process initiated by the orchestrator script always starts with the update of vFuel. The update of the vFuel node is not defined in update_groups.yaml. If the vFuel software version already corresponds to the vFuel software version included in the CEE release, the update orchestrator skips the update of the vFuel node.

If the system is using SDN TI, the --exit-after-fuel-update switch must be used. After vFuel update, but before updating any further components, do the following:

  1. Open the /usr/share/ericsson-orchestration/playbooks/update-fuel-deployment.vars.yml using nano or similar.
  2. By prepending #, comment out the following line:

    - odl_neutron_config

    An example of the commented line is the following:

     #- odl_neutron_config

  3. Save the changes and exit the editor.
  4. Continue with the orchestrated update.

If the update orchestrator script stops, see Section 5 for the error handling procedures.

If the update is to be interrupted after updating vFuel, the update_orchestrator.sh must be run with the --exit-after-fuel-update switch.

4.2.5   Update ScaleIO

The update process of the ScaleIO plugin and the ScaleIO servers is automatically performed by the update_orchestrator.sh script if the nodes are specified for update in the update_groups.yaml.

If the update orchestrator script stops, see Section 5 for the error handling procedures.

4.2.6   Update vCIC

The update process of the vCICs is automatically performed by the update_orchestrator.sh script if the nodes are specified for update in the update_groups.yaml.

In case of a system with SDN TI, the Data Center Gateway (DC-GW) route is missing for the Northbound Interface (NBI) after the update. Perform the following corrective steps:

  1. From config.yaml, note down the IPs for bgp_gateway and bgp_neighbour. The following is an example:
    bgp_gateway: [10.33.199.193, 10.33.199.194]
    bgp_neighbour: [10.5.2.1, 10.5.2.2]
  2. Execute the following command on the updated vCIC:
    ip r a <bgp_neighbour_ip1> via <bgp_gateway_ip1> dev br-sdnc-sig
    ip r a <bgp_neighbour_ip2> via <bgp_gateway_ip2> dev br-sdnc-sig

    The following is an example:

    ip r a 10.5.2.1 via 10.33.199.193 dev br-sdnc-sig
    ip r a 10.5.2.2 via 10.33.199.194 dev br-sdnc-sig
    
  3. Update /etc/network/interfaces.d/ifcfg-br-sdnc-sig on the updated vCICs by adding the post-up and post-down lines shown at the end of the following example:
    # *********************************************************************
    # This file is being managed by Puppet. Changes to interfaces
    # that are not being managed by Puppet will persist;
    # however changes to interfaces that are being managed by Puppet will
    # be overwritten.
    # *********************************************************************
    
    auto br-sdnc-sig
    iface br-sdnc-sig inet static
    address 10.33.231.149/29
    
    post-up route add -host <bgp_neighbour_ip1> gw <bgp_gateway_ip1>
    post-down route delete -host <bgp_neighbour_ip1> gw <bgp_gateway_ip1>
    post-up route add -host <bgp_neighbour_ip2> gw <bgp_gateway_ip2>
    post-down route delete -host <bgp_neighbour_ip2> gw <bgp_gateway_ip2>
  4. Perform a CEE health check. On vFuel, execute the healthcheck.py script. For more information, refer to the Health Check Procedure.
  5. Perform an SDN health check. For more information, refer to the Health Check Monitoring Guideline, Reference [4].
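After step 2, the presence of both DC-GW routes can be confirmed from the routing table. The sketch below reuses the example IPs from step 2 and greps sample routing-table lines; on the vCIC, the input would come from ip r instead of the sample string:

```shell
#!/bin/sh
# Sketch: count the DC-GW routes on br-sdnc-sig. The sample lines mimic
# ip r output for the example IPs used in step 2.
routes='10.5.2.1 via 10.33.199.193 dev br-sdnc-sig
10.5.2.2 via 10.33.199.194 dev br-sdnc-sig'
count=$(printf '%s\n' "$routes" | grep -c 'dev br-sdnc-sig')
echo "routes on br-sdnc-sig: $count"
```

Both routes from step 2 must be present; a count lower than two indicates that step 2 must be repeated for the missing neighbour.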

If the update orchestrator script stops, see Section 5 for the error handling procedures.

4.2.7   Health Check

It is strongly recommended to check that the system is healthy before performing compute host update. If the update procedure fails after compute host update is started, the options for rollback and recovery of the system are limited. Perform health check as described in the Health Check Procedure.

4.2.8   Compute Host Update

The update process of the compute hosts is automatically performed by the update_orchestrator.sh script if the nodes are specified for update in the update_groups.yaml.

In case of a system with SDN TI, if the Data Plane Nodes (DPNs) or Tunnel End Points (TEPs) are missing, or SDN services are down, do the following:

  1. After the update for each compute host is completed, perform a health check for the SDN cluster.
    1. On vFuel, execute the healthcheck.py script. For more information, refer to the Health Check Procedure.
    2. On vFuel, execute the cee_sdnc_verify_setup_sanity.sh script, as described in the section about quick network health check in the Health Check Monitoring Guideline, Reference [4].
  2. Restart the SDN services by issuing the following command on a vCIC as root:

    csc_cluster reboot

Note:  
The command gracefully restarts SDN cluster services. Before and during restart, tenant traffic disturbance is expected. For more information about the csc_cluster reboot command, refer to the section about Cloud SDN services not being operational after two nodes failure recovery in the Cloud SDN Troubleshooting Guide, Reference [2].

If the update orchestrator script stops, see Section 5 for the error handling procedures.

The orchestrated update procedure is completed. If the CEE region uses Atlas, proceed with Section 4.2.9. Otherwise, proceed with Section 4.3.

4.2.9   Atlas Update

Note:  
This procedure is only applicable to CEE regions using Atlas.

The update of Atlas is a manual procedure not orchestrated by the CEE update orchestrator script. For the update and rollback procedures of Atlas, refer to the Atlas SW Upgrade document.

4.3   Common Concluding Stage

  1. Verify that the update is performed successfully. Perform health check according to the Health Check Procedure.
  2. Verify the version of CEE by executing the following command on the vFuel master node:

    cat /etc/cee_version.txt

    The output has the following format:

    RELEASE=CEE CXC1737883_4-<build_number>
    NAME=Mitaka on Ubuntu 14.04
    VERSION=R6-<r-state>-<specific_build_number>-9.0

    Verify the CEE version by comparing the <build_number> and the <r-state> to the Product Revision Information for Cloud Execution Environment (CEE), Reference [7].

    An example output is:

    [root@fuel ~]# cat /etc/cee_version.txt
    RELEASE=CEE CXC1737883_4-1918
    NAME=Mitaka on Ubuntu 14.04
    VERSION=R6-R7B06-5384594593-9.0
    

    If verification fails, see Section 5.

  3. Verify the version of CEE on all vCICs and compute hosts by executing the following command on the vFuel master node:

    for n in fuel $(fuel node | awk -F '|' '$7 ~ /controller|compute/ {print $3}'); do echo ${n}; ssh -o LogLevel=error ${n} 'cat /etc/cee_version.txt'; done

    Verify the CEE version by comparing the <build_number> and the <r-state> to the Product Revision Information for Cloud Execution Environment (CEE), Reference [7].

  4. Synchronize the active and the cold standby vFuel VM as described in the document Fuel Synchronization.
  5. After update, there can be an active NeLS Server Communication Problem alarm, because the NeLS server is not configured and not available.

    To configure the connection to the NeLS server, follow the instructions in the Runtime Configuration Guide. If the alarm does not clear, follow the instructions in the NeLS Server Communication Problem alarm OPI.

  6. If applicable, exit the screen session:

    exit

  7. Verify the OpenStack administrator password in Keystone on vFuel and the vCICs, and update it if required, as described in the relevant sections of the document Security User Guide. Manual changes made since deployment are overwritten during the update.
  8. For disaster recovery purposes, the installation media used for the update must be backed up, outside the CEE region. For more information, refer to the document Disaster Recovery.
  9. Verify that each node is updated, see Section 6.1.
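The version checks in Steps 2 and 3 can be scripted. The following is a minimal sketch; the helper name cee_build_number is hypothetical, and the node-name expression is the one used in Step 3:

```shell
# Hypothetical helper: extract the <build_number> from cee_version.txt
# content on stdin (format: RELEASE=CEE CXC1737883_4-<build_number>).
cee_build_number() {
    awk -F'-' '/^RELEASE=/ {print $NF}'
}

# Example (run on the vFuel master node): flag any node whose build
# number differs from the one on vFuel.
# expected=$(cee_build_number < /etc/cee_version.txt)
# for n in $(fuel node | awk -F '|' '$7 ~ /controller|compute/ {print $3}'); do
#     actual=$(ssh -o LogLevel=error ${n} 'cat /etc/cee_version.txt' | cee_build_number)
#     [ "${actual}" = "${expected}" ] || echo "Version mismatch on ${n}: ${actual}"
# done
```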

5   Error Handling

Note:  
Rollback should only be performed if the update procedure cannot be recovered using the procedures described in Section 5. A failure of the rollback procedure can result in a state that can only be recovered by redeployment. Notify the next level of customer support before attempting the rollback procedures.

5.1   Error Handling for Failed CSC Update

Note:  
Before attempting rollback of the CSC Fuel plugins, contact next level of support.

This procedure is only applicable to CEE regions using tightly integrated SDN (SDN TI).


Do the following:

  1. Attempt the downgrade of the CSC Fuel plugins using the manual procedure described in the SDN document Cloud SDN Upgrade and Rollback, Reference [3].
  2. If the downgrade procedure for the CSC Fuel plugins fails, the vCICs can be restored to the state before the update using the backed up vCIC images and configuration files. For more information, see Section 5.2.2.2.

5.2   Error Handling for Failed Orchestrated Update

If an error occurs during the update procedures orchestrated by CEE, follow these steps to repair the system:

  1. Check the following logs:
    1. /var/log/ansible.log
    2. /var/log/puppet-error.log and /var/log/puppet.log of the failed systems according to ansible.log
    3. The logs of the failed systems according to ansible.log
    4. Update execution log, located at /var/log/update_orchestrator.log

      The update_orchestrator.log can contain very long lines, which can cause some editors to crash. To reformat the log into a readable format, execute the following command:

      /<path_to_update_orchestrator>/update_orchestrator.sh --prettify-log <filename>

      where <filename> is the filename for the reformatted log. If no filename is specified, the reformatted log file is stored under the filename update_orchestrator.pretty.log.

      The reformatted log file is stored in the /var/log/ folder.

    5. Update procedure progress, stored at /var/tmp/update_orchestrator.state
  2. If the update orchestrator fails at the "call ansible update" step, the forced move (forcemove) of one of the VMs failed.

    Do the following:

    1. Verify that the last executed task is "Forcemove nova instances". On vFuel as root, execute the following command:

      grep 'TASK \[' /var/log/update_orchestrator.log | tail -n 1

      The expected printout is the following:

      <date> TASK [Forcemove nova instances] ************************************************

      If the printout is different from the expected printout, continue with Step 3.

      If the printout corresponds to the expected printout, continue with this procedure.

    2. Log on to a vCIC as root. For more information, refer to the CEE Connectivity User Guide.
    3. Load OpenStack admin credentials:

      source ~/openrc

    4. List the VMs with status RESIZE:

      nova list | grep RESIZE

      If there are no VMs with Status RESIZE and Task State resize_prep, continue with Step 3.

      If there are any VMs with Status RESIZE and Task State resize_prep, continue with this procedure.

    5. Reset the state of each affected VM one by one:

      nova reset-state <vm_id> --active

    6. Restart RabbitMQ on the vCIC:

      crm resource restart p_rabbitmq-server

    7. Check the update state as described in Section 6.1, and record any nodes with the status finished. These nodes have been updated successfully.
    8. Remove any already updated nodes from the update_groups.yaml file, based on the update state. For more information on the update_groups.yaml file, see Section 3.
    9. Execute the orchestrator script again, and proceed with the update procedure, see step 6 in the Preparation for the Orchestrated Update section in the CEE Update and Rollback Guide.
  3. Perform data collection according to the Data Collection Guideline.
  4. Fix the possible problems and rerun the update towards the failing node.
  5. Contact the next level of support.
  6. If applicable, attempt rollback using the procedures described in Section 5.2.1.
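Substeps 4 and 5 of Step 2 (listing VMs stuck in RESIZE and resetting their state) can be combined into one loop. This is a sketch; the helper name resize_stuck_ids is hypothetical, and the parsing assumes the standard nova list table layout:

```shell
# Hypothetical helper: print the IDs of VMs whose Status column is RESIZE
# and whose Task State column is resize_prep, given `nova list` table
# output on stdin.
resize_stuck_ids() {
    awk -F'|' '$4 ~ /RESIZE/ && $5 ~ /resize_prep/ {gsub(/ /,"",$2); print $2}'
}

# Example (run as root on a vCIC, with ~/openrc sourced):
# nova list | resize_stuck_ids | while read vm_id; do
#     nova reset-state "${vm_id}" --active
# done
```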

5.2.1   Rollback

The rollback procedure is used to restore the system to the CEE version used before the update, if the update procedure fails. The rollback procedure includes the rollback of all of the updated nodes. The rollback of the components must strictly follow this order:

  1. Rollback of vFuel using the backed up VM image and configuration files
  2. Rollback of the vCICs using the backed up VM image and configuration files
  3. Repair of the compute hosts not hosting vFuel or vCIC, using the server replacement procedure

Rollback of ScaleIO servers is not supported by Dell EMC.

vCIC rollback is achieved by restoring the vCIC VM from a backed up image; the databases are also restored to the state at the time of the update. The databases include information on the location of each VM, that is, which compute host is hosting which VM. Rollback is only possible if the actual VM locations match the databases; therefore, rollback is only possible if VMs are not migrated during the update.

Compute host rollback is achieved using the server replacement procedure, as described in the document Server Replacement. Hardware replacement is not required. After the repair procedure, the compute host runs the CEE version corresponding to the version of the vFuel node used for the repair.

Note:  
Contact next level of support before attempting compute rollback.

Compute host rollback is only possible if VMs were not migrated during update.

Compute hosts hosting vCIC or vFuel cannot be rolled back using server replacement. If the update of a vFuel or a vCIC host fails, redeployment of the CEE region is required.


The following workflow shows an overview of the rollback procedure, including all rollback phases:

Figure 2   Rollback Procedure Overview


Start the procedure with Section 5.2.2.1.

5.2.2   Rollback Procedures

5.2.2.1   vFuel Rollback

Do the following:

  1. If not performed earlier in the rollback procedure, insert forwarding rule on all three vCICs, as described in Section 6.5.
  2. Log on to the compute host hosting the active vFuel VM as root using SSH and the data collected in Section 6.4. For more information, refer to CEE Connectivity User Guide.
  3. Shut down the active vFuel VM by executing the following command:
    virsh shutdown fuel_master

    The expected printout is the following:
    Domain fuel_master is being shutdown.

  4. Verify that the active vFuel VM is shut down by executing the command virsh list --all. For more information, see Section 6.2.
  5. Undefine the active vFuel VM by executing the following command:
    virsh undefine fuel_master

    The expected printout is the following:

    Domain fuel_master has been undefined
    

  6. Verify that the active vFuel VM has been undefined by executing the command virsh list --all. For more information, see Section 6.2.

    If the vFuel VM has been undefined, it is not listed in the printout.

  7. Remove the active vFuel VM by executing the following command:
    rm /var/lib/nova/<fuel_vm_image_file>

    An example of the command is the following:
    rm /var/lib/nova/fuel_master.qcow2

  8. Add a route between the host hosting vFuel and the external FTPS server by executing the following command:

    route add <ftps_server_ip> gw <vcic_ip>

    The variables are the following:

    • <ftps_server_ip> is the IP address of the external FTPS server.
    • <vcic_ip> is the IP address of a vCIC that is operational or in maintenance mode.
  9. Copy and transfer the dump XML file from the external FTPS server described in Section 1.3.1.3 to /var/lib/nova by executing the following command:

    curl -k --ftp-ssl ftp://<username>:<password>@<ftps_server_ip>//<source_path>/<file_name> > /var/lib/nova/<file_name>

    The variables are the following:

    • <file_name> is the name of the dump XML file.
    • <username> and <password> are the credentials to the FTPS server.
    • <ftps_server_ip> is the IP address of the external FTPS server used for storing the CEE component backups.
    • <source_path> is the path on the FTPS server to the directory for storing the CEE component backup files.

    An example of the command is the following:

    root@compute-0-6:/var/lib/nova# curl -k --ftp-ssl ftp://admin:admin@10.0.0.1//rollback/fuel_master_compute6_running.xml > /var/lib/nova/fuel_master_compute6_running.xml


  10. Copy, transfer, and decompress the vFuel VM image from /var/lib/nova from the external FTPS server described in Section 1.3.1.3 to /var/lib/nova by executing the following command:

    curl -k --ftp-ssl ftp://<username>:<password>@<ftps_server_ip>//<source_path>/<compressed_file_name> | pigz --stdout --decompress --processes $(xmlstarlet sel -t -v /domain/vcpu < ./<fuel_xml_name>) > /var/lib/nova/<vfuel-img_file_name>

    The variables are the following:

    • <username> and <password> are the credentials to the FTPS server.
    • <ftps_server_ip> is the IP address of the external FTPS server used for storing the CEE component backups.
    • <source_path> is the path on the FTPS server to the directory for storing the CEE component backup files.
    • <compressed_file_name> is the file name for the compressed vFuel image set at rollback. If the recommended values are used, the value is <vfuel-img_file_name>.gz.
    • <vfuel-img_file_name> is the vFuel VM image file name.
    • <fuel_xml_name> is the corresponding configuration XML file.

    An example of the command is the following:

    root@compute-0-6:/var/lib/nova# curl -k --ftp-ssl ftp://admin:admin@10.0.0.1//rollback/fuel_master.qcow2.gz | pigz --stdout --decompress --processes $(xmlstarlet sel -t -v /domain/vcpu < ./fuel_master_compute6_running.xml) > /var/lib/nova/fuel_master.qcow2
    

  11. Define the vFuel VM using the backed up XML dump by executing the following command:
    virsh define <dump_file_name>.xml

    An example of the command and the printout is the following:

    root@compute-0-6:~# virsh define fuel_master_compute6_running.xml
    Domain fuel_master defined from fuel_master_compute6_running.xml
    
    

  12. Verify that the active vFuel VM has been defined by executing the command virsh list --all. For more information, see Section 6.2.

    If the active vFuel VM has been defined, it is listed in the printout with State: shut off.

  13. Start the active vFuel VM by executing the following command:
    virsh start fuel_master

    The expected printout is the following:
    Domain fuel_master started

  14. Verify that the active vFuel VM is running by executing the command virsh list --all. For more information, see Section 6.2.
  15. Verify that all nodes are operational by logging on to vFuel and executing the fuel node command. For more information, see Section 6.3.
  16. Restore the /root/openrc files on all vCICs. These files were temporarily changed during the update on the vCICs. Execute the following command on Fuel:

    /opt/ecs-fuel-utils/restore_openrc.sh

  17. Check the system for active alarms. If the Fuel failed alarm did not cease after the active vFuel VM is rolled back and is operational again, generate new SSH key as described in Section 6.6. For more information on listing active alarms using CLI, refer to the document CEE CLI Guide.
  18. Synchronize the active and cold standby vFuel VMs using the procedure described in Fuel Synchronization.
  19. If the vCICs have been updated, or the update failed during vCIC update, continue with Section 5.2.2.2.

5.2.2.2   vCIC Rollback

vCIC rollback is achieved by restoring the vCIC VM from a backed up image; the databases are also restored to the state at the time of the update. The databases include information on the location of each VM, that is, which compute host is hosting which VM. Rollback is only possible if the actual VM locations match the databases; therefore, vCIC rollback is only possible if VMs are not migrated during the update procedures.

Note:  
Perform this procedure only if the updated vFuel VM has already been rolled back and synchronized.

In case of vCIC rollback, all updated vCICs must be rolled back.


The rollback procedure must be performed in the reverse order of the VM image backup, that is, the vCIC that was backed up last must be rolled back first.

In this section, the three vCICs are referred to as vCIC1, vCIC2 and vCIC3. The assignment of numbers is the following:

The procedure is described for rolling back vCIC3. The procedure must be repeated on the remaining vCICs with different values for the variables, respectively.

Do the following:

  1. Verify that the forwarding rule to the FTPS server is established by doing the following on each vCIC:
    1. Log on to the vCIC using SSH. For more information, refer to the CEE Connectivity User Guide.
    2. Enter maintenance mode by executing the following command:

      sudo umm on

    3. Verify that the forwarding rule is established by executing the following command:
      iptables -t nat -C POSTROUTING -j MASQUERADE

      Note:  
      If the printout indicates failure, append the rule by executing the following command:
      iptables -t nat -A POSTROUTING -j MASQUERADE


    4. Log out of the vCIC:
      exit

    5. Repeat the procedure on all vCICs.
  2. Log on to the compute host hosting vCIC3, using SSH. For more information, refer to the CEE Connectivity User Guide.
  3. Shut down the vCIC VM by executing the following command:
    virsh shutdown <cic_vm_name>

    The expected printout is the following:
    Domain <cic_vm_name> is being shutdown

  4. Verify that the vCIC VM is shut down by executing the command virsh list --all. For more information, see Section 6.2.
  5. Undefine the vCIC by executing the following command:
    virsh undefine <cic_vm_name>

    An example of the command and the printout is the following:

    root@compute-0-1:# virsh undefine cic-3_vm
    Domain cic-3_vm has been undefined
    

  6. Verify that the vCIC VM has been undefined by executing the command virsh list --all. For more information, see Section 6.2.

    If the vCIC VM has been undefined, it is not listed in the printout.

  7. Remove the vCIC VM image file, <cic_name>_vm.xml configuration XML file and template_<cic_name>_vm.xml template file by doing the following:
    1. Navigate to /var/lib/nova:
      cd /var/lib/nova

    2. Remove the files by executing the following command:

      rm <vm_image_file_name> <cic_vm_xml_name> <xml_template_file_name>

      An example of the command is the following:

      root@compute-0-1:/var/lib/nova# rm cic-3_vm.img cic-3_vm.xml template_cic-3_vm.xml
  8. Add a route between the host hosting the vCIC and the external FTPS server by executing the following command:

    route add <ftps_server_ip> gw <vcic_ip>

    The variables are the following:

    • <ftps_server_ip> is the IP address of the external FTPS server.
    • <vcic_ip> is the IP address of an operational vCIC on the fuel_ctrl_sp network, that is, if vCIC3 image is transferred, the IP address of vCIC1 or vCIC2 on the fuel_ctrl_sp network.
  9. Copy and transfer the XML configuration file and XML template one by one from the external FTPS server described in Section 1.3.1.3 to /var/lib/nova by executing the following command:

    curl -k --ftp-ssl ftp://<username>:<password>@<ftps_server_ip>//<source_path>/<file_name> > /var/lib/nova/<file_name>

    The variables are the following:

    • <file_name> is the filename of one of the following:
      • The corresponding <cic_name>_vm.xml configuration XML file
      • The corresponding template_<cic_name>_vm.xml template file
    • <username> and <password> are the credentials to the FTPS server.
    • <ftps_server_ip> is the IP address of the external FTPS server used for storing the CEE component backups.
    • <source_path> is the path on the FTPS server to the directory for storing the CEE component backup files.

    An example of the command is the following:

    root@compute-0-6:/var/lib/nova# curl -k --ftp-ssl ftp://admin:admin@10.0.0.1//rollback/cic-1_vm.xml > /var/lib/nova/cic-1_vm.xml
    root@compute-0-6:/var/lib/nova# curl -k --ftp-ssl ftp://admin:admin@10.0.0.1//rollback/template_cic-1_vm.xml > /var/lib/nova/template_cic-1_vm.xml


  10. Copy, transfer, and decompress the vCIC VM image from /var/lib/nova from the external FTPS server described in Section 1.3.1.3 to /var/lib/nova by executing the following command:

    curl -k --ftp-ssl ftp://<username>:<password>@<ftps_server_ip>//<source_path>/<compressed_file_name> | pigz --stdout --decompress --processes $(xmlstarlet sel -t -v /domain/vcpu < ./<cic_name>_vm.xml) > /var/lib/nova/<vcic-img_file_name>

    The variables are the following:

    • <username> and <password> are the credentials to the FTPS server.
    • <ftps_server_ip> is the IP address of the external FTPS server used for storing the CEE component backups.
    • <source_path> is the path on the FTPS server to the directory for storing the CEE component backup files.
    • <compressed_file_name> is the file name for the compressed vCIC image set at rollback. If the recommended values are used, the value is <vcic-img_file_name>.gz.
    • <vcic-img_file_name> is the vCIC VM image file name.
    • <cic_name>_vm.xml is the corresponding configuration XML file.

    An example of the command is the following:

    root@compute-0-6:/var/lib/nova# curl -k --ftp-ssl ftp://admin:admin@10.0.0.1//rollback/cic-1_vm.img.gz | pigz --stdout --decompress --processes $(xmlstarlet sel -t -v /domain/vcpu < ./cic-1_vm.xml) > /var/lib/nova/cic-1_vm.img
    


  11. Verify that the user and group ownership of the VM image is nova:nova by executing the following command and checking the user and group columns of the printout:
    ls -l <vm_image_file_name>

    If the user or group ownership has changed, restore it by executing the following command:
    chown nova:nova <vm_image_file_name>

  12. Define the vCIC VM using the respective <cic_name>_vm.xml configuration XML file by executing the following command:
    virsh define <xml_file_name>

  13. Verify that the vCIC VM has been defined by executing the command virsh list --all. For more information, see Section 6.2.

    If the vCIC VM has been defined, it is listed in the printout with State: shut off.

  14. Set the vCIC VM to autostart:

    virsh autostart <cic_vm_name>

    The expected printout is the following:

    Domain <cic_vm_name> marked as autostarted

  15. Start the vCIC VM by executing the following command:
    virsh start <cic_vm_name>

    The expected printout is the following:
    Domain <cic_vm_name> started

  16. Wait until the vCIC VM is operational. The time required for the vCIC VM to start is approximately four minutes. Do not perform any operations until the vCIC VM is operational.
  17. Verify that the vCIC VM is operational by executing the command virsh list --all. For more information, see Section 6.2.
  18. If not all vCICs have been rolled back, repeat the procedure on one of the remaining vCICs starting from Step 1.
    • For vCIC2, the route must be added and removed on vCIC1 and vCIC3.
    • For vCIC1, the route must be added and removed on vCIC2 and vCIC3.
  19. If all vCICs have been rolled back, do the following:
    1. Verify that all vCICs are operational by executing the sudo umm status command on all vCICs. Do one of the following:
      • If any of the vCICs is in maintenance mode, exit maintenance mode by executing the sudo umm off command on the affected vCIC.
      • If any of the vCICs fails to start after 15 minutes, contact next level of support and exit this procedure.
      • If all vCICs are operational, continue with the procedure.
    2. When all three vCICs are in active state, wait until the databases are synchronized. Database synchronization takes less than 10 minutes.
    3. Perform vCIC health check according to the procedures described in the Health Check Procedure, including the following:
    4. Check the system for active alarms. If the CIC failed alarm did not cease after a vCIC is rolled back and is operational again, generate new SSH key as described in Section 6.6. For more information on listing active alarms using CLI, refer to the document CEE CLI Guide.
    5. If applicable, continue with Section 5.2.2.3.
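The nova:nova ownership check in Step 11 can be sketched as follows. The helper name file_owner is an assumption, and GNU coreutils stat is assumed to be available on the compute host:

```shell
# Hypothetical helper: print the owner and group of a file
# (assumes GNU coreutils stat).
file_owner() {
    stat -c '%U:%G' "$1"
}

# Example (run as root on the compute host):
# [ "$(file_owner /var/lib/nova/cic-1_vm.img)" = "nova:nova" ] \
#     || chown nova:nova /var/lib/nova/cic-1_vm.img
```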

5.2.2.3   Compute Host Rollback

Note:  
Contact next level of support before attempting compute rollback.

Compute host rollback is achieved, using the server replacement procedure, as described in the document Server Replacement. Hardware replacement is not required. After the repair procedure, the compute host will be running the CEE version corresponding to the version of the vFuel node used for the repair.

Compute repair can only be attempted if the following conditions are fulfilled:

  1. No VMs were migrated during update.
  2. vFuel is rolled back.
  3. The active vFuel VM and the cold standby vFuel VM are synchronized.
  4. vCICs are rolled back.

If the update of a compute host not hosting vFuel or vCIC fails, the compute host must be removed from the CEE region and repaired with the procedure described in Server Replacement. Hardware replacement is not required. After the repair procedure, the compute host is running the CEE version corresponding to the version of vFuel, that is, the rolled back version of CEE.

Repair of compute hosts hosting vCIC, vFuel or both is not possible. If the update of such compute hosts fails, redeployment of the CEE region is required.

5.3   Error Handling for Failed Atlas Upgrade

Note:  
This procedure is only applicable if the CEE region is using Atlas.

Perform Atlas rollback as described in the relevant section of Atlas SW Upgrade.

6   Additional Operations

This section describes operations required by multiple procedures in this document.

6.1   Checking Update State

After vFuel has been updated, the state of the update can be checked at any time during the update process from vFuel. Run the following command, optionally specifying one or more node names:

update_state [node_name]

This gives a short state report of the nodes. The following is an example of the update state report:

Example 1   Update State Report

[root@fuel ~]# update_state
+-------------+----------+--------------------------+--------+
|     Node    |  State   |         Current          | Target |
+-------------+----------+--------------------------+--------+
| compute-0-1 | finished | R6-R7B06-5384594593-9.0  |  None  |
| compute-0-3 | finished | R6-R7B06-5384594593-9.0  |  None  |
| compute-0-4 | finished | R6-R7B06-5384594593-9.0  |  None  |
| compute-0-5 | finished | R6-R7B06-5384594593-9.0  |  None  |
|    cic-1    | finished | R6-R7B06-5384594593-9.0  |  None  |
|    cic-2    | finished | R6-R7B06-5384594593-9.0  |  None  |
|    cic-3    | finished | R6-R7B06-5384594593-9.0  |  None  |
+-------------+----------+--------------------------+--------+

Note:  
The version under the Current and Target columns must match the versions of the update path.
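To confirm programmatically that every node has reached the finished state, the report can be filtered with a short sketch (the helper name unfinished_nodes is hypothetical; the parsing assumes the table layout shown in Example 1):

```shell
# Hypothetical helper: print the names of nodes whose State column is
# not finished, given the update_state table output on stdin.
unfinished_nodes() {
    awk -F'|' 'NF >= 4 && $3 !~ /State|finished/ {gsub(/ /,"",$2); print $2}'
}

# Example (run on vFuel): an empty printout means all nodes are finished.
# update_state | unfinished_nodes
```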

6.2   Checking VM State

To check if a VM is operational or shut down, execute the following command on the compute host hosting the VM:

virsh list --all

An example of the printout is the following:

root@compute-0-1:~$ virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     cic-1_vm                       running
 -     fuel_master                    shut off

6.3   Listing Nodes

To verify that all nodes are operational, or to list node names and IP addresses, execute the following command on vFuel:

fuel node

An example of the printout is the following:

id | status | name        | cluster | ip           | mac               | roles             | pending_roles | online | group_id
---|--------|-------------|---------|--------------|-------------------|-------------------|---------------|--------|---------
7  | ready  | cic-1       | 1       | 192.168.0.32 | 6a:df:69:05:25:4d | controller, mongo |               | True   | 1       
8  | ready  | cic-3       | 1       | 192.168.0.31 | 8e:f0:49:45:6a:43 | controller, mongo |               | True   | 1       
1  | ready  | compute-0-5 | 1       | 192.168.0.24 | 90:55:ae:3a:05:f6 | compute           |               | True   | 1       
2  | ready  | compute-0-4 | 1       | 192.168.0.22 | 90:55:ae:3a:e5:76 | compute           |               | True   | 1       
5  | ready  | compute-0-1 | 1       | 192.168.0.23 | 90:55:ae:39:f7:26 | compute, virt     |               | True   | 1       
4  | ready  | compute-0-2 | 1       | 192.168.0.21 | 90:55:ae:3a:e3:ae | compute, virt     |               | True   | 1       
6  | ready  | cic-2       | 1       | 192.168.0.30 | 92:f9:49:4c:d4:4f | controller, mongo |               | True   | 1       
3  | ready  | compute-0-3 | 1       | 192.168.0.25 | 90:55:ae:3a:e3:96 | compute, virt     |               | True   | 1       
9  | ready  | compute-0-6 | 1       | 192.168.0.26 | 56:bd:11:f2:cd:42 | compute           |               | True   | 1       
10 | ready  | compute-0-7 | 1       | 192.168.0.27 | fa:30:2d:96:16:40 | compute           |               | True   | 1       

6.4   Identifying the Active and Cold Standby Fuel Hosts

Identify the compute hosts hosting the active vFuel VM and the cold standby vFuel VM by executing the following script:

[root@fuel ~]# for node in primary secondary
do
ip=$(get_vfuel_info --ip --$node);
name=$(ssh $ip hostname -s 2>&1 | grep compute);
stat=$(ssh $ip sudo virsh list --all 2>&1 | grep fuel);
stat=$(echo $stat | awk '{print $3 " " $4}');
printf "%-10s | %s | %s\n" "$name" "$ip" "$stat";
done

An example of the printout is the following:

compute-0-6 | 192.168.0.23 | running
compute-0-1 | 192.168.0.20 | shut off


In the printout, running identifies the compute host hosting the active vFuel VM, and shut off identifies the compute host hosting the cold standby vFuel VM. Record the IP addresses of the hosts hosting the vFuel VMs from the printouts as this data is required in the rollback procedure.

6.5   Insert Forwarding Rule on vCICs

Do the following:

  1. Log on to one of the vCICs using SSH. For more information, refer to the CEE Connectivity User Guide.
  2. Insert a forwarding rule for the routes to the external FTPS server by executing the following command:
    iptables -t nat -A POSTROUTING -j MASQUERADE

  3. Log out of the vCIC:
    exit
  4. Repeat the procedure on all vCICs.
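
The inserted rule can be confirmed by listing the nat table with iptables -t nat -S POSTROUTING. The following sketch checks a captured rule listing; the listing and the /tmp path are mocked for illustration, since the check itself must run on a live vCIC:

```shell
#!/bin/sh
# Sketch: verify that the MASQUERADE rule is present in the nat table.
# The rule listing below is mocked; on a live vCIC, capture it with:
#   iptables -t nat -S POSTROUTING > /tmp/postrouting.txt
cat > /tmp/postrouting.txt <<'EOF'
-P POSTROUTING ACCEPT
-A POSTROUTING -j MASQUERADE
EOF
if grep -q -- '-A POSTROUTING -j MASQUERADE' /tmp/postrouting.txt; then
    echo "MASQUERADE rule present"
else
    echo "MASQUERADE rule missing"
fi
```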

6.6   Generating New SSH Key for Compute Host Hosting vCIC or vFuel

If the Fuel failed or CIC failed alarms issued during the rollback of the vFuel VM or any of the vCIC VMs did not cease after successful rollback and start of the node, a new SSH key must be generated for the host hosting the node. Do the following:

  1. Log on to vFuel using SSH. For more information, refer to the document CEE Connectivity User Guide.
  2. Identify the ID of the host running the vFuel or vCIC VM by executing the following command:
    fuel node

    The ID of the host is listed under the id column of the printout. The host can be identified based on the name of the host listed in the name column of the printout.

    Save this data as it will be used in a later step of the procedure.

    For more information, see Section 6.3.

  3. Generate a new SSH key for the node by executing the following command:

    fuel node --node <node_id> --tasks eri_idam_distribute_fuel_creds --force

    where <node_id> corresponds to the ID of the host identified in step 2.

    An example of the command is the following:

    fuel node --node 5 --tasks eri_idam_distribute_fuel_creds --force

  4. Check the system for active alarms to see if the alarm has ceased. For more information on listing active alarms using CLI, refer to the CEE CLI Guide.

Appendix

7   update_groups.yaml Examples

In Example 2, the update_groups.yaml file is configured for a 16-node CEE region, with a 5-node managed ScaleIO cluster. Update is done in one session. All nodes are updated in serial mode. In the last phase, the compute hosts hosting vFuel and the vCICs are updated. compute-0-3 is hosting vFuel and one of the vCICs.

Example 2   update_groups.yaml for 16-node CEE with ScaleIO, single session

- type: serial
  nodes:
    - scaleio-0-4
    - scaleio-0-5
    - scaleio-0-6
    - scaleio-0-7
    - scaleio-0-8
- type: serial
  nodes:
    - cic-1
    - cic-2
    - cic-3
- type: serial
  nodes:
    - compute-0-9
    - compute-0-10
    - compute-0-11
    - compute-0-12
    - compute-0-13
    - compute-0-14
    - compute-0-15
    - compute-0-16
- type: serial
  nodes:
    - compute-0-3
    - compute-0-1
    - compute-0-2

In Example 3, the update_groups.yaml file is configured for a 24-node CEE region, with a 5-node managed ScaleIO cluster. Update is done in one session. Compute hosts are updated in parallel mode in multiple phases, in subsets of four. compute-0-9 is hosting vFuel. compute-0-1, compute-0-2 and compute-0-3 are hosting the vCICs.

Example 3   update_groups.yaml for 24-node CEE with ScaleIO, single session

- type: serial
  nodes:
    - scaleio-0-4
    - scaleio-0-5
    - scaleio-0-6
    - scaleio-0-7
    - scaleio-0-8
- type: serial
  nodes:
    - cic-1
    - cic-2
    - cic-3
- type: parallel
  nodes:
    - compute-0-10
    - compute-0-11
    - compute-0-12
    - compute-0-13
- type: parallel
  nodes:
    - compute-0-14
    - compute-0-15
    - compute-0-16
    - compute-1-1
- type: parallel
  nodes:
    - compute-1-2
    - compute-1-3
    - compute-1-4
    - compute-1-5
- type: parallel
  nodes:
    - compute-1-6
    - compute-1-7
    - compute-1-8
- type: serial
  nodes:
    - compute-0-9
- type: serial
  nodes:
    - compute-0-1
    - compute-0-2
    - compute-0-3

In Examples 4 through 7, the update_groups.yaml file is configured for a 12-node CEE region. In this case, the update is accomplished in multiple sessions, with a separate update_groups.yaml for each session:

  1. vFuel and vCICs, see Example 4
  2. Six compute hosts in parallel mode, in groups of two, see Example 5
  3. The remaining three compute hosts in a single phase, in serial mode, see Example 6
  4. The vFuel and vCIC hosts, in serial mode. compute-0-1 is hosting vFuel and one of the vCICs, see Example 7.

Example 4   update_groups.yaml for 12-node CEE, session 1 - vFuel, vCICs

- type: serial
  nodes:
    - cic-1
    - cic-2
    - cic-3

Example 5   update_groups.yaml for 12-node CEE, session 2 - Compute hosts

- type: parallel
  nodes:
    - compute-0-4
    - compute-0-5
- type: parallel
  nodes:
    - compute-0-6
    - compute-0-7
- type: parallel
  nodes:
    - compute-0-8
    - compute-0-9

Example 6   update_groups.yaml for 12-node CEE, session 3 - Compute hosts

- type: serial
  nodes:
    - compute-0-10
    - compute-0-11
    - compute-0-12

Example 7   update_groups.yaml for 12-node CEE, session 4 - vFuel and vCIC hosts

- type: serial
  nodes:
    - compute-0-1
    - compute-0-2
    - compute-0-3
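
Each node must appear in exactly one phase of a session's update_groups.yaml. The following sketch flags node names listed more than once; the file path and sample content are illustrative, and node entries are assumed to be the indented "- <name>" lines under a nodes: key, as in the examples above:

```shell
#!/bin/sh
# Sketch: flag node names that appear in more than one phase of an
# update_groups.yaml. The file below is an illustrative sample.
cat > /tmp/update_groups.yaml <<'EOF'
- type: parallel
  nodes:
    - compute-0-4
    - compute-0-5
- type: parallel
  nodes:
    - compute-0-5
    - compute-0-6
EOF
# Node entries are the indented "- <name>" lines; "- type:" lines are
# not indented and are therefore skipped.
awk '/^ +- /{print $2}' /tmp/update_groups.yaml | sort | uniq -d
```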

8   NIC Firmware Version Check and Upgrade

To check the firmware version of any X710 NICs assigned to DPDK, do the following on each compute host:

  1. Log on to the compute host as root using SSH. For more information, refer to the CEE Connectivity User Guide.
  2. Check NIC driver binding and record the PCI address and device name of any X710 NIC assigned to DPDK using the following command:

    dpdk-devbind.py -s

    An example of the printout is the following:

    root@compute-0-3:~#  dpdk-devbind.py -s
    Network devices using DPDK-compatible driver
    ============================================
    0000:83:00.0 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=
    0000:83:00.3 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=
     
    Network devices using kernel driver
    ===================================
    0000:01:00.0 'I350 Gigabit Network Connection' if=eth0 drv=igb unused=vfio-pci 
    0000:01:00.1 'I350 Gigabit Network Connection' if=eth1 drv=igb unused=vfio-pci 
    0000:03:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=eth2 drv=ixgbe unused=vfio-pci 
    0000:03:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=eth3 drv=ixgbe unused=vfio-pci 
    0000:83:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=eth5 drv=i40e unused=vfio-pci 
    0000:83:00.2 'Ethernet Controller X710 for 10GbE SFP+' if=eth6 drv=i40e unused=vfio-pci 
     
    Other network devices
    =====================
    <none>
     
    Crypto devices using DPDK-compatible driver
    ===========================================
    <none>
     
    Crypto devices using kernel driver
    ==================================
    <none>
     
    Other crypto devices
    ====================

  3. Check the firmware version of the NICs using one of the following options:
    1. Query the device information for the NIC using the following command:

      ethtool -i <device_name>

      where <device_name> is the device name of the NIC recorded earlier in the procedure.

      An example of the command is the following:

      root@compute-0-3:~#  ethtool -i eth5
      driver: i40e
      version: 2.2.4
      firmware-version: 4.53 0x80001fad 0.0.0
      bus-info: 0000:83:00.1
      supports-statistics: yes
      supports-test: yes
      supports-eeprom-access: yes
      supports-register-dump: yes
      supports-priv-flags: yes
      root@compute-0-3:~# 
      
    2. If only DPDK interfaces are used, execute the following command:

      egrep "<PCI_address> fw [0-9]\.[1-9][0-9]\.[0-9]{5}" /var/log/dmesg

      where <PCI_address> is the PCI address of the NIC recorded earlier in the procedure.

      An example of the printout is the following:

      root@compute-0-3:~# egrep "0000:83:00.3: fw [0-9]\.[1-9][0-9]\.[0-9]{5}" /var/log/dmesg
      [   15.122395] i40e 0000:83:00.3: fw 5.50.47059 api 1.5 nvm 5.51 0x80002bca 1.1568.0
  4. If the NIC firmware version is lower than 6.0.1, update the firmware version according to the procedure described by the NIC manufacturer. Refer to Reference [6].
    Note:  
    Before the firmware update, VMs hosted on the affected compute host must be migrated.

    Note:  
    In the procedure provided by the NIC manufacturer, the following step must be changed:

    Instead of the chmod 755 nvmupdate.cfg command, chmod 755 nvmupdate64e must be used.


  5. After firmware update, restart the server to activate the NIC firmware by executing the following command:

    shutdown -r now
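
When many compute hosts must be checked, the filtering in step 2 can be scripted. The sketch below extracts the PCI addresses of X710 NICs bound to the DPDK-compatible driver (drv=vfio-pci) from saved dpdk-devbind.py -s output; the sample file content and /tmp path are mocked for illustration:

```shell
#!/bin/sh
# Sketch: extract PCI addresses of X710 NICs bound to the DPDK-compatible
# driver (drv=vfio-pci). The output below is mocked; on a live host:
#   dpdk-devbind.py -s > /tmp/devbind.txt
cat > /tmp/devbind.txt <<'EOF'
0000:83:00.0 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=
0000:83:00.3 'Ethernet Controller X710 for 10GbE SFP+' drv=vfio-pci unused=
0000:83:00.1 'Ethernet Controller X710 for 10GbE SFP+' if=eth5 drv=i40e unused=vfio-pci
EOF
# Kernel-bound NICs carry drv=i40e, so matching on "drv=vfio-pci"
# selects only the DPDK-assigned devices.
awk '/X710/ && /drv=vfio-pci/{print $1}' /tmp/devbind.txt
```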
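
The version comparison in step 4 can also be scripted. A minimal sketch using sort -V (GNU coreutils version sort) to compare the first field of the firmware-version value reported by ethtool -i against the 6.0.1 minimum; the sample value 4.53 is taken from the printout earlier in this section:

```shell
#!/bin/sh
# Sketch: compare a firmware version (first field of the firmware-version
# line from "ethtool -i") against the required minimum 6.0.1.
fw="4.53"          # example value from the printout above
required="6.0.1"
# sort -V orders version strings numerically; if fw sorts first and
# differs from the minimum, it is lower than the minimum.
lowest=$(printf '%s\n%s\n' "$fw" "$required" | sort -V | head -n1)
if [ "$lowest" = "$fw" ] && [ "$fw" != "$required" ]; then
    echo "firmware $fw is lower than $required: upgrade required"
else
    echo "firmware $fw meets the minimum $required"
fi
```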


Reference List

[1] Cloud SDN R6.1 for CEE TI - Release Notes, 2/109 47-HSD 101 048/3-1
[2] Cloud SDN Troubleshooting Guide, 1/154 51-HSD 101 048/3-V1
[3] Cloud SDN Upgrade and Rollback, 1/1543-HSD 101 048/2-3
[4] Health Check Monitoring Guideline, 1543-HSD 101 048/3-V1
[5] Limitations and Workarounds for Cloud Execution Environment (CEE) 6.6, 5/109 21-AZE 102 01/5-12
[6] Non-Volatile Memory (NVM) Update Utility for Intel® Ethernet Adapters—Linux. https://downloadcenter.intel.com/download/25791/Ethernet-Non-Volatile-Memory-NVM-Update-Utility-for-Intel-Ethernet-Adapters-Linux-?product=82947
[7] Product Revision Information for Cloud Execution Environment (CEE) 6.6, 109 21-AZE 102 01/5-12
[8] YAML Specification. http://www.yaml.org/spec/1.2/spec.html