BGP Control Path Failure
Cloud Execution Environment

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure
2.1   Actions

Reference List

1   Introduction

This instruction concerns alarm handling.

1.1   Alarm Description

The BGP Control Path Failure alarm is issued by the Managed Object (MO) BGP_Neighbor when the control connection with the Border Gateway Protocol (BGP) neighbor is down.

The severity of the alarm is CRITICAL.

Possible alarm causes and fault locations are explained in Table 1.

Table 1    Alarm Causes

Alarm Cause       Control path connection with the BGP neighbor is down
Description       BGP neighbor connection is down
Fault Reason      BGP neighbor connection is not in ESTABLISHED state
Fault Location    BGP neighbor
Impact            • The service provided by the component is degraded or lost
                  • Connection with the BGP neighbor is lost and datapath is affected

If the alarm is not resolved, the node is affected as described in the Impact entry of Table 1.

The alarm attributes are listed in Table 2.

Table 2    Alarm Attributes

Attribute Name             Attribute Value
Major Type                 193
Minor Type                 2162705
Managed Object Class       BGP_Neighbor
Managed Object Instance    Region=<name_of_the_region>,
                           Service=SDNc,
                           Alarm=BgpControlPathFailure,
                           BGP_Neighbor=<ip_address>
Specific Problem           BGP neighbor connection is not in Established state
Event Type                 communicationsAlarm
Probable Cause             302
Additional Text            Bgp Neighbor TCP connection is down,for BGP_Neighbor=<ip_address>
Severity                   CRITICAL
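
The Managed Object Instance is a comma-separated list of key=value pairs, and the BGP_Neighbor key carries the address of the affected neighbor. As a minimal sketch, assuming the instance string has been copied into a file or variable exactly as shown in Table 2, the neighbor address can be extracted as follows. The function name bgp_neighbor_from_moi is hypothetical and is not part of any product CLI.

```shell
# Hypothetical helper: read a Managed Object Instance string on stdin
# (possibly wrapped over several lines) and print the BGP_Neighbor value.
bgp_neighbor_from_moi() {
    # Remove spaces, tabs, and line breaks, put one key=value pair per line,
    # then print only the value of the BGP_Neighbor key.
    tr -d ' \n\t' | tr ',' '\n' | sed -n 's/^BGP_Neighbor=//p'
}

# Usage (the region name and address below are made-up illustration values):
#   printf '%s\n' 'Region=RegionOne,Service=SDNc,Alarm=BgpControlPathFailure,BGP_Neighbor=10.184.22.13' \
#       | bgp_neighbor_from_moi
```

The extracted address identifies which neighbor to look for when the same alarm is raised repeatedly.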

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

For more information on CSC alarms, refer to the SDN document Alarms, Reference [1].

1.2.2   Tools

No tools are required.

1.2.3   Conditions

Not applicable.

2   Procedure

This section describes the procedure to follow when this alarm is received.

2.1   Actions

The alarm is cleared automatically once the connection with the neighbor reaches the ESTABLISHED state.

Normally, no further actions are necessary. In this case, exit this procedure. If the alarm is issued frequently for the same BGP neighbor, do the following:

  1. Log on to a vCIC.
  2. Check if E-ODL is up and running:

    display app-status [--all <all>]

    Example output:

    root@cic-1:~# /etc/init.d/sdnc-service comcli
    cli>display app-status
    Enter password for user cscadm:
    Timestamp: Thu Jul 06 08:28:53 GMT+01:00 2017
    Node IP Address: 192.168.70.2
      INTERFACE_SERVICE   : OPERATIONAL      
      OPENFLOW            : ERROR      
      ITM                 : OPERATIONAL      
      DATASTORE_SERVICE   : OPERATIONAL      
      SCF_SERVICE         : OPERATIONAL      
      ELAN_SERVICE        : OPERATIONAL      
    Node IP Address: 192.168.70.3
      INTERFACE_SERVICE   : OPERATIONAL      
      OPENFLOW            : OPERATIONAL      
      ITM                 : OPERATIONAL      
      DATASTORE_SERVICE   : OPERATIONAL      
      SCF_SERVICE         : OPERATIONAL      
      ELAN_SERVICE        : OPERATIONAL      
    Node IP Address: 192.168.70.4
      INTERFACE_SERVICE   : OPERATIONAL      
      OPENFLOW            : OPERATIONAL      
      ITM                 : OPERATIONAL      
      DATASTORE_SERVICE   : OPERATIONAL      
      SCF_SERVICE         : OPERATIONAL      
      ELAN_SERVICE        : OPERATIONAL      

    If any service is in ERROR state, restart CSC:

    crm resource restart clone_p_sdnc-service

    The restart can take 3 to 5 minutes. Wait 5 minutes before executing any other command.

    If the connection is reestablished and the alarm ceases, exit this procedure. Otherwise, continue with Step 3.

  3. Check the port and the underlying connectivity.
    1. Check if Quagga Border Gateway Protocol (QBGP) ports 179, 6644, and 7644 are up:

      netstat -antp | grep 6644
      netstat -antp | grep 179
      netstat -antp | grep 7644

      The ports can be in LISTEN or ESTABLISHED state on any of the vCICs.

      Example output where the ports are functional:

      **** 6644 port****
      tcp        0      0 0.0.0.0:6644            0.0.0.0:*               LISTEN      25214/java      
      tcp        0      0 192.168.123.2:6644      192.168.123.5:47644     ESTABLISHED 25214/java      
      tcp        0      0 192.168.123.5:47644     192.168.123.2:6644      ESTABLISHED 14652/qthriftd  
      ****  7644 port ****
      tcp        0      0 0.0.0.0:7644            0.0.0.0:*               LISTEN      14652/qthriftd  
      tcp        0      0 192.168.123.5:42698     192.168.123.1:7644      ESTABLISHED 25214/java      
      tcp        0      0 192.168.123.1:7644      192.168.123.5:42698     ESTABLISHED 14652/qthriftd  
      **** 179 port****
      tcp        0      0 0.0.0.0:179             0.0.0.0:*               LISTEN      19030/bgpd
      tcp        0      0 17.17.17.6:44978        17.17.17.45:179         ESTABLISHED 19030/bgpd

      If all the ports are functional, continue with Step 4.

      Example output where one of the ports is down:

      ****6644 port****
      cic-1.domain.tld
      tcp        0      0 0.0.0.0:6644            0.0.0.0:*               LISTEN      25214/java      
      tcp        0      0 192.168.123.2:6644      192.168.123.5:47644     ESTABLISHED 25214/java      
      ****  7644 port ****
      cic-1.domain.tld
      tcp        0      0 0.0.0.0:7644            0.0.0.0:*               LISTEN      14652/qthriftd  
      tcp        0      0 192.168.123.5:42698     192.168.123.1:7644      ESTABLISHED 25214/java      
      tcp        0      0 192.168.123.1:7644      192.168.123.5:42698     ESTABLISHED 14652/qthriftd  

      In the above case, connectivity to DC-GW over port 179 is not established.

      The following scenarios are possible:

      • If port 6644 is not in ESTABLISHED and LISTEN state, restart the QBGP service:

        crm resource restart p_qbgp-service

      • If port 7644 is not in ESTABLISHED and LISTEN state, restart the QBGP service:

        crm resource restart p_qbgp-service

      • If port 6644 or 7644 is not in LISTEN state (in any of the vCICs), restart both services:

        crm resource restart p_sdnc-service
        crm resource restart p_qbgp-service

      • If port 179 is not in ESTABLISHED and LISTEN state, verify BGP configuration and DC-GW connectivity. If they are correct, restart the QBGP service:

        crm resource restart p_qbgp-service

      The restart can take 3 to 5 minutes. Wait 5 minutes before executing any other command.

      If the connection is reestablished and the alarm ceases, exit this procedure. Otherwise, continue with substep 2 of Step 3.

    2. Check data connectivity:

      ping <dc-gw_ip>

      Example output where the command is successful:

      cic-1:~ # ping 10.184.22.13
      PING 10.184.22.13 (10.184.22.13) 56(84) bytes of data.
      64 bytes from 10.184.22.13: icmp_seq=1 ttl=254 time=1.03 ms
      64 bytes from 10.184.22.13: icmp_seq=2 ttl=254 time=0.867 ms
      64 bytes from 10.184.22.13: icmp_seq=3 ttl=254 time=0.780 ms
      ^C
      --- 10.184.22.13 ping statistics ---
      3 packets transmitted, 3 received, 0% packet loss, time 2002ms
      rtt min/avg/max/mdev = 0.780/0.895/1.038/0.107 ms

      This means that DC-GW is reachable through the underlay. In this case, continue with Step 4.

      Example output where the command is not successful:

      cic-1:~ # ping 10.184.22.99
      PING 10.184.22.99 (10.184.22.99) 56(84) bytes of data.
      ^C
      --- 10.184.22.99 ping statistics ---
      6 packets transmitted, 0 received, 100% packet loss, time 5038ms

      This means that DC-GW is not reachable through the underlay, which can indicate an underlay fault. Refer to alarm topic LostConnection in the HDS documentation, Reference [2], and continue with Step 4.

  4. Collect troubleshooting data as described in the Data Collection Guideline.
  5. Contact the next level of maintenance support.

    Further actions are outside the scope of this instruction.

  6. The job is completed.
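
The checks in Steps 2 and 3 rely on reading command output by eye. The following is a minimal sketch of shell helpers that summarize that output, assuming it has the same layout as the examples in this instruction. The function names services_in_error, port_states, and packet_loss are hypothetical and are not part of the product. The helpers only report what they find; deciding which service to restart still follows the scenarios in Step 3.

```shell
# Hypothetical helper: read "display app-status" output on stdin and print
# "<node_ip> <service>" for every service reported in ERROR state.
services_in_error() {
    awk '
        $1 == "Node" && $2 == "IP" && $3 == "Address:" { node = $NF; next }
        NF >= 3 && $NF == "ERROR" { print node, $1 }
    '
}

# Hypothetical helper: read "netstat -antp" output on stdin and report whether
# the given port has a LISTEN entry and an ESTABLISHED entry.
# Assumes the netstat column order shown in the examples:
# Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
port_states() {
    awk -v port="$1" '
        BEGIN { suffix = ":" port; n = length(suffix) }
        $1 ~ /^tcp/ {
            # Match the port exactly at the end of the address (":179" must not match ":1179").
            local_match  = (substr($4, length($4) - n + 1) == suffix)
            remote_match = (substr($5, length($5) - n + 1) == suffix)
            if ($6 == "LISTEN" && local_match) listen = 1
            if ($6 == "ESTABLISHED" && (local_match || remote_match)) established = 1
        }
        END {
            printf "port=%s listen=%s established=%s\n", port,
                (listen ? "yes" : "no"), (established ? "yes" : "no")
        }
    '
}

# Hypothetical helper: read ping output on stdin and print the packet loss
# percentage from the summary line (0 means the target answered every probe).
packet_loss() {
    awk '/packet loss/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /%$/) { sub(/%/, "", $i); print $i; exit }
    }'
}

# Usage on a vCIC (app-status.txt is an assumed file holding saved CLI output):
#   services_in_error < app-status.txt
#   for p in 179 6644 7644; do netstat -antp | port_states "$p"; done
#   ping -c 3 <dc-gw_ip> | packet_loss
```

Applied to the example outputs in this instruction, the first helper would report OPENFLOW on node 192.168.70.2, and the port helper would report neither a LISTEN nor an ESTABLISHED entry for port 179 in the output where one of the ports is down.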

Reference List

[1] Alarms, 1/198 22-AXD 101 08/6-V1
[2] Hyperscale Datacenter System 8000 Customer Documentation, 2/1551-LZN 901 5032