1 Introduction
This instruction concerns alarm handling.
1.1 Alarm Description
The BGP Control Path Failure alarm is issued by the Managed Object (MO) BGP_Neighbor when the control connection with the Border Gateway Protocol (BGP) neighbor is down.
The severity of the alarm is CRITICAL.
Possible alarm causes and fault locations are explained in Table 1.
| Alarm | Description | Fault | Fault | Impact |
|---|---|---|---|---|
| Control path connection with the BGP neighbor is down | BGP neighbor connection is down | BGP neighbor connection is not in ESTABLISHED state | BGP neighbor | |
The following is the consequence for the node if the alarm is not resolved:
- The service provided by the component is degraded or lost.
The alarm attributes are listed in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Major Type | 193 |
| Minor Type | 2162705 |
| Managed Object Class | BGP_Neighbor |
| Managed Object Instance | Region=<name_of_the_region>, |
| Specific Problem | BGP neighbor connection is not in Established state |
| Event Type | communicationsAlarm |
| Probable Cause | 302 |
| Additional Text | Bgp Neighbor TCP connection is down,for BGP_Neighbor=<ip_address> |
| Severity | CRITICAL |
1.2 Prerequisites
This section provides information on the documents, tools, and conditions that apply to the procedure.
1.2.1 Documents
For more information on CSC alarms, refer to the SDN document Alarms, Reference [1].
1.2.2 Tools
No tools are required.
1.2.3 Conditions
Not applicable.
2 Procedure
This section describes the procedure to follow when this alarm is received.
2.1 Actions
This alarm is automatically cleared once the connection with the neighbor reaches the ESTABLISHED state.
Normally, no further actions are necessary. In this case, exit this procedure. If the alarm is issued frequently for the same BGP neighbor, do the following:
- Log on to a vCIC.
- Check if E-ODL is up and running:
display app-status [--all <all>]
Example output:
root@cic-1:~# /etc/init.d/sdnc-service comcli
cli>display app-status
Enter password for user cscadm:
Timestamp: Thu Jul 06 08:28:53 GMT+01:00 2017
Node IP Address: 192.168.70.2
INTERFACE_SERVICE : OPERATIONAL
OPENFLOW : ERROR
ITM : OPERATIONAL
DATASTORE_SERVICE : OPERATIONAL
SCF_SERVICE : OPERATIONAL
ELAN_SERVICE : OPERATIONAL
Node IP Address: 192.168.70.3
INTERFACE_SERVICE : OPERATIONAL
OPENFLOW : OPERATIONAL
ITM : OPERATIONAL
DATASTORE_SERVICE : OPERATIONAL
SCF_SERVICE : OPERATIONAL
ELAN_SERVICE : OPERATIONAL
Node IP Address: 192.168.70.4
INTERFACE_SERVICE : OPERATIONAL
OPENFLOW : OPERATIONAL
ITM : OPERATIONAL
DATASTORE_SERVICE : OPERATIONAL
SCF_SERVICE : OPERATIONAL
ELAN_SERVICE : OPERATIONAL
If any service is in ERROR state, restart CSC:
crm resource restart clone_p_sdnc-service
Restart can take about 3-5 minutes. Wait for 5 minutes before executing any other command.
If the connection is reestablished and the alarm ceases, exit this procedure. Otherwise, continue with Step 3.
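The check for services in ERROR state can be sketched as a small shell helper. This is a minimal sketch, not part of the product: the function name `find_error_services` is hypothetical, and it assumes that the `display app-status` output has been saved as text and follows the `Node IP Address: <ip>` and `<SERVICE> : <STATE>` layout shown in the example output above.

```shell
#!/bin/sh
# Hypothetical helper (not a product command): reads saved
# "display app-status" output on stdin and prints one
# "<node_ip> <service>" line for every service in ERROR state.
# Assumes the line layout of the example output in this instruction.
find_error_services() {
    awk '
        # Remember the node that the following service lines belong to.
        /Node IP Address:/ { node = $NF; next }
        # Service lines have exactly three fields: NAME : STATE.
        NF == 3 && $2 == ":" && $3 == "ERROR" { print node, $1 }
    '
}
```

For example, if the output was saved to a file named `app-status.txt` (an assumed name), running `find_error_services < app-status.txt` lists the affected node and service pairs. No output means that no service was reported in ERROR state.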
- Check the port and the underlying connectivity.
- Check if Quagga Border Gateway Protocol (QBGP) ports 179, 6644, and 7644 are up:
netstat -antp | grep 6644
netstat -antp | grep 179
netstat -antp | grep 7644
The ports can be in open (LISTEN) or ESTABLISHED state on any of the vCICs.
An example output where the ports are functional is shown below.
**** 6644 port****
tcp 0 0 0.0.0.0:6644 0.0.0.0:* LISTEN 25214/java
tcp 0 0 192.168.123.2:6644 192.168.123.5:47644 ESTABLISHED 25214/java
tcp 0 0 192.168.123.5:47644 192.168.123.2:6644 ESTABLISHED 14652/qthriftd
**** 7644 port ****
tcp 0 0 0.0.0.0:7644 0.0.0.0:* LISTEN 14652/qthriftd
tcp 0 0 192.168.123.5:42698 192.168.123.1:7644 ESTABLISHED 25214/java
tcp 0 0 192.168.123.1:7644 192.168.123.5:42698 ESTABLISHED 14652/qthriftd
**** 179 port****
tcp 0 0 0.0.0.0:179 0.0.0.0:* LISTEN 19030/bgpd
tcp 0 0 17.17.17.6:44978 17.17.17.45:179 ESTABLISHED 19030/bgpd
If all the ports are functional, continue with Step 4.
Example output where one of the ports is down:
****6644 port****
cic-1.domain.tld
tcp 0 0 0.0.0.0:6644 0.0.0.0:* LISTEN 25214/java
tcp 0 0 192.168.123.2:6644 192.168.123.5:47644 ESTABLISHED 25214/java
**** 7644 port ****
cic-1.domain.tld
tcp 0 0 0.0.0.0:7644 0.0.0.0:* LISTEN 14652/qthriftd
tcp 0 0 192.168.123.5:42698 192.168.123.1:7644 ESTABLISHED 25214/java
tcp 0 0 192.168.123.1:7644 192.168.123.5:42698 ESTABLISHED 14652/qthriftd
In the above case, connectivity to DC-GW over port 179 is not established.
The following scenarios are possible:
- If port 6644 is not in ESTABLISHED and LISTEN state, restart the QBGP service:
crm resource restart p_qbgp-service
- If port 7644 is not in ESTABLISHED and LISTEN state, restart the QBGP service:
crm resource restart p_qbgp-service
- If port 6644 or 7644 is not in LISTEN state (in any of the vCICs), restart both services:
crm resource restart p_sdnc-service
crm resource restart p_qbgp-service
- If port 179 is not in ESTABLISHED and LISTEN state, verify the BGP configuration and DC-GW connectivity. If they are correct, restart the QBGP service:
crm resource restart p_qbgp-service
Restart can take about 3-5 minutes. Wait for 5 minutes before executing any other command.
If the connection is reestablished and the alarm ceases, exit this procedure. Otherwise, continue with Step b of Step 3.
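The port check can also be summarized per port with a short shell helper. This is a minimal sketch under stated assumptions: the function name `port_states` is hypothetical, and it assumes the Linux `netstat -antp` column layout shown in the example output (local address in column 4, foreign address in column 5, state in column 6). It matches the port number exactly at the end of the address, which avoids the substring hits that `grep 7644` produces for ephemeral ports such as 47644.

```shell
#!/bin/sh
# Hypothetical helper (not a product command): reads "netstat -antp"
# output on stdin and reports whether the given port has at least one
# LISTEN entry and at least one ESTABLISHED entry.
# Usage: netstat -antp | port_states <port>
port_states() {
    awk -v port="$1" '
        BEGIN {
            listen = "no"; established = "no"
            # Match ":<port>" only at the end of an address field.
            suffix = ":" port "$"
        }
        $1 ~ /^tcp/ && ($4 ~ suffix || $5 ~ suffix) {
            if ($6 == "LISTEN") listen = "yes"
            if ($6 == "ESTABLISHED") established = "yes"
        }
        END { print port, "LISTEN=" listen, "ESTABLISHED=" established }
    '
}
```

Running `netstat -antp | port_states 179` on a vCIC prints one summary line for port 179. The helper only reports the observed states. Which restart to perform is still decided according to the scenarios listed above.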
- Check data connectivity:
ping <dc-gw_ip>
An example of a successful command is shown below.
cic-1:~ # ping 10.184.22.13
PING 10.184.22.13 (10.184.22.13) 56(84) bytes of data.
64 bytes from 10.184.22.13: icmp_seq=1 ttl=254 time=1.03 ms
64 bytes from 10.184.22.13: icmp_seq=2 ttl=254 time=0.867 ms
64 bytes from 10.184.22.13: icmp_seq=3 ttl=254 time=0.780 ms
^C
--- 10.184.22.13 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.780/0.895/1.038/0.107 ms
This means that DC-GW is reachable through the underlay. In this case, continue with Step 4.
Example output where the command is not successful:
cic-1:~ # ping 10.184.22.99
PING 10.184.22.99 (10.184.22.99) 56(84) bytes of data.
^C
--- 10.184.22.99 ping statistics ---
6 packets transmitted, 0 received, 100% packet loss, time 5038ms
This means that DC-GW is not reachable through the underlay, which can indicate an underlay fault. Refer to alarm topic LostConnection in the HDS documentation, Reference [2], and continue with Step 4.
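The result of the ping check can be read from the summary line of the command output. The following is a minimal sketch, assuming the Linux `ping` summary format shown in the examples above. The function name `ping_loss_percent` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not a product command): reads ping output on
# stdin and prints the packet loss percentage from the summary line,
# for example "0" or "100". Prints nothing if no summary line is found.
ping_loss_percent() {
    awk '/packet loss/ {
        # The loss value is the field that ends with a percent sign.
        for (i = 1; i <= NF; i++)
            if ($i ~ /%$/) { sub(/%$/, "", $i); print $i; exit }
    }'
}
```

For example, `ping -c 3 <dc-gw_ip> | ping_loss_percent` sends three probes and prints the loss value. A value of 0 corresponds to the successful example above, and a value of 100 corresponds to the case where DC-GW is not reachable through the underlay.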
- Collect troubleshooting data as described in the Data Collection Guideline.
- Contact the next level of maintenance support.
Further actions are outside the scope of this instruction.
- The job is completed.
Reference List
[1] Alarms, 1/198 22-AXD 101 08/6-V1
[2] Hyperscale Datacenter System 8000 Customer Documentation, 2/1551-LZN 901 5032
