1 Introduction
1.1 Alarm Description
1.2 Prerequisites
2 Procedure
2.1 Analyzing the Alarm
2.2 Check Network Connectivity
2.3 Checking Management Node
2.4 Checking Data Node
2.5 Checking SQL Node
1 Introduction
This instruction describes how to handle the MySQL Geographic Redundancy replication failure alarm.
1.1 Alarm Description
In a Geographic Redundancy scenario, a monitoring script checks the replication status between the two sites. When it detects a failure in the replication procedure, it raises this alarm.
The alarm attributes are listed and explained in Table 1.
| Attribute Name | Attribute Value |
|---|---|
| Major Type | 193 |
| Minor Type | 860163 |
| Managed Object Class | IpworksEM |
| Source | ManagedElement=<Node Name>,SystemFunctions=1,Fm=1,FmAlarmModel=ipworksEM,FmAlarmType=ipworksEmMysqlGeoReplicationFailure,Source=<SC IP Address> |
| Specific Problem | Storage Server, The MySQL Replication for Geographic Redundancy Failed |
| Event Type | communicationsAlarm(2) |
| Probable Cause | x733RemoteNodeTransmissionError(342) |
| Additional Text | This alarm is issued because the MySQL slave on %s[the node IP Address] detected a replication failure,[Slave_IO_Errno:%s \| Slave_SQL_Errno:%s] |
| Perceived Severity | Major |
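The Additional Text embeds the slave error codes reported by MySQL. As an illustration only, the fragment below extracts those codes from a sample alarm text; the sample string, IP address, and error values are assumptions modeled on the format shown in Table 1.

```shell
# Sample Additional Text; the IP address and error codes are made-up values
# following the format "[Slave_IO_Errno:%s | Slave_SQL_Errno:%s]".
text='This alarm is issued because the MySQL slave on 10.0.0.1[the node IP Address] detected a replication failure,[Slave_IO_Errno:2003 | Slave_SQL_Errno:0]'

# Pull out each error code with sed capture groups.
io_errno=$(printf '%s\n' "$text" | sed -n 's/.*Slave_IO_Errno:\([0-9]*\).*/\1/p')
sql_errno=$(printf '%s\n' "$text" | sed -n 's/.*Slave_SQL_Errno:\([0-9]*\).*/\1/p')
echo "Slave_IO_Errno=$io_errno Slave_SQL_Errno=$sql_errno"
```

A non-zero Slave_IO_Errno typically points at connectivity problems between the sites, which is why the procedure below starts with network checks.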
1.2 Prerequisites
This section provides information on the documents, tools, and conditions that apply to the procedure.
1.2.1 Documents
Before starting this procedure, ensure that you have read the following documents:
1.2.2 Tools
No tools are required.
1.2.3 Conditions
No conditions.
2 Procedure
This section describes the procedure to follow when this alarm is received.
2.1 Analyzing the Alarm
Do the following at the maintenance center:
- Check Network Connectivity
- Check Management Node
- Check Data Node
- Check SQL Node
2.2 Check Network Connectivity
Site A and Site B are used as an example; they are configured for Geographic Redundancy.
- Note:
- The ping command requires that ICMP messages are not blocked by any gateway or router in the involved network.
- Log on to SC-1 on Site A.
- Check whether OAM MIP is up on Site A.
# ping <MIP_OAM_IP of Site A>
# ping <MIP_OAM_IP of Site B>
- Check whether provision MIP is up on Site A.
# ping <MIP_PROV_IP of Site A>
# ping <MIP_PROV_IP of Site B>
- Log on to SC-1 on Site B.
- Check whether OAM MIP is up on Site B.
# ping <MIP_OAM_IP of Site A>
# ping <MIP_OAM_IP of Site B>
- Check whether provision MIP is up on Site B.
# ping <MIP_PROV_IP of Site A>
# ping <MIP_PROV_IP of Site B>
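The ping checks above can be wrapped in a small helper so that all four MIP addresses are verified in one pass. This is a sketch, not part of the product: check_mip and MIP_LIST are hypothetical names, and the address list must be filled in with the real MIP addresses of both sites.

```shell
# Hypothetical helper: report reachability of one address.
check_mip() {
    # 3 probes, 2-second timeout per probe; ICMP must not be blocked en route.
    if ping -c 3 -W 2 "$1" > /dev/null 2>&1; then
        echo "$1 reachable"
    else
        echo "$1 UNREACHABLE"
    fi
}

# MIP_LIST is a placeholder: set it to the OAM and provision MIP addresses
# of Site A and Site B before running, for example:
#   MIP_LIST="<MIP_OAM_IP of Site A> <MIP_OAM_IP of Site B> ..."
for ip in ${MIP_LIST:-}; do
    check_mip "$ip"
done
```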
2.3 Checking Management Node
To clear the alarm, do the following:
- Log on to SC-1.
# ssh SC-1
- Check whether the Management Node is down.
Example:
SC-1:~ # /etc/init.d/ipworks.mysql show-status
Connected to Management Server at: SC-1:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=27   @169.254.101.1 (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0, *)
id=28   @169.254.101.2 (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)
id=1 (not connected, accepting connect from SC-1)
id=2    @169.254.101.2 (mysql-5.6.27 ndb-7.4.8)

[mysqld(API)]   24 node(s)
...
This example output indicates that the Management Node (id=1) on SC-1 is down.
- Start the Management Node.
- If the Data Node and the SQL Node are running, execute the following command:
# /etc/init.d/ipworks.mysql start-mgmd
- If the Data Node and the SQL Node are down, execute the following commands:
# /etc/init.d/ipworks.mysql start-mgmd
# /etc/init.d/ipworks.mysql start-ndbd
# /etc/init.d/ipworks.mysql start-sqlnode
If the operator needs to initialize the Data Node, execute the following command instead:
# /etc/init.d/ipworks.mysql start-ndbd-initial
- Note:
- The initialization deletes all the data in the Data Node.
- Confirm that the alarm ceases within 60 seconds. If the alarm remains, consult the next level of maintenance support. Further actions are outside the scope of this instruction.
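The branching above (start only the management daemon, or all three node types) can be sketched as a small wrapper function. This is illustrative only: start_cluster_nodes and the IPWORKS variable are hypothetical names, while the sub-commands themselves are the ones used in this instruction.

```shell
# Path to the init script used throughout this instruction.
IPWORKS="${IPWORKS:-/etc/init.d/ipworks.mysql}"

# Hypothetical wrapper: always start the Management Node; also start the
# Data Node and SQL Node when they are down ("yes" as the first argument).
start_cluster_nodes() {
    local others_down="$1"
    "$IPWORKS" start-mgmd
    if [ "$others_down" = "yes" ]; then
        "$IPWORKS" start-ndbd
        "$IPWORKS" start-sqlnode
    fi
}
```

If the Data Node must be initialized instead, substitute start-ndbd-initial for start-ndbd; as noted above, initialization deletes all data in the Data Node.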
2.4 Checking Data Node
To clear the alarm, do the following:
- Check whether the Data Node is down.
Example:
SC-1:~ # /etc/init.d/ipworks.mysql show-status
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=27 (not connected, accepting connect from SC-1)
id=28   @169.254.100.2 (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0, *)

[ndb_mgmd(MGM)] 2 node(s)
id=1    @169.254.101.1 (mysql-5.6.27 ndb-7.4.8)
id=2    @169.254.101.2 (mysql-5.6.27 ndb-7.4.8)

[mysqld(API)]   24 node(s)
id=3    @169.254.101.1 (mysql-5.6.27 ndb-7.4.8)
id=4 (not connected, accepting connect from SC-2)
...
This example output indicates that the Data Node (id=27) is down.
- If the Data Node is down, execute the following command to start it:
Example:
SC-1:~ # /etc/init.d/ipworks.mysql start-ndbd
If the operator needs to initialize the Data Node, execute the following command instead:
# /etc/init.d/ipworks.mysql start-ndbd-initial
- Note:
- The initialization deletes all the data in the Data Node.
- Confirm that the alarm has ceased. If the alarm remains, consult the next level of maintenance support. Further actions are outside the scope of this instruction.
2.5 Checking SQL Node
To clear the alarm, do the following:
- Check whether the SQL Node is down.
Example:
SC-1:~ # /etc/init.d/ipworks.mysql show-status
Connected to Management Server at: SC-1:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=27   @169.254.101.1 (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0, *)
id=28   @169.254.101.2 (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)
id=1    @169.254.101.1 (mysql-5.6.27 ndb-7.4.8)
id=2    @169.254.101.2 (mysql-5.6.27 ndb-7.4.8)

[mysqld(API)]   24 node(s)
id=3 (not connected, accepting connect from SC-1)
id=4 (not connected, accepting connect from SC-2)
...
This example output indicates that the SQL Nodes (id=3 and id=4) are down.
- If the SQL Node is down, execute the following command to start it:
Example:
SC-1:~ # /etc/init.d/ipworks.mysql start-sqlnode
- Confirm that the alarm has ceased. If the alarm remains, consult the next level of maintenance support. Further actions are outside the scope of this instruction.
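Across the checks for the Management Node, Data Node, and SQL Node, the common signal is a "not connected" entry in the show-status output. Assuming the output format matches the examples above, the hypothetical awk filter below lists every disconnected node together with its node type:

```shell
# Hypothetical filter: read show-status output on stdin and print each
# node that is not connected, prefixed with its section header, e.g.
# "[ndb_mgmd(MGM)] id=1 down". Pipe the live command output into it:
#   /etc/init.d/ipworks.mysql show-status | list_down_nodes
list_down_nodes() {
    awk '
        /^\[/           { section = $1 }        # e.g. [ndbd(NDB)]
        /not connected/ { print section " " $1 " down" }
    '
}
```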
