| 1 | Introduction |
| 1.1 | Alarm Description |
| 1.2 | Prerequisites |
| 2 | Procedure |
| 2.1 | Analyzing the Alarm |
| 2.2 | Checking Management Node |
| 2.3 | Checking Data Node |
| 2.4 | Checking SQL Node |
1 Introduction
This instruction concerns alarm handling.
1.1 Alarm Description
The IPWorks Storage Server periodically monitors the status of all NDB nodes. When it detects that a node is unreachable, it raises an alarm.
- Note:
- The Storage Server detects the status of the Data Nodes and SQL Nodes through both activated Management Nodes.
- If both Management Nodes are offline, the Storage Server cannot detect the status of the Data Nodes and SQL Nodes.
The possible alarm causes and the corresponding descriptions, fault reasons, fault locations, and solutions are listed in Table 1.
| Alarm Cause | Description | Fault Reason | Fault Location | Solution |
|---|---|---|---|---|
| Management Node Issue | Management Node is down. | Management Node is down by maintenance activity or any error. | NDB cluster | See Section 2.2 |
| Data Node Issue | Data Node is down. | | NDB cluster | See Section 2.3 |
| SQL Node Issue | SQL Node is down. | | NDB cluster | See Section 2.4 |
- Note:
- An alarm can appear as a result of maintenance activity.
The following are the consequences if the alarm is not resolved:
- When both Management Nodes or both Data Nodes are down, the ENUM server cannot provide service.
- When all SQL Nodes are unavailable while Data Nodes are available, SS is impacted.
- When ENUM cannot connect to either of the Management Nodes, the alarm "ENUM, Server Lost Connections of DB" is raised.
- When both Data Nodes or all SQL Nodes lose communication with SS, ipwcli provisioning on SS fails.
- Note:
- SS monitors the status of the SQL Nodes and Data Nodes through the Management Nodes. If any node is down, an alarm indicating that the node is down is raised.
The alarm attributes are listed and explained in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Major Type | 193 |
| Minor Type | 860161 |
| Managed Object Class | IpworksEM |
| Source | ManagedElement=<Node Name>,SystemFunctions=1,Fm=1,FmAlarmModel=ipworksEM,FmAlarmType=ipworksEmMysqlClusterNodeUnreachable,Source= <One of the following> |
| Specific Problem | Storage Server, MySQL Cluster Node Unreachable |
| Event Type | communicationsAlarm(2) |
| Probable Cause | x733CommunicationsSubsystemFailure(306) |
| Additional Text | |
| Perceived Severity | Major |
(1) <Product_UUID> is the universally unique identifier (UUID) of the machine that generates the alarm. The value can be fetched from /sys/devices/virtual/dmi/id/product_uuid on the PL node.
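The lookup mentioned in the footnote amounts to reading one sysfs file. The following is a minimal sketch; the function name `read_product_uuid` and its optional path argument are illustrative assumptions and not part of the product. On most Linux systems this file is readable by root only.

```shell
# Hypothetical helper (not part of IPWorks): print the product UUID of this machine.
# The optional first argument overrides the sysfs path, which is useful for testing.
read_product_uuid() {
    uuid_file="${1:-/sys/devices/virtual/dmi/id/product_uuid}"
    if [ ! -r "$uuid_file" ]; then
        echo "cannot read $uuid_file" >&2
        return 1
    fi
    # Drop the trailing newline and any stray whitespace, then end the line cleanly.
    tr -d '[:space:]' < "$uuid_file"
    echo
}
```

Running `read_product_uuid` without arguments on the PL node prints the value to compare with <Product_UUID>.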
1.2 Prerequisites
This section provides information on the documents, tools, and conditions that apply to the procedure.
1.2.1 Documents
Before starting this procedure, ensure that you have read the following documents:
1.2.2 Tools
No tools are required.
1.2.3 Conditions
No conditions.
2 Procedure
This section describes the procedure to follow when this alarm is received.
2.1 Analyzing the Alarm
Do the following at the maintenance center:
- Check Management Node
- Check Data Node
- Check SQL Node
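The three checks all rely on the output of `/etc/init.d/ipworks.mysql show-status`, so the triage can be sketched as a small filter over that output. This is an illustration only: the function name `suggest_sections` is hypothetical, and the sketch assumes the output is laid out one node per line under bracketed group headers, as in the examples in Sections 2.2 to 2.4. Unused API slots may also be reported as not connected, so a suggestion for Section 2.4 is a hint rather than a verdict.

```shell
# Hypothetical helper: read show-status output on stdin and print the section
# of this instruction to follow for each node type that has a node down.
suggest_sections() {
    awk '
        # Remember which node group the following id= lines belong to.
        $1 ~ /^\[ndb_mgmd/ { section = "2.2 Checking Management Node" }
        $1 ~ /^\[ndbd/     { section = "2.3 Checking Data Node" }
        $1 ~ /^\[mysqld/   { section = "2.4 Checking SQL Node" }
        # Report each section once, on the first node in it that is not connected.
        $1 ~ /^id=[0-9]+$/ && /not connected/ && section != "" && !(section in seen) {
            seen[section] = 1
            print section
        }
    '
}
```

A possible invocation on SC-1 is `/etc/init.d/ipworks.mysql show-status | suggest_sections`. An empty result means no node is reported as not connected.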
2.2 Checking Management Node
To clear the alarm, do the following:
- Log on to SC-1.
ssh sc-1
- Check whether the Management Node is down.
Example:
# /etc/init.d/ipworks.mysql show-status
Connected to Management Server at: SC-1:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=27 @169.254.100.1 (mysql-5.6.31 ndb-7.4.12, Nodegroup: 0, *)
id=28 @169.254.100.2 (mysql-5.6.31 ndb-7.4.12, Nodegroup: 0)
[ndb_mgmd(MGM)] 2 node(s)
id=1 (not connected, accepting connect from SC-1)
id=2 @169.254.100.2 (mysql-5.6.31 ndb-7.4.12)
[mysqld(API)] 24 node(s)
...
The example indicates that the SC-1 Management Node is down.
- Start the Management Node.
- If the Data Node and the SQL Node are running, execute the following command:
# /etc/init.d/ipworks.mysql start-mgmd
- If the Data Node and the SQL Node are down, execute the following commands:
# /etc/init.d/ipworks.mysql start-mgmd
# /etc/init.d/ipworks.mysql start-ndbd
# /etc/init.d/ipworks.mysql start-sqlnode
If the operator needs to initialize the Data Node, execute the following command instead:
# /etc/init.d/ipworks.mysql start-ndbd-initial
- Note:
- The initialization deletes all the data in the Data Node.
- Confirm that the alarm has ceased (within 60 seconds). If the alarm remains, consult the next level of maintenance support. Further actions are outside the scope of this instruction.
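The branching in the start step can be summarized as a small decision table. The sketch below only prints the commands and does not run them. The function name `plan_start_commands` and its arguments are hypothetical, and it assumes that `start-ndbd-initial` takes the place of `start-ndbd` when initialization is requested, which is one reading of the step above.

```shell
INIT_SCRIPT=/etc/init.d/ipworks.mysql

# Hypothetical helper: print (do not run) the start commands for Section 2.2.
#   $1: "running" if the Data Node and the SQL Node are running, "down" if they are down
#   $2: "initial" to initialize the Data Node (this deletes all data in the Data Node)
plan_start_commands() {
    state="${1:-}"
    ndbd_action="start-ndbd"
    if [ "${2:-}" = "initial" ]; then
        ndbd_action="start-ndbd-initial"
    fi
    case "$state" in
        running)
            echo "$INIT_SCRIPT start-mgmd"
            ;;
        down)
            echo "$INIT_SCRIPT start-mgmd"
            echo "$INIT_SCRIPT $ndbd_action"
            echo "$INIT_SCRIPT start-sqlnode"
            ;;
        *)
            echo "usage: plan_start_commands running|down [initial]" >&2
            return 2
            ;;
    esac
}
```

Printing the plan first lets the operator review the sequence, in particular before an initialization that deletes data, and then run each line manually.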
2.3 Checking Data Node
To clear the alarm, do the following:
- Check whether the Data Node is down.
Example:
SC-1:~# /etc/init.d/ipworks.mysql show-status
Connected to Management Server at: SC-1:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=27 (not connected, accepting connect from SC-1)
id=28 @169.254.100.2 (mysql-5.6.31 ndb-7.4.12, Nodegroup: 0, *)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @169.254.100.1 (mysql-5.6.31 ndb-7.4.12)
id=2 @169.254.100.2 (mysql-5.6.31 ndb-7.4.12)
[mysqld(API)] 24 node(s)
id=3 @169.254.100.1 (mysql-5.6.31 ndb-7.4.12)
id=4 (not connected, accepting connect from SC-2)
...
This example output indicates that the Data Node (id=27) is down.
- If the Data Node is down, execute the following command to start it:
Example:
SC-1:~# /etc/init.d/ipworks.mysql start-ndbd
If the operator needs to initialize the Data Node, execute the following command instead:
# /etc/init.d/ipworks.mysql start-ndbd-initial
- Note:
- The initialization deletes all the data in the Data Node.
- Confirm that the alarm has ceased. If the alarm remains, consult the next level of maintenance support. Further actions are outside the scope of this instruction.
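After the start command, it can help to recheck a single node rather than read the whole status listing again. The following sketch is illustrative: the function name `node_state` is hypothetical, and it assumes the `show-status` output lists one node per line starting with `id=<number>`, as in the example above. It reports only what the status line says and does not query the alarm itself.

```shell
# Hypothetical helper: read show-status output on stdin and report the state
# of the node whose id is given as the first argument.
node_state() {
    awk -v want="id=$1" '
        $1 == want {
            found = 1
            if ($0 ~ /not connected/) print "not connected"
            else print "connected"
            exit
        }
        # Reached when no line for the requested id was seen.
        END { if (!found) print "unknown" }
    '
}
```

A possible invocation for the Data Node in the example is `/etc/init.d/ipworks.mysql show-status | node_state 27`.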
2.4 Checking SQL Node
To clear the alarm, do the following:
- Check whether the SQL Node is down.
Example:
SC-1:~# /etc/init.d/ipworks.mysql show-status
Connected to Management Server at: SC-1:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=27 @169.254.100.1 (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0, *)
id=28 @169.254.100.2 (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @169.254.100.1 (mysql-5.6.27 ndb-7.4.8)
id=2 @169.254.100.2 (mysql-5.6.27 ndb-7.4.8)
[mysqld(API)] 24 node(s)
id=3 (not connected, accepting connect from SC-1)
id=4 (not connected, accepting connect from SC-2)
...
This example output indicates that the SQL Nodes (id=3 and id=4) are down.
- If the SQL Node is down, execute the following command to start it:
Example:
SC-1:~# /etc/init.d/ipworks.mysql start-sqlnode
- Confirm that the alarm has ceased. If the alarm remains, consult the next level of maintenance support. Further actions are outside the scope of this instruction.
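Each procedure ends with a confirmation step, and Section 2.2 gives a 60 second window for it. A generic retry loop is one way to script such a wait. The sketch below is an assumption-laden illustration: the function name `wait_until` is hypothetical, and this instruction does not state how the alarm state is queried, so the check command must be supplied by the operator.

```shell
# Hypothetical helper: re-run a check command until it succeeds or a timeout expires.
#   $1: timeout in seconds
#   $2: pause between attempts in seconds
#   remaining arguments: the check command and its arguments
# Returns 0 as soon as the check succeeds, 1 if it still fails once the timeout is reached.
wait_until() {
    timeout="$1"
    interval="$2"
    shift 2
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}
```

For example, `wait_until 60 5 alarm_is_cleared` would retry every 5 seconds for up to 60 seconds, where `alarm_is_cleared` is a placeholder for whatever check the site uses. A non-zero return corresponds to the case where the alarm remains and the next level of maintenance support is consulted.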
