Storage Server, MySQL Cluster Node Unreachable
IPWorks

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure
2.1   Analyzing the Alarm
2.2   Checking Management Node
2.3   Checking Data Node
2.4   Checking SQL Node

1   Introduction

This instruction concerns alarm handling.

1.1   Alarm Description

The IPWorks Storage Server periodically monitors the status of all NDB nodes. When it detects that a node is unreachable, it raises this alarm.

Note:  
  • The Storage Server detects the status of the Data Node and the SQL Node through both activated Management Nodes.
  • If both Management Nodes are offline, the Storage Server cannot detect the status of the Data Node and the SQL Node.

The possible alarm causes and the corresponding fault reasons, fault locations, and impacts are described in Table 1.

Table 1    Alarm Causes

Alarm Cause | Description | Fault Reason | Fault Location | Solution
Management Node Issue | Management Node is down. | The Management Node is down because of maintenance activity or an error. | NDB cluster | See Section 2.2
Data Node Issue | Data Node is down. | • The Data Node is down because of maintenance activity, a configuration issue, or a memory issue. • The Data Node lost communication with one of the Management Nodes. | NDB cluster | See Section 2.3
SQL Node Issue | SQL Node is down. | • The SQL Node is down because of maintenance activity or other errors. • The SQL Node lost communication with one of the Management Nodes. | NDB cluster | See Section 2.4

Note:  
The alarm can also be raised as a result of maintenance activity.

The following are the consequences for the node if the alarm is not solved:

Note:  
The Storage Server (SS) monitors the status of SQL Nodes and Data Nodes through the Management Node. If any node is down, the SS raises an alarm indicating which node is down.

The alarm attributes are listed and explained in Table 2.

Table 2    Alarm Attributes

Attribute Name | Attribute Value
Major Type | 193
Minor Type | 860161
Managed Object Class | IpworksEM
Source | ManagedElement=<Node Name>,SystemFunctions=1,Fm=1,FmAlarmModel=ipworksEM,FmAlarmType=ipworksEmMysqlClusterNodeUnreachable,Source= <One of the following>
       | • If Management Node is down: Storage Server:<SC hostname>:MGM Node
       | • If SQL Node is down: <SC hostname>:ManageNode:<SC hostname>:SQL Node
       | • If Data Node is down: <SC hostname>:ManageNode:<SC hostname>:Data Node
Specific Problem | Storage Server, MySQL Cluster Node Unreachable
Event Type | communicationsAlarm(2)
Probable Cause | x733CommunicationsSubsystemFailure(306)
Additional Text | • If Management Node is down: "This alarm is issued when the MySQL Cluster [<SC hostname>:MGM Node ] is down or unreachable from [ <SC hostname> ] Storage Server";uuid:<Product_UUID>(1)
                | • If SQL Node or Data Node is down: "This alarm is issued when the MySQL Cluster [ <SC hostname>:<SQL or Data Node> ] is down or unreachable from [ <SC hostname>] ManageNode";uuid:<Product_UUID>(1)
Perceived Severity | Major

(1)  <Product_UUID> is the universally unique identifier (UUID) of the machine that generates the alarm. The value can be read from /sys/devices/virtual/dmi/id/product_uuid on the PL node.
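The lookup described in the footnote can be scripted. The following is a minimal sketch, assuming a POSIX shell on the PL node. The function name read_product_uuid and its optional path argument are illustrative and not part of IPWorks. On many Linux systems the file is readable only by root.

```shell
#!/bin/sh
# Print the product UUID that appears in the alarm's Additional Text.
# $1 (optional, hypothetical convenience): alternative path to read from.
read_product_uuid() {
    uuid_file="${1:-/sys/devices/virtual/dmi/id/product_uuid}"
    if [ ! -r "$uuid_file" ]; then
        echo "cannot read $uuid_file (root privileges may be required)" >&2
        return 1
    fi
    # Remove the trailing newline and any other whitespace, then end the line.
    tr -d '[:space:]' < "$uuid_file"
    echo
}
```

Calling read_product_uuid without arguments reads the path given in the footnote. The printed value can then be compared with the uuid field of a received alarm.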


1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Before starting this procedure, ensure that you have read the following documents:

1.2.2   Tools

No tools are required.

1.2.3   Conditions

No conditions.

2   Procedure

This section describes the procedure to follow when this alarm is received.

2.1   Analyzing the Alarm

Do the following at the maintenance center:

  1. Check the Management Node. See Section 2.2.
  2. Check the Data Node. See Section 2.3.
  3. Check the SQL Node. See Section 2.4.
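The Source attribute in Table 2 ends with the type of the node that is reported as down, which suggests the check to start with. The following is a minimal sketch of that mapping, assuming the Source suffix has already been copied from the alarm. The function name section_for_source and the host names sc-1 and sc-2 are illustrative, and the mapping itself is inferred from Table 1 and Table 2 rather than stated by the product.

```shell
#!/bin/sh
# Map the trailing node type of the alarm Source attribute (Table 2)
# to the section of this instruction that checks that node type.
section_for_source() {
    case "$1" in
        *"MGM Node")  echo "2.2 Checking Management Node" ;;
        *"Data Node") echo "2.3 Checking Data Node" ;;
        *"SQL Node")  echo "2.4 Checking SQL Node" ;;
        *)            echo "unknown source: $1" >&2; return 1 ;;
    esac
}

# Usage with illustrative host names:
section_for_source "Storage Server:sc-1:MGM Node"
section_for_source "sc-1:ManageNode:sc-2:Data Node"
```

The mapping only prioritizes the checks. Because Data Nodes and SQL Nodes are monitored through the Management Nodes, checking all three node types in the listed order remains the safe default.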

2.2   Checking Management Node

To clear the alarm, do the following:

  1. Log on to SC-1.

    ssh sc-1

  2. Check whether the Management Node is down.

    Example:

    # /etc/init.d/ipworks.mysql show-status

    Connected to Management Server at: SC-1:1186
    Cluster Configuration
    ---------------------
    [ndbd(NDB)]           2 node(s)
    id=27   @169.254.100.1  (mysql-5.6.31 ndb-7.4.12, Nodegroup: 0, *)
    id=28   @169.254.100.2  (mysql-5.6.31 ndb-7.4.12, Nodegroup: 0)
    
    [ndb_mgmd(MGM)]               2 node(s)
    id=1 (not connected, accepting connect from SC-1)
    id=2    @169.254.100.2  (mysql-5.6.31 ndb-7.4.12)
    
    [mysqld(API)]   24 node(s)
    ...

    This example output indicates that the Management Node on SC-1 (id=1) is down.

  3. Start the Management Node.
    • If the Data Node and the SQL Node are running, execute the following command:

      # /etc/init.d/ipworks.mysql start-mgmd

    • If the Data Node and the SQL Node are down, execute the following commands:

      # /etc/init.d/ipworks.mysql start-mgmd

      # /etc/init.d/ipworks.mysql start-ndbd

      # /etc/init.d/ipworks.mysql start-sqlnode

      If the Data Node must be initialized, execute the following command instead of start-ndbd:

      # /etc/init.d/ipworks.mysql start-ndbd-initial

      Note:  
      The initialization deletes all the data in the Data Node.

  4. Confirm that the alarm has ceased (within 60 seconds). If the alarm remains, consult the next level of maintenance support. Further actions are outside the scope of this instruction.
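Steps 2 and 3 above can be sketched as a small helper script. This is a sketch only, assuming that the show-status output has the layout shown in the example. The function names list_down_nodes and plan_mgmd_recovery are illustrative. The second function only prints the commands from step 3 so that an operator can review them before running anything.

```shell
#!/bin/sh
# Print the ids of "not connected" entries in one section of show-status output.
# $1: section tag as printed by show-status, for example "ndb_mgmd(MGM)".
# The show-status output is read from standard input.
list_down_nodes() {
    awk -v tag="[$1]" '
        # Every bracketed header line starts a new section.
        /^[[:space:]]*\[/ { in_section = (index($0, tag) > 0) }
        in_section && /not connected/ {
            sub(/^[[:space:]]*id=/, "")
            print $1
        }
    '
}

# Print (do not execute) the start commands from step 3.
# $1: "yes" if the Data Node and the SQL Node are still running, otherwise "no".
plan_mgmd_recovery() {
    echo "/etc/init.d/ipworks.mysql start-mgmd"
    if [ "$1" != "yes" ]; then
        echo "/etc/init.d/ipworks.mysql start-ndbd"
        echo "/etc/init.d/ipworks.mysql start-sqlnode"
    fi
}
```

A possible use is to pipe the output of /etc/init.d/ipworks.mysql show-status into list_down_nodes 'ndb_mgmd(MGM)'. For the example output in step 2, this prints 1. The sketch deliberately leaves out start-ndbd-initial because that command deletes all the data in the Data Node.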

2.3   Checking Data Node

To clear the alarm, do the following:

  1. Check whether the Data Node is down.

    Example:

    SC-1:~# /etc/init.d/ipworks.mysql show-status

    Connected to Management Server at: SC-1:1186
    Cluster Configuration
    ---------------------
    [ndbd(NDB)]      2 node(s)
    id=27 (not connected, accepting connect from SC-1)
    id=28     @169.254.100.2  (mysql-5.6.31 ndb-7.4.12, Nodegroup: 0, *)
    
    [ndb_mgmd(MGM)]      2 node(s)
    id=1       @169.254.100.1  (mysql-5.6.31 ndb-7.4.12)
    id=2       @169.254.100.2  (mysql-5.6.31 ndb-7.4.12)
    
    [mysqld(API)]    24 node(s)
    id=3       @169.254.100.1  (mysql-5.6.31 ndb-7.4.12)
    id=4 (not connected, accepting connect from SC-2)
    ...

    This example output indicates that the Data Node (id=27) is down.

  2. If the Data Node is down, execute the following command to start it:

    Example:

    SC-1:~# /etc/init.d/ipworks.mysql start-ndbd

    If the Data Node must be initialized, execute the following command instead of start-ndbd:

    # /etc/init.d/ipworks.mysql start-ndbd-initial

    Note:  
    The initialization deletes all the data in the Data Node.

  3. Confirm that the alarm has ceased. If the alarm remains, consult the next level of maintenance support. Further actions are outside the scope of this instruction.
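Because start-ndbd-initial deletes all the data in the Data Node, a wrapper script can require an explicit confirmation before offering that command. The following is a minimal sketch of such a guard. The function name data_node_start_command and the confirmation word DELETE-DATA are hypothetical and not part of IPWorks. The function only prints the command to run.

```shell
#!/bin/sh
# Print the command that starts the Data Node.
# $1: "normal" or "initial".
# $2: must be the literal word DELETE-DATA when $1 is "initial" (hypothetical guard).
data_node_start_command() {
    case "$1" in
        normal)
            echo "/etc/init.d/ipworks.mysql start-ndbd" ;;
        initial)
            if [ "$2" != "DELETE-DATA" ]; then
                echo "refusing: start-ndbd-initial deletes all data in the Data Node" >&2
                return 1
            fi
            echo "/etc/init.d/ipworks.mysql start-ndbd-initial" ;;
        *)
            echo "usage: data_node_start_command normal|initial [DELETE-DATA]" >&2
            return 2 ;;
    esac
}
```

Keeping the destructive variant behind a second argument means that a mistyped or incomplete call fails instead of selecting the initializing start.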

2.4   Checking SQL Node

To clear the alarm, do the following:

  1. Check whether the SQL Node is down.

    Example:

    SC-1:~# /etc/init.d/ipworks.mysql show-status

    Connected to Management Server at: SC-1:1186
    Cluster Configuration
    ---------------------
    [ndbd(NDB)]     2 node(s)
    id=27  @169.254.100.1  (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0, *)
    id=28  @169.254.100.2  (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0)
    
    [ndb_mgmd(MGM)] 2 node(s)
    id=1  @169.254.100.1  (mysql-5.6.27 ndb-7.4.8)
    id=2  @169.254.100.2  (mysql-5.6.27 ndb-7.4.8)
    
    [mysqld(API)]   24 node(s)
    id=3 (not connected, accepting connect from SC-1)
    id=4 (not connected, accepting connect from SC-2)
    ...

    This example output indicates that the SQL Nodes (id=3 and id=4) are down.

  2. If the SQL Node is down, execute the following command to start it:

    Example:

    SC-1:~# /etc/init.d/ipworks.mysql start-sqlnode

  3. Confirm that the alarm has ceased. If the alarm remains, consult the next level of maintenance support. Further actions are outside the scope of this instruction.
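The check in step 1 can be sketched as a filter over the show-status output. This is a sketch only, assuming the output layout shown in the example. The function name down_sql_nodes is illustrative. It prints the id of each unconnected entry in the [mysqld(API)] section together with the host from which a connection is accepted.

```shell
#!/bin/sh
# List unconnected entries of the [mysqld(API)] section as "<id> <host>".
# The show-status output is read from standard input.
down_sql_nodes() {
    awk '
        # Every bracketed header line starts a new section.
        /^[[:space:]]*\[/ { in_api = (index($0, "[mysqld(API)]") > 0) }
        in_api && /not connected, accepting connect from/ {
            id = $1;    sub(/^id=/, "", id)    # "id=3"  becomes "3"
            host = $NF; sub(/\)$/, "", host)   # "SC-1)" becomes "SC-1"
            print id, host
        }
    '
}
```

The [mysqld(API)] section in the examples lists 24 entries, and the Data Node example shows an unconnected entry (id=4) that is not reported as a fault. Which entries correspond to SQL Nodes that are expected to run is deployment-specific, so the list is a starting point for the check and not a verdict.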


Copyright

© Ericsson AB 2017, 2018. All rights reserved. No part of this document may be reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to continued progress in methodology, design, and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

