Storage Server, The MySQL Replication for Geographic Redundancy Failed
IPWorks

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure
2.1   Analyzing the Alarm
2.2   Check Network Connectivity
2.3   Checking Management Node
2.4   Checking Data Node
2.5   Checking SQL Node

1   Introduction

This instruction concerns alarm handling.

1.1   Alarm Description

In a Geographic Redundancy scenario, a monitoring script checks the replication status between the two sites. When the script detects a failure in the replication procedure, it raises this alarm.

The alarm attributes are listed and explained in Table 1.

Table 1    Alarm Attributes

Attribute Name          Attribute Value
--------------          ---------------
Major Type              193
Minor Type              860163
Managed Object Class    IpworksEM
Source                  ManagedElement=<Node Name>,SystemFunctions=1,Fm=1,FmAlarmModel=ipworksEM,FmAlarmType=ipworksEmMysqlGeoReplicationFailure,Source= <SC IP Address>
Specific Problem        Storage Server, The MySQL Replication for Geographic Redundancy Failed
Event Type              communicationsAlarm(2)
Probable Cause          x733RemoteNodeTransmissionError(342)
Additional Text         This alarm is issued because the MySQL slave on %s[the node IP Address] detected a replication failure,[Slave_IO_Errno:%s | Slave_SQL_Errno:%s]
Perceived Severity      Major
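
The Additional Text carries the I/O and SQL error numbers seen on the replication slave. As a minimal sketch, the same kind of values can be read by hand from the output of SHOW SLAVE STATUS\G. The helper name summarize_slave_status is hypothetical, and the field names assume a standard MySQL 5.6 slave (Slave_IO_Running, Slave_SQL_Running, Last_IO_Errno, Last_SQL_Errno). How the monitoring script itself obtains these values is not described in this instruction.

```shell
#!/bin/sh
# Hypothetical helper: reads "SHOW SLAVE STATUS\G" output on stdin and prints
# a one-line summary of the replication threads and their last error numbers.
# Assumption: standard MySQL 5.6 field names.
summarize_slave_status() {
    awk -F': ' '
        { gsub(/^[ \t]+/, "", $1) }          # strip the left padding of \G output
        $1 == "Slave_IO_Running"  { io_run  = $2 }
        $1 == "Slave_SQL_Running" { sql_run = $2 }
        $1 == "Last_IO_Errno"     { io_err  = $2 }
        $1 == "Last_SQL_Errno"    { sql_err = $2 }
        END {
            printf "IO=%s(errno %s) SQL=%s(errno %s)\n", io_run, io_err, sql_run, sql_err
        }
    '
}

# Example use on a node where the mysql client can reach the slave
# (credentials and host options are site-specific and omitted here):
#   mysql -e 'SHOW SLAVE STATUS\G' | summarize_slave_status
```

A non-zero error number, or a thread that is not reported as Yes, points to the side of replication (I/O or SQL) that failed.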

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Before starting this procedure, ensure that you have read the following documents:

1.2.2   Tools

No tools are required.

1.2.3   Conditions

No conditions.

2   Procedure

This section describes the procedure to follow when this alarm is received.

2.1   Analyzing the Alarm

Do the following at the maintenance center:

  1. Check Network Connectivity
  2. Check Management Node
  3. Check Data Node
  4. Check SQL Node

2.2   Check Network Connectivity

The following steps use Site A and Site B, which are configured for Geographic Redundancy, as an example.

Note:  
The ping command requires that ICMP messages are not blocked by any gateway or router in the involved network.

  1. Log on to SC-1 on Site A.
  2. Check whether the OAM MIP of each site is reachable from Site A.

    # ping <MIP_OAM_IP of Site A>

    # ping <MIP_OAM_IP of Site B>

  3. Check whether the provisioning MIP of each site is reachable from Site A.

    # ping <MIP_PROV_IP of Site A>

    # ping <MIP_PROV_IP of Site B>

  4. Log on to SC-1 on Site B.
  5. Check whether the OAM MIP of each site is reachable from Site B.

    # ping <MIP_OAM_IP of Site A>

    # ping <MIP_OAM_IP of Site B>

  6. Check whether the provisioning MIP of each site is reachable from Site B.

    # ping <MIP_PROV_IP of Site A>

    # ping <MIP_PROV_IP of Site B>
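
The four ping checks per site can also be run in one pass. The following is a minimal sketch, not part of the delivered tooling: the function name check_mips is hypothetical, and it assumes a Linux ping that supports -c (packet count) and -W (reply timeout in seconds).

```shell
#!/bin/sh
# Hypothetical helper: pings each address once and prints its state.
# Returns non-zero if at least one address did not answer.
# Assumption: Linux iputils-style ping with -c (count) and -W (timeout, seconds).
check_mips() {
    failed=0
    for ip in "$@"; do
        if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
            echo "$ip reachable"
        else
            echo "$ip UNREACHABLE"
            failed=1
        fi
    done
    return "$failed"
}

# Example use (placeholders as in the steps above; replace with real addresses):
#   check_mips "<MIP_OAM_IP of Site A>" "<MIP_OAM_IP of Site B>" \
#              "<MIP_PROV_IP of Site A>" "<MIP_PROV_IP of Site B>"
```

Run it once from SC-1 on Site A and once from SC-1 on Site B, since connectivity must hold in both directions.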

2.3   Checking Management Node

To clear the alarm, do the following:

  1. Log on to SC-1.

    # ssh SC-1

  2. Check whether the Management Node is down.

    Example:

    SC-1:~# /etc/init.d/ipworks.mysql show-status

    Connected to Management Server at: SC-1:1186
    Cluster Configuration
    ---------------------
    [ndbd(NDB)]           2 node(s)
    id=27   @169.254.101.1  (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0, *)
    id=28   @169.254.101.2  (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0)
    
    [ndb_mgmd(MGM)]               2 node(s)
    id=1 (not connected, accepting connect from SC-1)
    id=2    @169.254.101.2  (mysql-5.6.27 ndb-7.4.8)
    
    [mysqld(API)]   24 node(s)
    ...
    

    This example output indicates that the Management Node on SC-1 (id=1) is down.

  3. Start the Management Node.
    • If the Data Node and the SQL Node are running, execute the following command:

      # /etc/init.d/ipworks.mysql start-mgmd

    • If the Data Node and the SQL Node are down, execute the following commands in the order shown:

      # /etc/init.d/ipworks.mysql start-mgmd

      # /etc/init.d/ipworks.mysql start-ndbd

      # /etc/init.d/ipworks.mysql start-sqlnode

    If the Data Node must be initialized, execute the following command instead of start-ndbd:

    # /etc/init.d/ipworks.mysql start-ndbd-initial

    Note:  
    The initialization deletes all the data in the Data Node.

  4. Confirm that the alarm has ceased (within 60 seconds). If the alarm remains, consult the next level of maintenance support. Further actions are outside the scope of this instruction.
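
When the cluster has many nodes, the disconnected ones are easy to miss in the show-status output. The following is a minimal sketch that filters them out. The function name list_disconnected is hypothetical, and the parsing assumes the output layout shown in the example above (a bracketed section header followed by one id= line per node).

```shell
#!/bin/sh
# Hypothetical helper: reads show-status output on stdin and prints one line
# per disconnected node: <section> <node id> <host the slot accepts>.
# Assumption: the layout matches the example output in this instruction.
list_disconnected() {
    awk '
        /^[ \t]*\[/ {                         # section header, e.g. [ndb_mgmd(MGM)]
            section = $1
            sub(/^\[/, "", section)
            sub(/\].*$/, "", section)
            next
        }
        /not connected/ {
            id = $1
            sub(/^id=/, "", id)
            host = $0
            sub(/^.*accepting connect from /, "", host)
            sub(/\).*$/, "", host)
            print section, id, host
        }
    '
}

# Example use on SC-1:
#   /etc/init.d/ipworks.mysql show-status | list_disconnected
```

For the example output above, the only line printed is "ndb_mgmd(MGM) 1 SC-1", which identifies the Management Node on SC-1 as the one to start.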

2.4   Checking Data Node

To clear the alarm, do the following:

  1. Check whether the Data Node is down.

    Example:

    SC-1:~# /etc/init.d/ipworks.mysql show-status
    Connected to Management Server at: localhost:1186
    Cluster Configuration
    ---------------------
    [ndbd(NDB)]      2 node(s)
    id=27 (not connected, accepting connect from SC-1)
    id=28     @169.254.100.2  (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0, *)
    
    [ndb_mgmd(MGM)]      2 node(s)
    id=1       @169.254.101.1  (mysql-5.6.27 ndb-7.4.8)
    id=2       @169.254.101.2  (mysql-5.6.27 ndb-7.4.8)
    
    [mysqld(API)]    24 node(s)
    id=3       @169.254.101.1  (mysql-5.6.27 ndb-7.4.8)
    id=4 (not connected, accepting connect from SC-2)
    ...

    This example output indicates that the Data Node (id=27) is down.

  2. If the Data Node is down, execute the following command to start it:

    Example:

    SC-1:~# /etc/init.d/ipworks.mysql start-ndbd

    If the Data Node must be initialized, execute the following command instead:

    # /etc/init.d/ipworks.mysql start-ndbd-initial

    Note:  
    The initialization deletes all the data in the Data Node.

  3. Confirm that the alarm has ceased. If the alarm remains, consult the next level of maintenance support. Further actions are outside the scope of this instruction.
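
Before checking the alarm, it can help to confirm that the restarted node has rejoined the cluster. The following is a minimal sketch of such a wait loop and is not part of the delivered tooling. The function name wait_for_node and the POLL_INTERVAL variable are hypothetical. The sketch only checks that the node line no longer reads "not connected", so a node that is still starting is also counted as connected.

```shell
#!/bin/sh
# Hypothetical helper: wait_for_node <node id> <timeout seconds> <status command...>
# Re-runs the status command until the line for the given node id no longer
# says "not connected", or until the timeout expires.
# Returns 0 when the node is connected, 1 on timeout.
wait_for_node() {
    node_id=$1
    timeout=$2
    shift 2
    elapsed=0
    while :; do
        if "$@" 2>/dev/null | awk -v id="id=$node_id" \
            '$1 == id && !/not connected/ { ok = 1 } END { exit !ok }'; then
            return 0
        fi
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep "${POLL_INTERVAL:-5}"
        elapsed=$((elapsed + ${POLL_INTERVAL:-5}))
    done
}

# Example use for the Data Node with id=27 (the 60-second limit is an
# arbitrary choice for illustration):
#   wait_for_node 27 60 /etc/init.d/ipworks.mysql show-status
```

A return value of 1 means the node is still reported as not connected after the timeout, in which case the start command did not take effect.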

2.5   Checking SQL Node

To clear the alarm, do the following:

  1. Check whether the SQL Node is down.

    Example:

    SC-1:~# /etc/init.d/ipworks.mysql show-status

    Connected to Management Server at: SC-1:1186
    Cluster Configuration
    ---------------------
    [ndbd(NDB)]     2 node(s)
    id=27  @169.254.101.1  (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0, *)
    id=28  @169.254.101.2  (mysql-5.6.27 ndb-7.4.8, Nodegroup: 0)
    
    [ndb_mgmd(MGM)] 2 node(s)
    id=1  @169.254.101.1  (mysql-5.6.27 ndb-7.4.8)
    id=2  @169.254.101.2  (mysql-5.6.27 ndb-7.4.8)
    
    [mysqld(API)]   24 node(s)
    id=3 (not connected, accepting connect from SC-1)
    id=4 (not connected, accepting connect from SC-2)
    ...

    This example output indicates that the SQL Nodes (id=3 and id=4) are down.

  2. If the SQL Node is down, execute the following command to start it:

    Example:

    SC-1:~# /etc/init.d/ipworks.mysql start-sqlnode

  3. Confirm that the alarm has ceased. If the alarm remains, consult the next level of maintenance support. Further actions are outside the scope of this instruction.


Copyright

© Ericsson AB 2017, 2018. All rights reserved. No part of this document may be reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to continued progress in methodology, design, and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

