IPWorks Troubleshooting Guideline

Contents

1  Introduction
1.1  Prerequisites
1.1.1  Tools
1.1.2  Conditions
1.2  Related Information

2  Tools
2.1  Toolbox
2.1.1  ps
2.1.2  ipw-ctr
2.1.3  kill
2.1.4  rndc
2.1.5  named-checkconf
2.1.6  MySQL Benchmark Tool
2.1.7  ifconfig
2.1.8  netstat
2.1.9  dig
2.1.10  mysql
2.1.11  df
2.1.12  trace
2.2  Alarm and Notification Viewer
2.3  CM Attribute Viewer
2.3.1  Storage Server
2.3.2  DNS Server Manager
2.3.3  DNS Server
2.3.4  ActiveSelect DNS Server
2.3.5  ENUM Server
2.3.6  AAA Server
2.3.7  AAA Server Manager
2.3.8  AAA Load Unbalanced in eVIP Scenario
2.3.9  MySQL NDB Cluster
2.4  Performance Management Viewer

3  Troubleshooting Functions
3.1  Alarm
3.2  Logging
3.2.1  Error Log File Type
3.2.2  Application-specific Logs
3.2.3  Storage Server
3.2.4  Server Manager
3.2.5  DNS Server
3.2.6  ActiveSelect DNS Server
3.2.7  ENUM Server
3.2.8  AAA Server
3.2.9  MySQL NDB Cluster
3.2.10  Backup and Restore
3.2.11  Scaling
3.3  Core Dumps
3.3.1  Locating Core File
3.3.2  Core Dump Limitation
3.3.3  Defining Name of Core Dump File
3.3.4  Analyzing Core Dump File
3.4  Performance Measurements
3.5  Software Version Checks
3.6  Log Level Changes
3.7  Restart
3.8  Server Status Checks
3.9  IPWorks Common Component

4  Troubleshooting Procedure

5  Problem-Solving Procedure
5.1  IPWorks VNF Stack Deployment
5.1.1  Server Groups Forbidden
5.1.2  VLAN Conflicts
5.1.3  Failed to Create Network
5.1.4  Policy Problem
5.1.5  Failed to Delete HEAT Stack
5.2  IPWorks Upgrade
5.2.1  Error: Could Not Find Local Upgrade Package
5.2.2  Error: Failed to Remove Upgrade Package
5.2.3  Failed to Restore System Data after Upgrade Failure
5.2.4  Error: Campaign Failed Verification
5.2.5  Login Fails during Rebooting SC
5.2.6  Health Check Hang
5.3  IPWCLI
5.3.1  Network Issues
5.3.2  Provisioning Issues
5.3.3  Provisioning Rate Too Low
5.4  ECLI
5.4.1  ERROR: Transaction validation failed with error code: ComFailure
5.5  IPWorks DNS Management
5.5.1  Trouble Symptoms
5.5.2  Locating Fault
5.5.3  Confirming Solution
5.6  Storage Server
5.6.1  Failed to Stop/Start/Restart Storage Server by ipw-ctr
5.6.2  Storage Server Not Listen on the Port
5.7  Server Manager
5.7.1  Server Manager Failed to Start
5.7.2  Problem in Deleting Server Instance
5.7.3  Network Unreachable Exception
5.7.4  Access Denied Exception
5.7.5  Connection Time-out Exception
5.7.6  Failed Attempting to Get Machine Information
5.7.7  New or Renamed Object Already Exists Exception
5.7.8  Permission Denied Exception
5.7.9  Cannot Stop the Server Manager
5.7.10  Failed Sending Command to the DNS Server
5.7.11  Cannot Find Script
5.7.12  Cannot Execute Message When Running a Script
5.7.13  IPWorks CLI Displays DNS Records Slowly
5.7.14  Large Data Queries Cause Memory Problems
5.7.15  DNS Server Performance Drops during Queries
5.7.16  Status of Server in Interface Disagrees with Current Status
5.7.17  RNDC Statistics History Is Lost
5.8  DNS Server
5.8.1  Master Server Errors
5.8.2  Slave Server Errors
5.8.3  DNS Server Fails to Start after System Boot
5.8.4  Slave Server Fails to Transfer Zone Data from the Master
5.8.5  Server Query Problems
5.8.6  Operations Protected by TSIG Fail
5.8.7  Incorrect Data Returned for Queries
5.8.8  Bad Data from a Malicious External DNS Server
5.8.9  Bad Data from a Roaming Partner
5.8.10  External Clients Are Unable to Query the Server
5.8.11  Dynamic DNS Update Failed
5.8.12  Authoritative Server for Dynamic Zone Crashes
5.8.13  Rename the DNS Server
5.9  ActiveSelect DNS Server
5.9.1  Order of Returned Addresses Changes
5.9.2  Address Is Displayed in Responses When the Resource Is Down
5.9.3  Address Does Not Appear in Responses When Resource Is Up
5.10  ENUM Server
5.10.1  ENUM Server Connectivity Errors
5.10.2  Failed to Stop/Start/Restart ENUM Server by ipw-ctr
5.10.3  Error Responses to ENUM Requests
5.10.4  Errors Related to ERH
5.10.5  NP Traffic Loss
5.11  ENUM Front End
5.11.1  No LDAP Connection
5.11.2  Server Fail in ENUM Response
5.11.3  Failed to Cache ENUMDnSched to Local MySQL Cluster (for ENUM)
5.11.4  Failed to Cache ENUMDnSched to Local MySQL Cluster (for ENUM FE Sync)
5.11.5  Failed to Refresh EnumDnRange
5.11.6  Cannot Find ENUM Zone
5.12  Radius AAA Server
5.12.1  Radius AAA Server Process Not Running
5.12.2  Unreachable Radius Traffic
5.12.3  AAA Rejects Authentication or Authorization Request
5.12.4  AAA Does Not Proxy Radius Message
5.12.5  AAA Rejects EAP-AKA/SIM Authentication Request
5.13  EPC AAA Server
5.13.1  EPC AAA Server Process Not Running
5.13.2  C-diameter Stack Not Running
5.13.3  Ineffective Diameter over SCTP
5.13.4  High failure ratio caused by discarding DERs
5.14  License Problems
5.14.1  License Control Problem
5.14.2  Clear the Emergency Unlock Alarm
5.15  MySQL NDB Cluster
5.15.1  SQL Node Not Started
5.15.2  Management Node Down
5.15.3  Data Node Down
5.15.4  SQL Node Down
5.15.5  MySQL NDB Cluster Status Abnormal
5.15.6  MySQL NDB Cluster Cannot Work Normally
5.15.7  SQL Node Start Failure with Wrong Folder Permission
5.15.8  MySQL Data Lost on an SC
5.16  Backup and Restore
5.16.1  No Enough Space in the Disk
5.16.2  Complete Backup or Restore Failed due to MySQL NDB Process Not Started
5.16.3  Restart Server Failed
5.16.4  Slow Backup or Restore Operation
5.17  C-Diameter
5.17.1  C-Diameter OperState is DISABLED
5.17.2  C-Diameter Stack Cannot Listen the Listening Port (3868)
5.18  Geographic Redundancy
5.18.1  MySQL Replication for Geographic Redundancy Failed on One Site
5.18.2  MySQL Replication for Geographic Redundancy Failed On All Sites
5.19  Data Migration
5.19.1  Backup failed
5.19.2  Required configuration files did not migrate from HP to IPWorks 1
5.19.3  Files missing in the migration process
5.19.4  Failed to import the netconf xml file to ECIM with netconf command
5.20  IPWorks Scaling
5.20.1  Unable Scale-In PL in ECLI
5.20.2  Failed to Start Scale-Out VM on KVM
5.20.3  Unable Scale-Out PL for Core Middleware
5.20.4  Unable Scale-Out PL for SS7CAF
5.20.5  AAA Cannot Start in Scale-Out PL
5.20.6  Restore User Backup in Superset Cluster
5.20.7  Scale-Out Failure Triggers Scale-Out/Scale-In Cyclically
5.21  IPWorks Deployment for KVM
5.21.1  Both SCs Cyclic Reboot after Deployment
5.21.2  Failed to Execute Scripts ipwInit.sh after a Re-deployment of IPWorks for KVM
5.22  IPWorks Deployment for CEE
5.22.1  Fault Symptoms
5.22.2  Locating Fault
5.22.3  Confirming Solution
5.23  "COM SA, AMF Component Instantiation Failed" on SC-1
5.23.1  Trouble Symptoms
5.23.2  Locating Fault
5.23.3  Confirming Solution
5.24  IPWorks Workflows Problems
5.24.1  Authentication Failed
5.24.2  Parameter Value Is Wrong
5.24.3  Missing File in Configuration Directory
5.24.4  Environment Has Been Used
5.24.5  IPWorks lm or sql init Failed
5.24.6  Missing Parameter Value
5.24.7  Termination Script Missed in IPWorks
5.24.8  Workflow Gets no Stacks

6  Trouble Reporting

7  Appendix A: Example of PM, FM, LM, and AMF Logs

8  Appendix B: Capturing and Tracing the Messages
8.1  Capturing and Tracing the Access-Request Messages
8.2  Capturing and Tracing the Accounting-request Messages

Reference List

1   Introduction

This document describes how to troubleshoot the Ericsson IPWorks product.

The purpose of this document is to provide information on how to troubleshoot and diagnose problems found in IPWorks. It also describes the available troubleshooting tools and how to use them.

The following procedures are NOT covered in this document:

1.1   Prerequisites

This section describes the prerequisites for this document.

This guide is intended for system and network administrators working with Ericsson IPWorks. It is assumed that users of this document are familiar with performing operations within Operation and Maintenance (O&M) in general. The following prior knowledge is required:

1.1.1   Tools

This section lists the tools that can be used to troubleshoot the IPWorks.

For more information about these tools, see Section 2 Tools.

1.1.2   Conditions

The following conditions must apply:

1.2   Related Information

Definition and explanation of acronyms and terminology, trademark information, and typographic conventions can be found in the following documents:

2   Tools

This section describes the tools that can be used to troubleshoot the IPWorks.

2.1   Toolbox

2.1.1   ps

Use the ps command to obtain information about a process:

# ps -ef | grep <name>

Table 1 lists the corresponding name for each IPWorks component. Select the appropriate name from the table. The "Node" column indicates on which node the command is executed.

Table 1    Process Names

Component

Name

Node

DNS Server

named

Payload

* DNS Server Manager

ipwdnssm

Payload

ASDNS Monitor

asdnsmon

Payload

* ASDNS Monitor Server Manager

ipwasdnsmonsm

Payload

ENUM Server

ipwenum

Payload

* ENUM FE Sync

ipwfesync

Payload

EPC AAA Server

ipwa3d

Payload

*AAA Server Manager

aaasm

Payload

* Storage Server

ipwss

System Controller

MySQL NDB Cluster Management Node

ndb_mgmd

System Controller

MySQL NDB Cluster Data Node

ndbmtd

System Controller

MySQL NDB Cluster SQL Node

mysqld

System Controller

DHCP Server

dhcpd

Payload

* DHCP Server Manager

ipwdhcpv4sm

Payload

Note:  
* denotes a Java process

The appropriate line for the process shows the command (on the right) either starting with the name shown in Table 1 or, for Java processes, starting with java followed by -DApp=<process name> in the java arguments.

For example, to find the pid for the DNS Server Manager:

# ps -ef | grep ipwdnssm | grep -v grep
root 32479 1 0 Mar13 ? 00:53:51 java -DApp=ipwdnssm -mx128m
-DTCPSTARTPORT=9701 -DTCPENDPORT=9708 -Djboss.server.name=DNS15 -DMULTICASTAD
DRESS=224.0.0.1 -DMULTICASTPORT=15663 -DBIND_INTERFACE_ADDRESS=169.254.43.15
-Djava.net.preferIPv4Stack=true -classpath /opt/ipworks/sm/scripts:/opt/ipworks
/common/java/ipwcommon.jar:/opt/ipworks/sm/java/ipwsm.jar:/opt/ipworks/common/ja
va/log4j-1.2.15.jar:/opt/ipworks/common/java/ipwse.jar:/opt/ipworks/common/java
/dom4j-1.6.1.jar:/home/mmas/javaoam/lib/shoal-gms-impl-1.5.29.ericsson.7.jar:/
home/mmas/javaoam/lib/javaoam-coremw-spi-R3E05.jar:/home/mmas/javaoam/lib/javaoam
-core-R3E05.jar:/home/mmas/javaoam/lib/grizzly-utils-1.9.24.jar:/home/mmas/javaoam
/lib/grizzly-framework-1.9.24.jar:/opt/ipworks/common/java/AdventNetSnmp.jar:/opt
/ipworks/common/java/AdventNetSnmpAgent.jar ericsson.ipworks.sm.ServerManager ServerType=DNS

The desired pid is 32479.
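The lookup above can be wrapped in a small helper. The following sketch is illustrative and not part of the IPWorks product: it filters `ps -ef` output for either a plain command name or, for Java processes, the `-DApp=<name>` argument, and prints the pid.

```shell
# Illustrative helper (not shipped with IPWorks): print the pid of an
# IPWorks process, given its name from Table 1. The awk filter matches
# either the command name itself (column 8 of `ps -ef` output) or, for
# Java processes, the -DApp=<name> argument anywhere on the line.
match_ipw_pid() {
  awk -v n="$1" 'index($0, "-DApp=" n) > 0 || $8 == n { print $2; exit }'
}

# Typical use on a live node:
#   ps -ef | grep -v grep | match_ipw_pid ipwdnssm
```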

2.1.2   ipw-ctr

Users can use ipw-ctr to start, stop, or check the status of IPWorks services (such as SS, DNS, ASDNS, ENUM).

Usage:

ipw-ctr <option> <component> [<hostname>]

For more information about this tool, refer to the section Service Life Cycle Management in IPWorks Configuration Management.

If certain services cannot be stopped by ipw-ctr, use the kill command to terminate the process.

2.1.3   kill

For services that cannot be stopped smoothly by ipw-ctr, use the kill command to terminate the processes.

Note:  
After terminating a process with the kill command, use ipw-ctr to stop the service, because AMF automatically restarts services whose processes are terminated by the kill command.

Users can stop the process using the kill command as follows:

  1. Use the ps command as described in Section 2.1.1 to identify the pid of the process.
  2. Use the kill command to send a SIGTERM signal to the process as follows:

    # kill <pid>
    or:
    # kill -15 <pid>
    or:
    # kill -TERM <pid>

    Each of these commands has the same effect, giving the process an opportunity to terminate gracefully.

  3. Use the ps command again to check if the process has gone away.
  4. If the process is still running, use the kill command to send a SIGKILL signal to the process as follows:

    # kill -9 <pid>
    or:
    # kill -KILL <pid>

    Each of these commands has the same effect, forcing the process to terminate.
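The escalation in steps 2 through 4 can be scripted. The following sketch is illustrative only: it demonstrates the sequence against a throwaway sleep process rather than a real IPWorks service, and the one-second grace period is an arbitrary choice.

```shell
# Demonstration of the SIGTERM-then-SIGKILL escalation against a
# throwaway process (for a real service, obtain the pid with ps as
# described in Section 2.1.1).
sleep 300 &
pid=$!

kill "$pid"                          # step 2: SIGTERM, graceful termination
sleep 1                              # give the process time to exit

if kill -0 "$pid" 2>/dev/null; then  # step 3: is the process still running?
  kill -9 "$pid"                     # step 4: SIGKILL, forced termination
fi
wait "$pid" 2>/dev/null              # reap the terminated process
```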

2.1.4   rndc

The following table lists the rndc commands for DNS service.

The following commands are executed on the PL nodes on which DNS service is running.

Table 2    DNS Server Commands

Operation

Shell Command

Reload DNS Configuration

rndc -s 0 reload

Dump database

rndc -s 0 dumpdb(1)

Dump statistics

rndc -s 0 stats

Toggle query logging

rndc -s 0 querylog

Set debugging level to 0

rndc -s 0 notrace

Set debugging level to <debug-level>

rndc -s 0 trace <debug-level>(2)

(1)  If the data size in the cache is too large, the named process can crash after running "rndc -s 0 dumpdb". This is a known bug in BIND. Until the bug is fixed, restart the DNS process if it crashes.

(2)  Where: <debug-level> is an integer ranging from 1 to 99.


2.1.5   named-checkconf

named-checkconf validates the zone configuration file in the path /etc/ipworks/<host_name>/dns on all PL nodes, where <host_name> is the host name of the PL node, for example, PL-3.

The following example shows how to use named-checkconf to validate the zone configuration file on the PL-3 node:

  1. Go to the location of DNS configuration DB file.

    #cd /etc/ipworks/PL-3/dns

  2. Generate the test report.

    #named-checkconf -z named.conf > /tmp/report

  3. Extract the error messages.

    #grep -i -e 'error' -e 'unexpected' -e 'unknown option' /tmp/report

Table 3    Example Message and Corresponding Actions

Error Message

Actions

Description

named.conf:22: unknown option '.'

1. Clear the syntax error in the 22nd row of the file named.conf.


2. Use named-checkconf to check if the error still exists.


3. Reload the DNS configuration:


#rndc reload

The named.conf file is located in the path /etc/ipworks/PL-3/dns.


Clear the syntax error and check whether it still exists. If it is cleared successfully, reload the DNS configuration dynamically.

dns_rdata_fromtext: db.ims.etisalat.ae.Site1_NNIView:27: syntax error zone ims.etisalat.ae/IN: loading from master file db.ims.etisalat.ae.Site1_NNIView failed: syntax error zone ims.etisalat.ae/IN: not loaded due to errors.

1. Clear the syntax error in the 27th row of the db file db.ims.etisalat.ae.Site1_NNIView.


2. Use named-checkconf to check if the error still exists.


3. Reload the DNS configuration:


#rndc reload

There are some syntax errors in the db file.


Clear the syntax error and check whether it still exists. If it is cleared successfully, reload the DNS configuration dynamically.

The command returns nothing if there is no error.
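The three steps can be combined into a small script. The following sketch is illustrative: only the grep filter from step 3 is generic, while steps 1 and 2 are shown as comments because named-checkconf must run on a PL node (PL-3 and the report path are from the example above).

```shell
# Illustrative wrapper around the filter from step 3. Steps 1 and 2 are
# shown as comments because they require named-checkconf on a PL node:
#   cd /etc/ipworks/PL-3/dns
#   named-checkconf -z named.conf > /tmp/report
extract_dns_errors() {
  grep -i -e 'error' -e 'unexpected' -e 'unknown option' "$1"
}

# extract_dns_errors /tmp/report   prints nothing when the report is clean
```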

2.1.6   MySQL Benchmark Tool

MySQL Benchmark Tool is used to test the Storage Server provisioning rate. For example, see its use in Section 5.3.3.

2.1.7   ifconfig

ifconfig is used to check the status of configured interfaces. For example, see its use in Section 5.7.3.

2.1.8   netstat

netstat is used to check network connections and routing settings. For example, see its use in Section 5.7.3.

2.1.9   dig

Attention!

Do not use dig from any SC toward the VIP traffic address of a PL to verify the DNS/ENUM function. The SC is in the OAM subnet and the PL is in the signaling subnet, and these two subnets are completely separated.

The Domain Information Groper (dig) is a tool for interrogating DNS servers. It performs DNS queries and displays the answers returned from the DNS servers queried. dig is useful to troubleshoot DNS problems because of its flexibility, ease of use and clarity of output. Other lookup tools tend to have less functionality than dig. Although dig is normally used with command line arguments, it also has a batch mode of operation for reading lookup requests from a file.

For more information, use the dig -h command or see the dig man page: http://linux.die.net/man/1/dig.

The dig utility is commonly used to diagnose DNS problems.

Note:  
The IPWorks dig utility is installed in /opt/ipworks/dns/usr/bin. The OS provides a native dig utility in /usr/bin.

It is recommended to replace the native utility as follows, if this has not already been done:

# cd /usr/bin

# mv dig dig.orig

Example:

dig @10.0.0.3 rec1.example.com

The resulting dig output is as follows:

1   ; <<>> DiG 9.9.8-P2 <<>> @10.0.0.3 rec1.example.com
2   ;; global options: printcmd
3   ;; Got answer:
4   ;; ->>HEADER<<- opcode: QUERY, 
        status: NOERROR, id: 175
5   ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1,
        AUTHORITY: 1, ADDITIONAL: 0
6
7   ;; QUESTION SECTION:
8   ;rec1.example.com.         IN   A
9
10  ;; ANSWER SECTION:
11  rec1.example.com.   300   IN  A   10.2.3.4
12
13  ;; AUTHORITY SECTION:
14  example.com.        86400 IN  NS  mydns.example.com.
15
16  ;; Query time: 13 msec
17  ;; SERVER: 10.0.0.3#53(10.0.0.3)
18  ;; WHEN: Thu Dec 29 22:37:43 2005
19  ;; MSG SIZE  rcvd: 69

Starting with line 1, dig shows its version and the command arguments given.

Line 4 contains the header information of the DNS packet that answers the query.

Lines 7 through 14 contain the data in the DNS sections as outlined in line 5.

Line 16 shows the round trip time for processing the query.

Line 17 shows the address of the DNS Server that was queried.

Line 18 shows the date and time of the query.

Line 19 shows the packet size of the DNS response.
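As noted above, dig also has a batch mode for reading lookup requests from a file. The following sketch is illustrative: the server address 10.0.0.3 and the record names are taken from the example output, and the dig invocation is shown as a comment because it needs a reachable DNS server.

```shell
# Batch mode: one lookup per line in a file, then a single dig run.
# (Server 10.0.0.3 and the names are from the example above.)
cat > /tmp/dig_batch_demo.txt <<'EOF'
rec1.example.com A
mydns.example.com NS
EOF

# On a node with the DNS server reachable:
#   dig @10.0.0.3 -f /tmp/dig_batch_demo.txt +noall +answer
```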

2.1.10   mysql

The mysql utility is a command line utility that provides direct access to MySQL databases.

The full pathname of the utility is /usr/local/mysql/bin/mysql.

The user can use mysql to inspect the status and content of the IPWorks databases.

Note:  
Do NOT use unfamiliar commands or attempt to modify anything unless you fully understand the consequences.

Use the following command to start mysql:
# /usr/local/mysql/bin/mysql -P 3307 --protocol=tcp

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 5.6.31-ndb-7.4.12-cluster-commercial-advanced-log \
MySQL Cluster Server - Advanced Edition (Commercial)
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Use the following command to select a database:
mysql> use <database-name>

For example, to select the Storage Server database:
mysql> use ipworks

Reading table information for completion
of table and column names
You can turn off this feature
to get a quicker startup with -A

Database changed
mysql>

Use the following command to close mysql and return to the shell prompt:
mysql> exit

Bye
#

2.1.11   df

A complete Backup and Restore operation requires a large amount of space for the directory /cluster/ipwbrf on the disk. The df tool can be used to check the disk space. It displays the amount of disk space occupied by mounted or unmounted file systems, the amount of used and available space, and how much of each file system's total capacity has been used.

For example:

SC-1:~ # df -hl
Filesystem                                     Size  Used Avail Use% Mounted on
/dev/sdb2                                       20G  2.3G   17G  13% /
devtmpfs                                        32G  8.0K   32G   1% /dev
tmpfs                                           32G  728K   32G   1% /dev/shm
tmpfs                                           32G  339M   32G   2% /run
tmpfs                                           32G     0   32G   0% /sys/fs/cgroup
/dev/sdb1                                      2.0G  125M  1.7G   7% /boot
/dev/mapper/lde--cluster--vg-lde--cluster--lv  148G   24G  117G  17% /.cluster
/dev/md0p3                                      99G  1.4G   92G   2% /local/ipworks
com_fuse_module                                148G   24G  117G  17% /var/filem/nbi_root
SC-1:~ #
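A disk-space check like the one above can be scripted. The following sketch is illustrative: it reports the available kilobytes on the filesystem holding a given path, and the 10 GB threshold in the usage comment is an arbitrary example, not a product requirement.

```shell
# Illustrative helper: available space, in KB, on the filesystem that
# holds the given path. `df -Pk` forces POSIX single-line output so that
# column 4 (Available) stays reliable even for long device names.
avail_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example check before a complete backup (the threshold is illustrative):
#   [ "$(avail_kb /cluster/ipwbrf)" -ge 10485760 ] || echo "low disk space"
```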

2.1.12   trace

Trace provides the ability to perform subscriber tracing, which helps troubleshoot issues in the IPWorks system.

For how to use trace in IPWorks, refer to IPWorks Trace User Guide.

2.2   Alarm and Notification Viewer

For more information about alarm and notification, refer to Fault Management and IPWorks Alarm List.

2.3   CM Attribute Viewer

There are two methods to view and modify the configuration parameters.

Note:  
Since /etc/ipworks is a link to /cluster/home/ipworks/etc, you can view all the files in /etc/ipworks on any node.

Table 4    CM Attribute

Name

ECLI DN

Configuration Files Directories

Storage Server

ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1,StorageServer=1

/etc/ipworks/ipworks_ss.conf

Server Manager

ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,DnsServer=1,DnsSm=1


ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,AsdnsServer=1,AsdnsSm=1


ManagedElement=<Node Name>,IpworksFunction=1,IPWorksAAARoot=1,IPWorksAAACommonRoot=1,AAAServerManager=1

/etc/ipworks/ipworks_dnssm.conf


/etc/ipworks/ipworks_asdnsmonsm.conf


/etc/ipworks/ipworks_aaasm.conf

DNS Server

ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,DnsServer=1,BindService=1

/etc/ipworks/<hostname>/ipworks_dns.conf

ActiveSelect DNS Server

ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,AsdnsServer=1

/etc/ipworks/<hostname>/ipworks_asdnsmon.conf

ENUM Server

  • For ENUM server: ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1, IpworksEnumRoot=1,EnumServer=1

  • For ENUM FE Sync: ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1, IpworksEnumRoot=1,EnumFE=1

  • For CUDB connection pool with ENUM server: ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1,DataBaseInfo=1,CudbManager=1,CudbServiceSite=ENUM

  • For ERH module: ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1, IpworksEnumRoot=1,EnumServer=1,Erh=1(1)

  • For CUDB connection pool with ERH module: ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1,DataBaseInfo=1,CudbManager=1,CudbServiceSite=NP

For ENUM FE and ERH FE:


/etc/ipworks/ldapschema/ldap_dictionary.xml

EPC AAA Server

ManagedElement=<Node Name>,IpworksFunction=1,IPWorksAAARoot=1

/etc/ipworks/aaa_diameter/*

MySQL NDB Cluster

Not Applicable

  • For the Management Node: /etc/ipworks/mysql/confs/ipworks_mgm_conf

  • For the Data Node: /etc/ipworks/mysql/confs/ipworks_datanode_my.conf

  • For the SQL Node: /etc/ipworks/mysql/confs/ipworks_sqlnode.conf

(1)  For the ERH configuration in SS7 signaling manager, refer to Configure SS7 for ENUM Number Portability.


2.3.1   Storage Server

The following example shows how to check the configuration parameters of the Storage Server by ECLI:

Example 1   Check Configuration Parameters of Storage Server

>show -v ManagedElement=<Node Name>,IpworksFunction=1,
IpworksCommonRoot=1,StorageServer=1 
 StorageServer=1
   directory="/cluster/storage/no-backup/ipworks/logs" <default>
   fileSize=1 <default>
   filesNumber=3 <default>
   level=LOG_LEVEL_DISABLE <default>
   passwordExpiryDays=45 <default>
   port=17071 <default>
   securityLog=false <default>
   storageServerId="1"
   timelyRotate=DISABLE <default>

The other configuration parameters of the Storage Server are stored in the file /etc/ipworks/ipworks_ss.conf.

The Storage Server AMF wrapper configuration parameters are stored in the file /opt/ipworks/ss/etc/ss_wrapper.conf. The Storage Server AMF log directory, log name, and log level can be configured there.

2.3.2   DNS Server Manager

The following example shows how to check the configuration parameters of the DNS Server Manager by ECLI:

Example 2   Check Configuration Parameters of DNS Server Manager

>show -v ManagedElement=<Node Name>,IpworksFunction=1,
IpworksDnsRoot=1,DnsServer=1,DnsSm=1
 DnsSm=1
   dnsSmId="1"
   ssAddress="ipw_ss" <default>
   ssPassword="<Encrypted Password>"
   ssUserName="admin" <default>
   DnsSmLog=1

The other configuration parameters of the Server Managers are stored in the files /etc/ipworks/ipworks_*sm.conf, where * stands for dns or asdnsmon. These files contain the Server Manager properties that are used most often.

The file /opt/ipworks/sm/confs/ipworks_sm_defaults.conf contains the default values for properties used for all the Server Managers that are installed on a machine. It is stored on the board where the DNS is installed. This file is changed only rarely.

2.3.3   DNS Server

The following example shows how to check the configuration parameters of the DNS server by ECLI:

Example 3   Check Configuration Parameters of DNS Server

>show –v ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,
DnsServer=1,BindService=1 
BindService=1
   asdnsGrpDiff=BIND_ASDNS_GRP_ENABLE_DIFF_1 <default>
   bindServiceId="1"
   debugLogLevel=1 <default>
   queryLogging=false <default>
   securityLog=false <default>
   DnsLog=1
   DnsTransLog=1

The other configuration parameters of the DNS server are stored in the file /etc/ipworks/<hostname>/ipworks_dns.conf.

2.3.4   ActiveSelect DNS Server

Verify that the ActiveSelect DNS Server configuration files have been properly exported and are in the correct location. The default path of the ActiveSelect DNS Server configuration file is /etc/ipworks/<hostname>/ipworks_asdnsmon.conf.

Check the ActiveSelect DNS configuration file, ipworks_asdnsmon.conf, for the DNS Server to ensure that the return counts for the ActiveSelect DNS Sites do not limit the number of returned addresses. Also, confirm that the Prefer Statements are properly configured.

2.3.5   ENUM Server

The following example shows how to check the configuration parameters of the ENUM server by ECLI:

Example 4   Check Configuration Parameters of ENUM Server

>ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,
IpworksEnumRoot=1,EnumServer=1
(EnumServer=1)>show -v
EnumServer=1
   dbConnectString="SC-1:1186" <default>
   dbConnectStringSecondary="SC-2:1186" <default>
   dnsResolver=true <default>
   dnsResolverIPAddress="127.0.0.1" <default> <read-only>
   dnsResolverPort=5300 <default>
   enumServerId="1"
   ipv4Address="0.0.0.0" <default>
   ipv6Address="::" <default>
   port=53 <default>
   securitylog=false <default>
   threadCount=50 <default>
   Erh=1
   Log=1 

The following example shows how to check the configuration parameters of ERH by ECLI:

Example 5   Check Configuration Parameters of ERH

>ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,
IpworksEnumRoot=1,EnumServer=1,Erh=1
(Erh=1)>show -v
Erh=1
   discardErhFailure=false <default>
   erhId="1"
   ldap=true
   MAPRespNumberFormat=COUNTRYCODEWITHDASHSEC <default>
   nxdomainForNonPortedNumber=true <default>
   rcseInterConnect=false <default>
   teTimer=30 <default>
   ErhLdap=1
   ErhSs7=1

The following example shows how to check the configuration parameters of ENUM FE by ECLI:

Example 6   Check Configuration Parameters of ENUM FE

>ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,
EnumFE=1
(EnumFE=1)>show -v
EnumFE=1
   enableEnumDnSchedCache=false <default>
   enableEnumFE=true
   enumDnRangeExpiration=7 <default>
   enumDnSchedExpiration=7 <default>
   enumFEId="1"
   handleLDAPFailure=NXDOMAIN <default>
   EnumFELog=1

The following example shows how to check the CUDB connection with the ENUM server by ECLI:

Example 7   Check CUDB Connection with ENUM Server

>ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1,
DataBaseInfo=1,CudbManager=1,CudbServiceSite=ENUM,CudbSiteManager=1,
CudbSite=<CudbSite Name>,CudbNode=<CudbNode Name>
(CudbNode=1)>show -v
CudbNode=<CudbNode Name>
   address="192.168.20.14"
   cudbNodeId="1" <default>
   distinguishedName="cudbUser=ENUMUser,ou=admin,dc=ericsson,dc=com"
   password="1:gliG5ALpb/AiV+hl2cd89uNRnnnCZCR7"
   poolSize=400 <default>
   port=389 <default>

The following example shows how to check the CUDB connection with the ERH module by ECLI:

Example 8   Check CUDB Connection with ERH Module

>ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1,
DataBaseInfo=1,CudbManager=1,CudbServiceSite=NP,CudbSiteManager=1,
CudbSite=<CudbSite Name>,CudbNode=<CudbNode Name>
(CudbNode=1)>show -v
CudbNode=<CudbNode Name>
   address="192.168.20.14"
   cudbNodeId="1" <default>
   distinguishedName="cudbUser=ERHUser,ou=admin,dc=ericsson,dc=com"
   password="1:gliG5ALpb/AiV+hl2cd89uNRnnnCZCR7"
   poolSize=400 <default>
   port=389 <default>

2.3.6   AAA Server

The following example shows how to check the configuration parameters of the AAA server by ECLI:

Example 9   Check Configuration Parameters of AAA Server

>show –v –r ManagedElement=<Node Name>,IpworksFunction=1,IPWorksAAARoot=1
IPWorksAAARoot=1
   ipworksAAARootId="1" <default>
   IPWorksAAACommonRoot=1
      ipworksAAACommonRootId="1" <default>
      AAAServer=PL-3
         aaaServerId="PL-3"
...

2.3.7   AAA Server Manager

The following example shows how to check the configuration parameters of the AAA Server Manager by ECLI:

Example 10   Check Configuration Parameters of AAA Server Manager

>show -v ManagedElement=<Node Name>,IpworksFunction=1,IPWorksAAARoot=1,
IPWorksAAACommonRoot=1,AAAServerManager=1
AAAServerManager=1
aaaServerManagerId="1"
   directory="/cluster/storage/no-backup/ipworks/logs" <default>
   fileSize=10 <default>
   filesNumber=10 <default>
   level=LOG_LEVEL_DEBUG
   timelyRotate=DISABLE <default>

2.3.8   AAA Load Unbalanced in eVIP Scenario

Under normal conditions, eVIP distributes connections equally across all Payloads.

If one of the Payloads goes down, all connections are automatically distributed to the other Payload. Once the down Payload recovers, the connections are not redistributed automatically. You must manually disconnect and re-establish connections so that the connection count on every Payload is nearly equal.

Check whether the connection counts of PL-3 and PL-4 are nearly equal.

If the connection counts of PL-3 and PL-4 are not close to equal, rebalance them by disconnecting some or all connections on the Payload that has more connections.
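A quick way to compare the counts is to filter netstat output. The following sketch is illustrative: the awk filter is the generic part, and the Radius authentication port 1812 in the usage comment is an assumption for the example, not a product value.

```shell
# Illustrative filter: count ESTABLISHED connections whose local address
# ends with the given port, reading `netstat -ant` output on stdin
# (columns: proto, recv-q, send-q, local addr, foreign addr, state).
count_established() {
  awk -v port="$1" '$6 == "ESTABLISHED" && $4 ~ (":" port "$")' | wc -l
}

# Run on PL-3 and PL-4 and compare (port 1812 is an example):
#   netstat -ant | count_established 1812
```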

2.3.9   MySQL NDB Cluster

Management Node

Configuration parameters for the MySQL NDB Cluster Management Node are stored in the file /etc/ipworks/mysql/confs/ipworks_mgm_conf. Both NDB cluster Active-Active Management Nodes share the same .conf file.

Data Node

Configuration parameters for the MySQL NDB Cluster Data Node are stored in file /etc/ipworks/mysql/confs/ipworks_datanode_my.conf.

SQL Node

Configuration parameters for the MySQL NDB Cluster SQL Node are stored in file /etc/ipworks/mysql/confs/ipworks_sqlnode.conf. All SQL Nodes share the same .conf file.

2.4   Performance Management Viewer

For more information about how to check performance measurements, refer to IPWorks Performance Measurements.

3   Troubleshooting Functions

This section describes the troubleshooting functions.

3.1   Alarm

ECLI is the product tool that shows all active alarms.

Example 11   Show Active Alarms

# /opt/com/bin/cliss
>ManagedElement=<Node Name>,SystemFunctions=1,Fm=1
(Fm=1)>show FmAlarm=397
FmAlarm=397
   activeSeverity=MINOR
   additionalText="Agent 169.254.43.15 reports node 192.168.10.201 down"
   eventType=COMMUNICATIONSALARM
   lastEventTime="2015-03-03T01:54:22+01:00"
   majorType=193
   minorType=851974
   originalAdditionalText="Agent 169.254.43.15 reports node 192.168.10.201 down"
   originalEventTime="2015-03-03T01:54:22+01:00"
   originalSeverity=MINOR
   probableCause=342
   sequenceNumber=397
source="ManagedElement=<Node Name>,SystemFunctions=1,Fm=1,FmAlarmModel=ipworksDns,
FmAlarmType=ipworksDnsServASDNSNodeDown,HostName=PL-3,Node=192.168.10.201"
   specificProblem="DNS, ASDNS Node down"

Also, the operator can check the alarm status by referring to Check Alarm Status.

All alarms, including active and cleared alarms, are recorded in alarm logs in the folder /cluster/storage/no-backup/nbi_root/AlarmLogs on the SC nodes.

For more information about the IPWorks alarms, refer to IPWorks Alarm List.

3.2   Logging

This section describes the event logs for the product.

3.2.1   Error Log File Type

Not applicable.

3.2.2   Application-specific Logs

Table 5    Application-specific Logs

Log Directory

Description

/storage/no-backup/ipworks/logs/(1)

IPWorks Service and AMF wrapper logs.

/storage/no-backup/coremw/var/log(1)

Core MW logs.

AMF logs.

/var/log/messages

Linux OS, kernel logs

OpenSaf, CLM, COM, CMW, SMF, IMM, AMF, FM, JavaOam log, BRF, NTP, RPM, etc. logs

IPWorks scripts logs (for example, amf, brf, tools, installation, initial configuration)

/local/ipworks/mysql-cluster/(2)

MySQL NDB Cluster logs

(1)  /storage folder is a link to /cluster/storage, you can view the log files on any node.

(2)  The log files under /local/ipworks/mysql-cluster are stored only on SC node.


3.2.3   Storage Server

The Storage Server writes logging information to the file /cluster/storage/no-backup/ipworks/logs/<hostname>/ipworks_ss_<hostname>.log.

The Storage Server appends logging information to the existing log file. When checking log files, it is recommended to start from the end of the file.
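For example, since new information is appended at the end, the latest entries can be inspected with tail; in the sketch below, PL-3 is a hypothetical hostname used for illustration:

```shell
#!/bin/bash
# PL-3 is a hypothetical hostname used for illustration.
log=/cluster/storage/no-backup/ipworks/logs/PL-3/ipworks_ss_PL-3.log

# Show the most recent entries, if the log exists.
if [ -f "$log" ]; then
    tail -n 100 "$log"
fi

# To follow new entries live while reproducing a problem:
#   tail -f "$log"
```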

3.2.3.1   File I/O Error

A File I/O exception is thrown for the log file when a user starts the Storage Server as a non-root user:

File "logfile" I/O Error: /storage/no-backup/ipworks/logs/<hostname>/ipworks_ss_<hostname>.log (Permission denied)

A File I/O exception is thrown for the audit log file when a user logs on to the CLI as a non-root user.

File "auditlogfile" I/O Error: /var/ipworks/logs/security/ipworks_ss_security Oct 05.audit (Permission denied)

Ensure that the user is logged on with root privileges to avoid these exceptions.

3.2.4   Server Manager

The Server Manager can be configured to use debug logging. By default, the Server Manager log is disabled; it can be enabled using ECLI. For details, see Section 3.6.

The Server Manager logs are stored in the file /storage/no-backup/ipworks/logs/<hostname>/<*>sm.log.

Where: <*> is dns, asdnsmon, or aaasm.

3.2.5   DNS Server

To help resolve problems with the DNS Server, inspect the server's log files, either directly on the server system or through the IPWorks CLI.

The DNS Server logs events through the syslog utility and can also write them to log files. By default, major events are written through the syslog utility, though other events can be added. The default path is /var/log/messages.

The following example shows how to enable debug logging for the DNS server:

Example 12   Enable Debug Logging for DNS Server

#/opt/com/bin/cliss
#config
(config)>ManagedElement=<Node Name>,IpworksFunction=1,
IpworksDnsRoot=1,DnsServer=1,BindService=1,debugLogLevel=<number>
(config)>ManagedElement=<Node Name>,IpworksFunction=1,
IpworksDnsRoot=1,DnsServer=1,BindService=1,
DnsLog=1,level=DNS_LOG_LEVEL_DEBUG
(config-DnsLog=1)>commit

Where: <number> represents the granularity of debug logging information. Refer to the attribute debugLogLevel in the MO BindService for details.

Note:  
By default, DNS transaction log is enabled.

The DNS server opens a log file, ipworks_dns.log, in the configured log directory (/cluster/storage/no-backup/ipworks/logs/), if the debug level is DNS_LOG_LEVEL_DEBUG. The log directory is read-only.

AMF wrapper and CoreMW related logs are also recorded in /cluster/storage/no-backup/coremw/var/log/. The logs are enabled by default.

3.2.6   ActiveSelect DNS Server

Check the ActiveSelect DNS (ASDNS) Monitor log file, ipworks_asdnsmon.log, for errors. The default path is /cluster/storage/no-backup/ipworks/logs/.

Check the status for a given address using the ipworks_asdnsmon.log file.

Check ipworks_asdnsmon_trans.log, which tracks the transaction events of the ASDNS Monitor.

The CoreMW related log is enabled by default. It is located in /cluster/storage/no-backup/coremw/var/log/.

Check the DNS Server log file, ipworks_dns.log, for the following two messages:

datagram from [ASDNS Monitor IP Address].port
ns_req: TSIG verify failed - BADSIG (16)

If these messages are displayed, there is a mismatch in the TSIG key being used, and messages from the ASDNS Monitor are therefore not being processed. Use the IPWorks CLI to correct the configuration.
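A minimal sketch for locating such entries, assuming the ipworks_dns.log path described above; PL-3 is a hypothetical hostname used for illustration:

```shell
#!/bin/bash
# PL-3 is a hypothetical hostname used for illustration.
log=/cluster/storage/no-backup/ipworks/logs/PL-3/ipworks_dns.log

# Print TSIG verification failures with line numbers, if the log exists.
if [ -f "$log" ]; then
    grep -n 'TSIG verify failed' "$log"
fi
```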

3.2.6.1   ActiveSelect DNS Monitor Log Files

To help resolve problems with the ASDNS Monitor, trace the activity by inspecting the monitor log files.

The IPWorks ASDNS Monitor logs events to the syslog utility and to log files. By default, major events are written to the syslog utility. For details about the syslog utility, see the syslog(3C) manual page.

By default, the ASDNS Monitor log is disabled, as logging consumes CPU and disk resources.

The following example shows how to enable logging for the ASDNS Monitor:

Example 13   Enable Logging for ASDNS Monitor

#/opt/com/bin/cliss
#config
(config)>ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,
AsdnsServer=1,asdnsMonitor=1,AsdnsMonLog=1,level=LOG_LEVEL_DEBUG
(config-AsdnsMonLog=1)>commit
Note:  
The ASDNS transaction log is enabled by default.

The ASDNS Monitor opens a file, ipworks_asdnsmon.log, in the log directory (/cluster/storage/no-backup/ipworks/logs).

For more information about the ASDNS Monitor Log and ASDNS Monitor Transaction Log, refer to AsdnsMonLog and AsdnsMonTransLog in Managed Object Model (MOM).

3.2.6.2   ActiveSelect DNS Monitor System Logs

The system log is the primary location where operational problems with the ASDNS Monitor are logged. It is important to monitor the system log (on the host where the monitor is running) for errors or problems. The path of the system log file is /var/log/messages.

Logs generated by CoreMW related to the ASDNS Monitor are recorded in the file /cluster/storage/no-backup/coremw/var/log/<PL hostname>/asdns_coremw.log. They are enabled by default.

When errors are displayed, the messages generally describe the error, and most errors prevent the monitor from running. The monitor may run with a partially successful configuration file, so it is important to check the log messages and not simply assume that the configuration is correct because the monitor is running.

Note:  
In the log file, the ASDNS Monitor identifies itself as dagent. For example:

Feb 28 09:46:35 dagent started -
this version compiled 01:14:27 Apr 21 2003


The following table lists the error messages generated by the asdnsmon daemon:

Table 6    ActiveSelect DNS Monitor Error Messages

Error

Description

exec failed for script error-message

This error indicates that the monitor failed to start the script and the error message should provide information as to why it failed.

can’t send to dns: error

An error was encountered while trying to send load information to a DNS Server.

exec failed for command: error

An error was encountered when trying to run the command configured for a monitor script.

unable to locate target for fd number, pid

A temporary error condition occurred while processing the exit status of a monitor command. If this occurs often, review the scripts used.

target name failed to complete

A previous monitor load sample failed to complete by the time the next sample was measured. The service may be down, or the specified interval may be too short.

too many processes for name

Too many monitoring processes have been created. This may be because they do not complete within the short interval between checks, or because they cannot detect and return an error condition quickly enough.

can’t fork in


create_child: error


can’t dup errno: error

These are errors in creating monitoring processes. Contact product support.

unable to open pidfile file: error

The file where the monitor process ID is maintained cannot be created. Typically this is because the monitor process has not been started as root.

select error: error

This is a fatal runtime error that can be caused by problems with the network layer.

error setting priority: error

The monitor was unable to change its priority, typically because it was not run as root.

can’t malloc entity


can’t get mem in function

These are fatal runtime error messages that indicate no more memory is available. Perhaps too many resources are being monitored by this monitor.

3.2.7   ENUM Server

The error log files of the ENUM server (including ERH over LDAP), ERH over SS7, and ENUM FE Sync are stored in /cluster/storage/no-backup/ipworks/logs/<hostname>; the log files are named ipwenum.log.x and ipworks_fesync.log.x, respectively.

The ENUM server, the ERH module, and ENUM FE Sync automatically start a new error log file after a configurable period or when the current file reaches a configurable size. Taking the ENUM error log file as an example, a configurable number of previous versions is retained with names ipwenum.log.<n>, where n is the number of the log file. Using ECLI, the user can configure the number of files retained and the size and time limits, but not the directory path.

3.2.8   AAA Server

The AAA Server writes logging information to the file /cluster/storage/no-backup/ipworks/logs/<PL hostname>/aaa_diameter_server.log.

To help resolve problems with the AAA Server, inspect the server's log files. Refer to the section EPC AAA in Data Collection Guideline for IPWorks.

3.2.9   MySQL NDB Cluster

The MySQL NDB Cluster writes logging information under the directory /local/ipworks/mysql-cluster/.

3.2.10   Backup and Restore

The Backup and Restore handling writes logging information to the file /cluster/storage/no-backup/ipworks/logs/<hostname>/ipwbrf.log.

3.2.11   Scaling

IPWorks application scaling writes logging information to the SC-1/SC-2 log file /var/log/messages.

LDE scaling writes logging information to the SC-1/SC-2 log file /var/log/messages.

CoreMW scaling writes logging information to the files clustermonitor.log* in the SC-1/SC-2 folder /var/opt/coremw/clustermonitor.

SS7CAF scaling writes logging information to the files ss7caf_scaling.log* in the SC-1/SC-2 folder /opt/sign/log.

3.3   Core Dumps

This section describes how to troubleshoot with core dump.

A core dump is a file containing a process's address space (memory) at the time the process terminates unexpectedly. Core dumps may be produced on demand (for example, by a debugger) or automatically upon termination. They are triggered by the kernel in response to program crashes and may be passed to a helper program (such as systemd-coredump) for further processing. Core dumps are mainly useful for developers when debugging program crashes.

3.3.1   Locating Core File

Normally the core dump files are stored in the directory /cluster/dumps/.

3.3.2   Core Dump Limitation

By default, there is no limit for core dump files. The current limit can be checked with ulimit -c. To set a limit, the operator can use, for example, ulimit -c 1024 (the value is given in blocks), and change it back to the default with ulimit -c unlimited.
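As a sketch, the limit handling above can be exercised in a shell as follows. The -S flag is an addition not shown in the text above: it restricts the change to the soft limit, so a non-root user can raise the limit again later.

```shell
#!/bin/bash
# Show the current core file size limit (in blocks); "unlimited" by default.
ulimit -c

# Lower only the soft limit; without -S the hard limit is lowered too,
# and a non-root user cannot raise it back in the same shell.
ulimit -S -c 1024

# Restore the soft limit to the default.
ulimit -S -c unlimited

# Show the resulting limit.
ulimit -c
```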

3.3.3   Defining Name of Core Dump File

To define the name of core dump files, do the following:

  1. In the configuration file /etc/sysctl.conf, navigate to the parameter kernel.core_pattern, and define a template that is used to name core dump files.

    The template can contain % specifiers which are substituted by the following values when a core file is created:

    %%  a single % character
    %p  PID of dumped process
    %u  (numeric) real UID of dumped process
    %g  (numeric) real GID of dumped process
    %s  number of signal causing dump
    %t  time of dump, expressed as seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC)
    %h  hostname
    %e  executable filename (without path prefix)
    %c  core file size soft resource limit of crashing process (since Linux 2.6.24)
    

    The default value is kernel.core_pattern = /cluster/dumps/%e.%p.%h.core.

  2. Execute the command sysctl -p for the change to take effect without rebooting.
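As a sketch, the following shows how the default template expands into a core file name; the exe, pid, and host values below are hypothetical examples, not taken from a real crash:

```shell
#!/bin/bash
# Expand the default kernel.core_pattern template by hand.
# The exe/pid/host values below are hypothetical examples.
pattern='/cluster/dumps/%e.%p.%h.core'
exe='named'     # %e: executable filename
pid='12161'     # %p: PID of the dumped process
host='PL-3'     # %h: hostname

corefile=${pattern//%e/$exe}
corefile=${corefile//%p/$pid}
corefile=${corefile//%h/$host}
echo "$corefile"    # /cluster/dumps/named.12161.PL-3.core
```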

3.3.4   Analyzing Core Dump File

Analyze the core dump file to find the cause of the abnormal crash. Before performing the following steps, install the tool gdb.

For example, suppose a core dump file is found under /cluster/dumps.

  1. Find which service crashed and which specific binary file generated the core dump file.
    1. Go to the directory /cluster/dumps.

      Example:

      SC-1:~ # cd /cluster/dumps

    2. List the core dump files.

      Example:

      SC-1:~ # ls -lrt *.core*

      -rw------- 1 root root 140431360 Mar 20 03:00 named.12161.PL-3.core

      Where: the named.12161.PL-3.core is the core dump file.

    3. Based on the dump file, determine which process or service (such as DNS) crashed and which binary file generated the core dump file.

      Example:

      SC-1:~ # file named.12161.PL-3.core

      named.12161.PL-3.core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/opt/ipworks/dns/usr/bin/named -f'

      From the command output, the segment dns indicates that the DNS server crashed, and the binary file named in the directory /opt/ipworks/dns/usr/bin generated the core dump file.

  2. Save or back up the following proof files:
    • The core dump file like named.12161.PL-3.core in the directory /cluster/dumps.
    • The binary file like named in the directory /opt/ipworks/dns/usr/bin.
    • The log files in the directory /cluster/storage/no-backup/ipworks/logs.
  3. Use the tool gdb to analyze the reason why the process crashed.

    Example:

    PL-3:~ # gdb /opt/ipworks/dns/usr/bin/named named.12161.PL-3.core

    1. Use the command bt or where in GDB to view the call stack of the thread that caused the crash.

      (gdb) bt

      Or

      (gdb) where

      Example:

      #12 0x00007fdd0bfa3563 in LmServerProxy::connectToLmServer() () from /usr/lib64/liblmcba64.so
      #13 0x00007fdd0bfa3616 in LmServerProxy::handleConnectionLoss() () from /usr/lib64/liblmcba64.so
      #14 0x00007fdd0bfa48f6 in LmServerProxy::connectionLossThreadFunction(void*) () from /usr/lib64/liblmcba64.so
      #15 0x00007fdd0bd687f6 in start_thread () from /lib64/libpthread.so.0
      #16 0x00007fdd0b84b09d in clone () from /lib64/libc.so.6

    2. Use the following command to view the status of all threads in the same process.

      (gdb) thread apply all bt

      Example:

      Thread 17 (Thread 0x7fdd0e007720 (LWP 12161)):
      #0  0x00007fdd0b7a2f6b in sigsuspend () from /lib64/libc.so.6
      #1  0x0000000000640ad1 in isc__app_ctxrun ()
      #2  0x0000000000640b89 in isc__app_run ()
      #3  0x0000000000424770 in main ()
      
      Thread 16 (Thread 0x7fdd0793c700 (LWP 12170)):
      #0  0x00007fdd0bd6c65c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x000000000066e55b in timer_thread_handler (arg=<optimized out>) at
       /vobs/ims/ipworks/src/common/c_common/c_common_scc/src/ipworks_timer.c:177
      #2  0x00007fdd0bd687f6 in start_thread () from /lib64/libpthread.so.0
      #3  0x00007fdd0b84b09d in clone () from /lib64/libc.so.6
      #4  0x0000000000000000 in ?? ()
      
      Thread 15 (Thread 0x7fdd0450f700 (LWP 12176)):
      #0  0x00007fdd0bd6c65c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x0000000000672403 in PmUploaderThread::run (this=0xac2410) at 
      /vobs/ims/ipworks/src/common/coremw_adaptor/pm_adaptor_scc/src/PmUploaderThread.cpp:71
      #2  0x00007fdd0d3b6213 in ipworks::Thread::loop (this=0xac2410) at 
      /vobs/ims/ipworks/src/common/cpp_common/cpp_common_scc/src/Thread.cpp:56
      #3  0x00007fdd0c82e5e3 in thread_proxy () from /opt/ipworks/common/usr/lib/libboost_thread.so.1.54.0
      #4  0x00007fdd0bd687f6 in start_thread () from /lib64/libpthread.so.0
      #5  0x00007fdd0b84b09d in clone () from /lib64/libc.so.6
      #6  0x0000000000000000 in ?? ()

Note:  
If GDB is not installed, install it first. Alternatively, ask support to analyze the core dump files, binary files, and logs. Most importantly, these proof files must be preserved.

3.4   Performance Measurements

The performance measurements generated by IPWorks are another source of useful information when troubleshooting a problem.

The performance management report files are generated in 3GPP compliant XML format and can be transferred outside the system for post processing.

For more information about file format, refer to Performance Management Report File Format.

For more information about the performance measurements, refer to IPWorks Measurement List.

3.5   Software Version Checks

Check the software version on IPWorks. For details, refer to View Software Information.

3.6   Log Level Changes

Table 7    Log Level Changes

Server Name

Operation

Comments

Storage Server

#/opt/com/bin/cliss
#config
(config)>ManagedElement=<Node Name>,IpworksFunction=1,
IpworksCommonRoot=1,StorageServer=1,level=<Log level>

Where: <Log Level> specifies the log level for Storage Server. For more information, refer to level in Managed Object Model (MOM).


Note: Changing the log level of the Storage Server to a higher level of detail might result in large log files that affect the performance of the server. Therefore, change it only when there is a problem, and change it back once the problem is resolved.

DNS Server

Change DNS Debug Log Level:


#/opt/com/bin/cliss
#config
(config)>ManagedElement=<Node Name>,IpworksFunction=1,
IpworksDnsRoot=1,DnsServer=1,BindService=1,debugLogLevel=90
(config)>commit

Where: debugLogLevel can be any value from 1 to 99. For more information, refer to the attribute debugLogLevel in Managed Object Model (MOM).

Change DNS Log Level:


#/opt/com/bin/cliss
#config
(config)>ManagedElement=<Node Name>,IpworksFunction=1,
IpworksDnsRoot=1,DnsServer=1,BindService=1,DnsLog=1,
level=<Log level>
(config-DnsLog=1)>commit 

Where: <Log Level> specifies the log level for the DNS server. It can be DNS_LOG_LEVEL_DEBUG or DNS_LOG_LEVEL_DISABLE. For more information, refer to the attribute level in the class DnsLog in Managed Object Model (MOM).

ASDNS Monitor

#/opt/com/bin/cliss
#config
(config)>ManagedElement=<Node Name>,IpworksFunction=1,
IpworksDnsRoot=1,AsdnsServer=1,AsdnsMonitor=1,
AsdnsMonLog=1,level=<Log Level>
(config-AsdnsMonLog=1)>commit 

Where: <Log Level> specifies the log level for ASDNS Monitor. For more information, refer to level in class AsdnsMonLog in Managed Object Model (MOM).

DNS/ASDNS SM

#/opt/com/bin/cliss
>ManagedElement=<Node Name>,IpworksFunction=1,
IpworksDnsRoot=1,**Server=1, **Sm=1, **SmLog=1
(**SmLog=1)> config
(config-**SmLog=1)>level=<Log level>
(config-**SmLog=1)>timelyRotate=<Timely rotation>
(config-**SmLog=1)>commit

Where:


ENUM Server

#/opt/com/bin/cliss
>ManagedElement=<Node Name>,IpworksFunction=1,
IpworksDnsRoot=1,IpworksEnumRoot=1,EnumServer=1,Log=1
(Log=1)>configure
(config-Log=1)>level=<Log Level>
(config-Log=1)>commit

Where:


<Log Level> specifies the log level for ENUM server. For more information about logging level, refer to IpworksLogLevel in Managed Object Model (MOM).


The changes dynamically take effect.

ENUM FE Sync

#/opt/com/bin/cliss
>ManagedElement=<Node Name>,IpworksFunction=1,
IpworksDnsRoot=1,IpworksEnumRoot=1,EnumFE=1,EnumFELog=1
(EnumFELog=1)>configure
(config-EnumFELog=1)>level=<Log Level>
(config-EnumFELog=1)>commit

ERH LDAP

#/opt/com/bin/cliss
>ManagedElement=<Node Name>,IpworksFunction=1,
IpworksDnsRoot=1,IpworksEnumRoot=1,EnumServer=1,Erh=1,
ErhLdap=1,Log=1
(Log=1)>configure
(config-Log=1)>level=<Log Level>
(config-Log=1)>commit


Note: The log configuration of ERH over LDAP is obsolete; it has been merged into the EnumServer log configuration.

ERH SS7

#/opt/com/bin/cliss
>ManagedElement=<Node Name>,IpworksFunction=1,
IpworksDnsRoot=1,IpworksEnumRoot=1,EnumServer=1,Erh=1,
ErhSs7=1,Log=1
(Log=1)>configure
(config-Log=1)>level=<Log Level>
(config-Log=1)>commit

EPC AAA Server

#/opt/com/bin/cliss
>ManagedElement=<Node Name>,IpworksFunction=1,
IPWorksAAARoot=1,IPWorksAAACommonRoot=1,
AAAServer=<PL hostname>,LogManagement=1,
IPWorksLog=AAA_DIAMETER_SERVER
(IPWorksLog=AAA_DIAMETER_SERVER)>configure
(config-IPWorksLog=AAA_DIAMETER_SERVER)>level=<Log Level>
(config-IPWorksLog=AAA_DIAMETER_SERVER)>commit

Where:


<Log Level> specifies the log level for EPC AAA Server. For more information about logging level, refer to IpworksLogLevel in Managed Object Model (MOM).


The changes dynamically take effect.

AAA Server Manager

#/opt/com/bin/cliss
>ManagedElement=<Node Name>,IpworksFunction=1,
IPWorksAAARoot=1,IPWorksAAACommonRoot=1,
AAAServerManager=1
(AAAServerManager=1)>configure
(config-AAAServerManager=1)>level=<Log Level>
(config-AAAServerManager=1)>commit

MySQL NDB Cluster

Not applicable.

To change the log level for MySQL NDB Cluster, refer to the MySQL online reference.

3.7   Restart

Use the command ipw-ctr restart <component> to restart IPWorks components. For more information, refer to the section Service Life Cycle Management in IPWorks Configuration Management.

3.8   Server Status Checks

Table 8 lists the methods that can be used to check the server status:

Table 8    Server Status Checks

Server

Methods

 

ipw-ctr(1)

ipwcli(2)

ps(3)

rndc(4)

dig(5)

Script(6)

Storage Server

 

     

MySQL NDB Cluster

   

   

DNS

 

DNS SM(7)

 

     

ASDNS

     

ASDNS SM(8)

 

     

ENUM

 

 

 

ENUM FE Sync

 

     

AAA

 

     

AAA SM(9)

 

     

(1)  Use ipw-ctr status <component> <hostname>. For details, see Section 2.1.2.

(2)  Use show status in the IPWorks CLI. For more information, refer to Command Line Interface User Guide for IPWorks SS.

(3)  Use ps -ef | grep <process name> to check if the Server process is running. Check Section 2.1.1 for details.

(4)  Use the rndc status command for more detailed status.

(5)  Use dig or another query utility to send a query to the server to monitor that each configured zone is loaded. For more information, see Section 2.1.9.

(6)   For details, refer to the section Showing Status of MySQL NDB Cluster in Configure MySQL NDB Cluster.

(7)  If DNS SM is not running, DNS server cannot be updated from IPWCLI. After IPWorks is installed, DNS SM is not started.

(8)  If ASDNS SM is not running, ASDNS monitor cannot be updated from IPWCLI. After IPWorks is installed, ASDNS SM is not started.

(9)  If AAA SM is not running, AAA server status cannot be received from IPWCLI. After IPWorks is installed, AAA SM is not started.


3.9   IPWorks Common Component

Table 9 lists the links to the Common Components troubleshooting guides. These Common Components are used by the IPWorks software and are provided by the Ericsson middleware department. The related detailed troubleshooting guides can be found in their own CPI documents.

Table 9    Links of IPWorks Common Component

IPWorks Common Component

Troubleshooting Guide Link

COM

COM Advanced Troubleshooting Guideline

Core MW

Core MW Troubleshooting Guideline

eVIP

eVIP Advanced Troubleshooting Guideline

JavaOam

JavaOaM Troubleshooting Guideline

LM (License Management)

LM Troubleshooting Guideline

SS7 CAF

SS7 CAF Troubleshooting Guideline

Note:  
The common components without troubleshooting guide are not listed here.

4   Troubleshooting Procedure

Troubleshooting a problem might require the use of one or more functions described in Section 3. To locate the fault efficiently, the user can do the following:

  1. Check the alarms and notifications.
  2. Check licenses.
  3. Check the performance management measurements.
  4. Check the logs.
  5. Check the server status.
  6. Check the configuration files.
  7. Start tracing.
  8. Check information available from capsule abortions and core dumps.
  9. Collect information.
  10. Check already reported troubles (CSRs).
  11. If writing a CSR, check software version and level.
  12. Consult the next level of maintenance support.

A troubleshooting workflow is shown in Figure 1.

Figure 1   Troubleshooting Workflow

5   Problem-Solving Procedure

5.1   IPWorks VNF Stack Deployment

This section provides information on resolving problems during IPWorks VNF stack deployment.

For more information about CEE related troubleshooting, refer to CEE Troubleshooting Guideline.

5.1.1   Server Groups Forbidden

5.1.1.1   Trouble Symptoms

When you try to launch the IPWorks VNF HEAT stack, it fails with the "CREATE_FAILED" stack status, and the reason is "Quota exceeded, too many server groups."

$openstack stack show <Stack Name or ID>

For example:

$openstack stack show ipw6a
....
| parent                | None                                                   |
| stack_name            | ipw6a                                      |
| stack_owner           | admin                                                  |
| stack_status          | CREATE_FAILED                                          |
| stack_status_reason   | Resource CREATE failed: Forbidden: 
|                       | resources.pl34_server_group: Quota exceeded, too many |
|                       | server groups. (HTTP 403) (Request-ID: req-acd057df- |
|                       | 83b1-44e1-84c8-a55e7021b1c8)
|
| stack_user_project_id | 3f8143c8366e45e09083edf4e6845791                       |
| template_description  | IPWorks Stack for CEE HEAT (08-01-2016)                |
| timeout_mins          | None                                                   |
| updated_time          | None                                                   |
+-----------------------+--------------------------------------------------------+

5.1.1.2   Locating Fault

For the default Atlas configuration, the quota may not be sufficient to deploy IPWorks. In this case, increase the resource limits in the quota to make sure that the IPWorks resources can be created successfully:

  1. Log on to the Atlas server as the tenant user with the admin role.
  2. Source the tenant user environment.

    $source openrc

    Note:  
    If you use a newly created tenant user, create a new openrc file (refer to the format in /home/atlasadm/openrc) for the new user, and then source it.

  3. Verify that the tenant user environment is correct.

    $nova list

    $nova quota-show

    $neutron quota-show

  4. Get the tenant ID from the tenant list output.

    $openstack project list

  5. Update the server groups limit.

    $nova quota-update --server-groups <Server groups Limitation> <tenant-id>

    For example:

    $nova quota-update --server-groups 20 5a49b043d9ea4666ac4adf6bc821942e

5.1.1.3   Confirming Solution

Check whether the IPWorks VNF stack can be deployed successfully. If the problem persists, contact the next level of Ericsson support.

5.1.2   VLAN Conflicts

5.1.2.1   Trouble Symptoms

When you try to launch the IPWorks VNF HEAT stack, it fails with the "CREATE_FAILED" stack status, and the reason is "Unable to create the network. The VLAN xxx on physical network default is in use.".

$openstack stack show <Stack Name or ID>

For example:

$openstack stack show ipw6a
...
| parent                | None                                                   |
| stack_name            | sub12-release-vnf                                      |
| stack_owner           | admin                                                  |
| stack_status          | CREATE_FAILED                                          |
| stack_status_reason   | Resource CREATE failed: Conflict: resources.ipw_sig_sp:|
|                       | Unable to create the network. The VLAN 213 on physical |
|                       | network default is in use.                             |
| stack_user_project_id | 3f8143c8366e45e09083edf4e6845791                       |
| template_description  | IPWorks Stack for CEE HEAT (08-01-2016)                |
| timeout_mins          | None                                                   |
| updated_time          | None                                                   |
+-----------------------+--------------------------------------------------------+

5.1.2.2   Locating Fault

To detect which network occupies the VLAN ID and why it is used, execute the following commands on the Atlas server:

  1. Check which CEE neutron network uses the VLAN ID.

    $vid=<VLAN_ID>

    $for i in $(neutron net-list -F name -D -f value);do j=$(neutron net-show -F provider:segmentation_id -f value $i); [[ $j == "$vid" ]] && echo "Occupy vlan $vid by network $i" && break ; done

    According to the above example, execute the following commands:

    $vid=213

    $for i in $(neutron net-list -F name -D -f value);do j=$(neutron net-show -F provider:segmentation_id -f value $i); [[ $j == "$vid" ]] && echo "Occupy vlan $vid by network $i" && break ; done

    The command output looks like the following:

    Occupy vlan 213 by network ipw6a_sig_sp

  2. Check whether the VLAN ID conflicts with another network. If the network data is stale or the VLAN ID is occupied by another VNF application, delete the network manually on the Atlas server:

    $neutron net-delete <NET_NAME>

    According to the above example, execute the following command:

    $neutron net-delete ipw6a_sig_sp
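The one-line loop in step 1 is dense; the following expanded sketch shows the same logic (same neutron commands, assuming sourced credentials as in the procedure above):

```shell
#!/bin/bash
# Expanded form of the VLAN lookup loop from step 1; assumes the
# neutron CLI and sourced credentials as in the procedure above.
vid=213

for net in $(neutron net-list -F name -D -f value); do
    # Read the VLAN ID (segmentation ID) of each network.
    seg=$(neutron net-show -F provider:segmentation_id -f value "$net")
    if [ "$seg" = "$vid" ]; then
        echo "Occupy vlan $vid by network $net"
        break
    fi
done
```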

5.1.2.3   Confirming Solution

Check whether the IPWorks VNF stack can be deployed successfully. If the problem persists, contact the next level of Ericsson support.

5.1.3   Failed to Create Network

5.1.3.1   Trouble Symptoms

When you try to launch the IPWorks VNF HEAT stack, it fails with the "CREATE_FAILED" stack status, and the reason is "create_network_postcommit failed".

$openstack stack show <Stack Name or ID>

For example:

$openstack stack show ipw6a
...
| parent                | None                                                    |
| stack_name            | ipw6a                                                   |
| stack_owner           | ipwvnf                                                  |
| stack_status          | CREATE_FAILED                                           |
| stack_status_reason   | Resource CREATE failed: InternalServerError:            |
|                       | resources.ipw_oam_sp: create_network_postcommit failed. |
| stack_user_project_id | 2326bf1070a94112bb4daf4a6a9e81cd                        |
| template_description  | IPWorks Stack for CEE HEAT (08-01-2016)                 |
| timeout_mins          | 60                                                      |
| updated_time          | None                                                    |
+-----------------------+---------------------------------------------------------+

5.1.3.2   Locating Fault

Detect which network occupies the VLAN ID in the BSP DMX. In the DMX COM CLI, check whether the VLAN ID exists. If it does, make sure that the VLAN ID is not used by another network, such as another application VNF; confirm this first against the IP plan or with the CEE administrator.

  1. Navigate to the VirtualBridge MO.

    >ManagedElement=1,DmxcFunction=1,Trm=1,VirtualBridge=CEE

    >show Vlanid=<VLAN_ID>

  2. If the VLAN ID is already there, delete it in configuration mode as below:

    >configure

    >no Vlan=<VLAN_ID>

    According to the above example, execute the following command:

    >no Vlan=<ipw_oam_sp VLAN_ID>

5.1.3.3   Confirming Solution

Check whether the IPWorks VNF stack can be deployed successfully. If the problem persists, contact the next level of Ericsson support.

5.1.4   Policy Problem

5.1.4.1   Trouble Symptoms

When you try to launch the IPWorks VNF HEAT stack, it fails with the "CREATE_FAILED" stack status, and the reason shows that the policy does not allow several actions to be performed.

$openstack stack show <Stack Name or ID>

For example:

$openstack stack show ipw6a
...
| parent                | None                                                       |
| stack_name            | ipw6a                                                      |
| stack_owner           | ipwdemo                                                    |
| stack_status          | CREATE_FAILED                                              |
| stack_status_reason   | Resource CREATE failed: Forbidden:                         |
|                       | resources.ipw_sig_sp: Policy doesn't allow                 |
|                       | ((((rule:create_network and                                |
|                       | rule:create_network:provider:physical_network) and         |
|                       | rule:create_network:shared) and                            |
|                       | rule:create_network:provider:network_type) and             |
|                       | rule:create_network:provider:segmentation_id) to be        |
|                       | performed.                                                 |
| stack_user_project_id | 42322a142af24b9a821475b434ea8152                           |
| template_description  | IPWorks Stack for CEE HEAT (08-01-2016)                    |
| timeout_mins          | 60                                                         |
| updated_time          | None                                                       |
+-----------------------+------------------------------------------------------------+

5.1.4.2   Locating Fault

This issue is caused by launching the IPWorks VNF stack with a user that does not have the "admin" role.

Show the user's roles on the Atlas server:

$ openstack role list --user <USER_NAME> --project <TENANT_NAME>

For example, the following user "ipwvnf" has the admin role.

$ openstack role list --user ipwvnf --project ipwvnf

+----------------------------------+----------+----------------------------------+----------------------------------+
|                id                |   name   |             user_id              |            tenant_id             |
+----------------------------------+----------+----------------------------------+----------------------------------+
| 9fe2ff9ee4384b1894a90878d3e92bab | _member_ | 50f8c42336d347dbbd1a506428b1fdc6 | 6e4c612850914c7f86041085bf00a2a2 |
| 3e86a80ffab44fd6b489c2d9d2ccaf13 |  admin   | 50f8c42336d347dbbd1a506428b1fdc6 | 6e4c612850914c7f86041085bf00a2a2 |
+----------------------------------+----------+----------------------------------+----------------------------------+

If the IPWorks VNF tenant user does not have the "admin" role, contact the CEE administrator to add it to the user first.
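The role check can be scripted against the table output; a minimal sketch, assuming the tabular openstack role list format shown above (the helper name has_admin_role is illustrative):

```shell
# Sketch: check an 'openstack role list' table for an 'admin' row.
# When a table row is split on '|', the third field is the role name.
has_admin_role() {
    printf '%s\n' "$1" | awk -F'|' '{gsub(/[[:space:]]/,"",$3)} $3=="admin"{found=1} END{exit !found}'
}

# Sample rows from the example above:
roles='| 9fe2ff9ee4384b1894a90878d3e92bab | _member_ | 50f8c42336d347dbbd1a506428b1fdc6 | 6e4c612850914c7f86041085bf00a2a2 |
| 3e86a80ffab44fd6b489c2d9d2ccaf13 |  admin   | 50f8c42336d347dbbd1a506428b1fdc6 | 6e4c612850914c7f86041085bf00a2a2 |'

has_admin_role "$roles" && echo "admin role present"    # prints "admin role present"
```

On a live Atlas server, the roles variable would be filled from the real command output instead of the sample text.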

5.1.4.3   Confirming Solution

After adding the "admin" role to the IPWorks tenant user, check whether the IPWorks VNF stack can be deployed successfully. If the problem persists, contact the next level of Ericsson support.

5.1.5   Failed to Delete HEAT Stack

5.1.5.1   Trouble Symptoms

When you try to delete a HEAT stack for IPWorks VNF, it fails with the "DELETE_FAILED" stack status.

Execute the following command in Atlas server:

$openstack stack show <Stack Name or ID>

For example:

$openstack stack show ipw6a
...
| parent                | None                                                 |
| stack_name            | ipw6a                                                |
| stack_owner           | admin                                                |
| stack_status          | DELETE_FAILED                                        |
| stack_status_reason   | Resource DELETE failed: Error: resources.ipw_SC-1:   |
|                       | Server ipw6a_SC-1 delete failed: (400) Cannot pin/unpin|
|                       | cpus [8, 16, 18, 6] from the following pinned set [9,  |
|                       | 3, 4, 5, 17]                                           |
| stack_user_project_id | d7920b81148944ba9a8a6400a0d3b593                       |
| template_description  | IPWorks Stack for CEE HEAT (08-01-2016)                |
| timeout_mins          | None                                                   |
| updated_time          | None                                                   |
+-----------------------+--------------------------------------------------------+

5.1.5.2   Locating Fault

To delete the stack, first stop the VM (SC-1 in this example) by using the nova command, and then delete the HEAT stack on the Atlas server:

$nova stop <VM_NAME>

$heat stack-delete <STACK_NAME>

According to the above example, execute the following commands:

$nova stop ipw6a_SC-1

$heat stack-delete ipw6a

5.1.5.3   Confirming Solution

Execute the following command to check whether the IPWorks VNF stack can be deleted successfully.

$openstack stack show <Stack Name or ID>

If the problem persists, contact the next level of Ericsson support.

5.2   IPWorks Upgrade

This section provides information on resolving problems during IPWorks Upgrade.

5.2.1   Error: Could Not Find Local Upgrade Package

5.2.1.1   Trouble Symptoms

When a user tries to create the IPWorks Upgrade Package (UP) by executing the command createUpgradePackage in ECLI, it fails with "Could not find local upgrade package".

List the content of the folder:

SC-X:~ # ls /cluster/UP/

5.2.1.2   Locating Fault

Check the folder /cluster/UP for files other than ERIC-IPW_UP.tar.gz. If any exist, remove all files except ERIC-IPW_UP.tar.gz, and then try the action again.
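The cleanup can be scripted defensively; a minimal sketch (the helper name clean_up_dir is illustrative, and the commented line shows a dry run):

```shell
# Sketch: remove every regular file in an upgrade-package directory
# except ERIC-IPW_UP.tar.gz. The directory is a parameter so the
# helper can be tried on a scratch directory first.
clean_up_dir() {
    find "$1" -maxdepth 1 -type f ! -name 'ERIC-IPW_UP.tar.gz' -delete
}

# Dry run on the real folder: list what would be removed, without deleting.
# find /cluster/UP -maxdepth 1 -type f ! -name 'ERIC-IPW_UP.tar.gz'
```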

5.2.1.3   Confirming Solution

Check whether the IPWorks UP can be created successfully. If the problem persists, contact the next level of Ericsson support and provide the ECIM logs and /var/log/messages.

To generate the ECIM logs, do the following:

  1. Find which SC is active for ECIM process.

    #cmw-status -v csiass | grep -i ecimswm -A 2

  2. Enable the ECIM trace log (assuming that ECIM is active on SC-1).

    For example:

    SC-1:~ # ps -ef | grep ecim

    cmw-swm 7788 1 0 Dec07 ? 00:00:01 /opt/coremw/lib/ecimswm instantiate

    SC-1:~ # kill -SIGUSR2 7788

  3. View the log under the following folder:

    /var/opt/coremw/ecimswm
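The PID lookup in Step 2 can be scripted; a minimal sketch that parses the ps -ef line shown above (the helper name ecim_pid is illustrative):

```shell
# Sketch: pull the ecimswm PID (column 2) out of 'ps -ef' output.
ecim_pid() {
    printf '%s\n' "$1" | awk '/ecimswm/ {print $2; exit}'
}

# Sample line from the example above:
psline='cmw-swm 7788 1 0 Dec07 ? 00:00:01 /opt/coremw/lib/ecimswm instantiate'
ecim_pid "$psline"                              # prints 7788
# kill -SIGUSR2 "$(ecim_pid "$(ps -ef)")"       # on a live SC, toggles the trace log
```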

5.2.2   Error: Failed to Remove Upgrade Package

5.2.2.1   Trouble Symptoms

When a user tries to remove the IPWorks Upgrade Package (UP) by executing the command removePackageUpgrade UpgradePackage=<UP Name> in ECLI, it fails with "Failed to remove upgrade package".

5.2.2.2   Locating Fault

Check the folder /cluster/UP for any files (for example, ERIC-IPW_UP.tar.gz). If any exist, do the following:

  1. Remove all files under the folder.

    SC-X:~ #rm /cluster/UP/*

  2. Try to remove the IPWorks UP again.

    For details, refer to Delete Upgrade Package.

5.2.2.3   Confirming Solution

Check whether the IPWorks UP can be removed successfully. If the problem persists, contact the next level of Ericsson support and provide the ECIM log and /var/log/messages.

To generate the ECIM logs, do the following:

  1. Find which SC is active for ECIM process.

    #cmw-status -v csiass | grep -i ecimswm -A 2 | grep ACTIVE -B 1

  2. Enable the ECIM trace log (assuming that ECIM is active on SC-1).

    For example:

    SC-1:~ # ps -ef | grep ecim

    cmw-swm 7788 1 0 Dec07 ? 00:00:01 /opt/coremw/lib/ecimswm instantiate

    SC-1:~ # kill -SIGUSR2 7788

  3. View the log under the following folder:

    /var/opt/coremw/ecimswm

5.2.3   Failed to Restore System Data after Upgrade Failure

5.2.3.1   Trouble Symptoms

After an upgrade failure, the user cannot restore System Data by using ECLI. When this issue occurs, output resembling the following is displayed:

actionName="RESTORE" <read-only>
 additionalInfo <read-only>
 "Restore Backup for SystemData_BKP_preUGLSV16_2017-03-09: Initialized"
 "No active result is reported for one or more groups. BRFC is cancelling Current Request"
 "Restore Backup for SystemData_BKP_preUGLSV16_2017-03-09: Failed"

5.2.3.2   Locating Fault

This issue occurs when upgrade fails and the DRBD is running on SC-2.

To resolve this issue:

  1. Confirm that drbd is running on SC-2. Execute the following command on SC-1:

    SC-1:~ # drbd-overview

    For example:

    0:drbd0/0 Connected Secondary/Primary UpToDate/UpToDate C r-----

    The output shows that SC-1 is secondary, which means that DRBD is running on SC-2.

  2. Reboot SC-2 to switch drbd to SC-1.

    SC-2:~ # reboot

  3. After SC-2 reboots, execute the command again to check whether DRBD has switched to SC-1.

    SC-1:~ # drbd-overview

    For example:

    0:drbd0/0 Connected Primary/Secondary UpToDate/UpToDate C r----- lvm-pv: lde-cluster-vg 100.00g 50.06g

    The output shows that SC-1 is primary, so DRBD is now running on SC-1.

  4. Perform the System Data backup restore again.

    For more information, refer to the section Restore System Data Backup in Restore Backup.
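The role check in Steps 1 and 3 can be scripted; a minimal sketch that parses the drbd-overview line shown above (the helper name drbd_local_role is illustrative):

```shell
# Sketch: extract the local DRBD role from a 'drbd-overview' line.
# The role field has the form <local>/<peer>, e.g. Secondary/Primary.
drbd_local_role() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(Primary|Secondary)\/(Primary|Secondary)$/) {
                split($i, r, "/"); print r[1]; exit
            }
    }'
}

line='0:drbd0/0 Connected Secondary/Primary UpToDate/UpToDate C r-----'
drbd_local_role "$line"    # prints Secondary: DRBD is active on the peer SC
```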

5.2.3.3   Confirming Solution

Check whether the operation is successful. If not, contact the next level of Ericsson support.

5.2.4   Error: Campaign Failed Verification

5.2.4.1   Trouble Symptoms

When a user verifies the result of the preparation of the IPWorks Upgrade Package in ECLI, the result shows that the verification has failed.

For example:

(UpgradePackage=IPWORKS.base-AVA90133-3.0.0-2)>show -v
UpgradePackage=IPWORKS.base-AVA90133-3.0.0-2
   activationFallbackTimer=0 <read-only>
   created="2018-03-04T11:56:44" <read-only>
   creatorActionId=4 <read-only>
   execMethod=ONE_STEP
   ignoreBreakPoints=true <default>
   password=[] <empty>
   state=PREPARE_COMPLETED <read-only>
   upgradePackageId="IPWORKS.base-AVA90133-3.0.0-2"
   uri="sftp://root@10.170.57.148:/cluster/UP"
   userLabel=[] <empty>
   activationStep[@1] <read-only>
      description="not yet supported" <read-only>
      name="not yet supported" <read-only>
      serialNumber=1 <read-only>
   administrativeData[@1] <read-only>
      description="" <read-only>
      productionDate="2018-03-04" <read-only>
      productName="IPWORKS.base" <read-only>
      productNumber="AVA90133" <read-only>
      productRevision="3.0.0-2" <read-only>
      type="OTHER" <read-only>
   reportProgress
      actionId=4
      actionName="Verify"
      additionalInfo ""
      progressInfo="Prepare UpgradePackage"
      progressPercentage=100
      result=FAILURE
      resultInfo="Campaign failed verification"
      state=FINISHED step=1 stepProgressPercentage=0

5.2.4.2   Locating Fault

On the SC nodes, check whether the following error log exists in /var/log/messages:

For example:

Mar  4 12:05:31 SC-1 CMW: ERROR (cmw-campaign-verify): ERROR: Verify timeout
Mar  4 12:05:31 SC-1 ecimswm: Campaigned failed verification for ERIC-CSM-Merged-2018_03_04-120152
Mar  4 12:05:31 SC-1 ecimswm: Calling immutil_saImmOmAdminOwnerInitialize with owner CoreMwEcimSwM_140656747996928 and releaseOnFinalize TRUE
Mar  4 12:05:31 SC-1 osafimmnd[7760]: NO Ccb 11464 COMMITTED (CoreMwEcimSwM_140656747996928)

If a similar error log exists, check on all nodes (SC and PL) whether the following error log exists in /var/log/messages:

For example:

On PL-3: Mar 4 12:03:51 PL-3 osafsmfnd[6743]: NO Failed to send mds message, rc = 2, SMFD DEST 0

On PL-4: Mar 4 12:03:51 PL-4 osafsmfnd[6750]: NO Failed to send mds message, rc = 2, SMFD DEST 0

This example shows that PL-3 and PL-4 have the problem. Execute the following commands to fix it on PL-3 and PL-4:

amf-adm restart safComp=SMFND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF

amf-adm restart safComp=SMFND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF
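Instead of typing one restart command per node, the affected nodes can be derived from the log lines; a minimal sketch based on the example messages above (the helper name affected_nodes is illustrative, and the echo only prints the commands):

```shell
# Sketch: derive the affected nodes from the 'Failed to send mds message'
# lines in /var/log/messages (the node name is the fourth field) and
# print the matching amf-adm restart commands.
affected_nodes() {
    printf '%s\n' "$1" | awk '/Failed to send mds message/ {print $4}' | sort -u
}

log='Mar 4 12:03:51 PL-3 osafsmfnd[6743]: NO Failed to send mds message, rc = 2, SMFD DEST 0
Mar 4 12:03:51 PL-4 osafsmfnd[6750]: NO Failed to send mds message, rc = 2, SMFD DEST 0'

for node in $(affected_nodes "$log"); do
    # On a live SC, drop the leading echo to actually run the command.
    echo amf-adm restart "safComp=SMFND,safSu=$node,safSg=NoRed,safApp=OpenSAF"
done
```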

5.2.4.3   Confirming Solution

Verify the preparation of the IPWorks Upgrade Package in ECLI again. If the issue persists, contact the next level of Ericsson support.

5.2.5   Login Fails during Rebooting SC

5.2.5.1   Trouble Symptoms

Login to the IPWCLI provisioning system fails when you shut down the SC on which the SS and SqlmgmNode are running by using the command shutdown. After about 100 s, you can log on to the IPWCLI system successfully.

5.2.5.2   Locating Fault

Shutting down the OS is not recommended, because switching resources from one SC to the other is time consuming.

If a shutdown is required, use the command shutdown -h now. This reduces the time from about 100 s to about 35 s.

5.2.5.3   Confirming Solution

Not Applicable.

5.2.6   Health Check Hang

5.2.6.1   Trouble Symptoms

The health check on the IPWorks system fails during the upgrade procedure: the operation stops at some point and does not proceed.

5.2.6.2   Locating Fault

Part of the health check is to scan the IPWorks application logs for errors. If the logs contain too many errors, the health check script cannot handle them and hangs.

Perform the following steps to resolve the problem:

  1. Back up all the logs in /storage/no-backup/ipworks/logs/SC-X/* and /storage/no-backup/ipworks/logs/PL-X/*, and then delete them.
  2. Stop the health check process on SC-1 and SC-2.
    1. Execute the following command to clear the related logs.

      # for log in $(find /storage/no-backup/ipworks/logs -mtime -1 -name '*.log*');do > $log;done

    2. Record the process ID of hcfd.

      # ps -ef | grep hcfd | grep -v grep

    3. Kill the process.

      # kill -9 <process id>

  3. Perform the health check operation again.

5.2.6.3   Confirming Solution

Check whether the health check operates successfully. If the problem persists, contact the next level of Ericsson support.

5.3   IPWCLI

5.3.1   Network Issues

5.3.1.1   Trouble Symptoms

The Storage Server cannot be started.

Also, when the Storage Server is not running and the IPWorks CLI is started, the CLI reports the error "Network I/O Error: Opening socket: reason: Connection refused: connect" when it tries to send a logon request to the SS on the server port.

5.3.1.2   Locating Fault

Check the SS status by using the ipw-ctr status ss command.

Ensure that no other process on the system uses the TCP/IP port used by the Storage Server. The default TCP/IP port for the Storage Server is 17071.
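Whether something is already listening on the SS port can be checked against netstat -anp output; a minimal sketch (the helper name port_listening and the sample line are illustrative):

```shell
# Sketch: return success if a 'netstat -anp' line shows the given TCP
# port in LISTEN state (local address is field 4, state is field 6).
port_listening() {
    printf '%s\n' "$1" | awk -v p=":$2" '$6 == "LISTEN" && $4 ~ p"$" {found=1} END {exit !found}'
}

sample='tcp        0      0 0.0.0.0:17071       0.0.0.0:*       LISTEN      13664/java'
port_listening "$sample" 17071 && echo "port 17071 is in LISTEN state"
```

On a live node, the sample text would be replaced with the output of netstat -anp, and the owning process (last field of the matching line) shows which application holds the port.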

5.3.1.3   Confirming Solution

After the Storage Server port is changed, check whether the Storage Server can be started successfully.

5.3.2   Provisioning Issues

5.3.2.1   Trouble Symptoms

When a user selects a range of resource records to delete (for example, with the commands select naptrrecord and delete), and the range contains one or more resource records already marked for deletion, the delete command fails.

5.3.2.2   Locating Fault

This issue occurs because records in the range are already marked for deletion.

To avoid this issue, execute the command update dnsserver to remove the resource records marked for deletion from the MySQL database, and then execute the commands select naptrrecord and delete to delete the range of resource records.

When an object (resource) is in a processing state (for example, in a transaction), the object is locked by the Storage Server to prevent other users from modifying or deleting the same object. If a user sends any request related to the locked object, the SS returns a "Locked By Admin" exception to the IPWorks CLI.

5.3.2.3   Confirming Solution

Check whether the delete operation can be performed successfully after executing the command update dnsserver.

5.3.3   Provisioning Rate Too Low

5.3.3.1   Trouble Symptoms

The provisioning through the IPWorks CLI is too slow.

5.3.3.2   Locating Fault

Use the MySQL Benchmark Tool to test the provisioning rate.

For example, testing 10,000 queries gives an average run time in the range of 20-30 seconds.

# /usr/local/mysql/bin/mysqlslap --engine=ndbcluster --socket=/local/ipworks/mysql-cluster/sqlnode/sqlnode.sock -a --auto-generate-sql-load-type=write --number-char-cols=4 --number-of-queries=10000 Benchmark

Average number of seconds to run all queries: 24.565 seconds
Minimum number of seconds to run all queries: 24.565 seconds
Maximum number of seconds to run all queries: 24.565 seconds
Number of clients running queries: 1
Average number of queries per client: 10000
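The benchmark summary can be reduced to a rough provisioning rate; a minimal sketch that parses the mysqlslap output shown above (the helper name qps_from_output is illustrative):

```shell
# Sketch: turn the mysqlslap summary into a rough queries-per-second figure.
qps_from_output() {
    printf '%s\n' "$1" | awk '
        /Average number of seconds/            {secs = $(NF-1)}
        /Average number of queries per client/ {q = $NF}
        END {if (secs > 0) printf "%.0f\n", q / secs}'
}

out='Average number of seconds to run all queries: 24.565 seconds
Average number of queries per client: 10000'
qps_from_output "$out"    # prints 407 (10000 queries / 24.565 s)
```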

5.3.3.3   Confirming Solution

Not applicable.

5.4   ECLI

This section provides information on resolving problems with ECLI (COM CLI).

5.4.1   ERROR: Transaction validation failed with error code: ComFailure

5.4.1.1   Trouble Symptoms

When a user tries to commit configurations in ECLI, the commit fails with "ERROR: Transaction validation failed with error code: ComFailure".

5.4.1.2   Locating Fault

Check the error log file /var/log/messages. The fault can be caused by the DNS service or the DHCPv4 service.

5.4.1.3   Confirming Solution

5.5   IPWorks DNS Management

This section provides information on resolving problems with the IPWorks DNS Management in Web GUI.

5.5.1   Trouble Symptoms

This section describes the following common IPWorks DNS Management problems, as shown in Table 10.

Table 10    Common Trouble Symptoms

Symptoms            Locating Fault
Session time out    See Section 5.5.2.1
Log in failed       See Section 5.5.2.2

5.5.2   Locating Fault

This section describes how to locate common IPWorks DNS Management problems described in Section 5.5.1.

If the problems persist, users need to log in again or restart the IPWorks DNS Management.

5.5.2.1   Session Time Out

By default, a session times out after 30 minutes of inactivity. If this happens, the user must log in again.

Figure 2   Session Time Out

5.5.2.2   Log in failed

5.5.2.2.1   Tunnel not work or IPWorks SS down

Normally, the following two cases cause the error shown in Figure 3: the tunnel is not working, or the IPWorks SS is down.

Figure 3   Tunnel not work or IPWorks SS down

5.5.2.2.2   IPWorks DNS Management Engine down

If the error message in Figure 4 is displayed, the DNS Management Engine failed to start.

Figure 4   IPWorks DNS Management Engine down

  1. Close the DNS Management.
  2. Check port 8080 with the following command and ensure that the port is not occupied by another application.

    >netstat -ano | findstr 8080

    TCP 0.0.0.0:8080 0.0.0.0:0 LISTENING 6288

    TCP [::]:8080 [::]:0 LISTENING 6288

5.5.3   Confirming Solution

Log in again and check whether the login is successful. If not, contact the next level of Ericsson support.

5.6   Storage Server

This section describes Storage Server troubleshooting cases.

5.6.1   Failed to Stop/Start/Restart Storage Server by ipw-ctr

5.6.1.1   Trouble Symptoms

A failure to start the Storage Server causes the SC to reboot.

For example, you might see the following output:

SC-1:~ # ipw-ctr start ss

Start ss ==> failed!

After several seconds, the following output might be displayed:

                                                                             
Broadcast message from root@SC-1 (somewhere) (Wed Mar 15 09:42:36 2017):       
                                                                               
The system is going down for reboot NOW!

5.6.1.2   Locating Fault

Perform the following steps to troubleshoot the root cause.

  1. Immediately stop the SS on both SC-1 and SC-2.

    # ipw-ctr stop ss SC-1

    # ipw-ctr stop ss SC-2

  2. Check Storage Server status.

    # ipw-ctr status ss <SC-ID>

    <SC-ID> is SC-1 or SC-2, whichever the Storage Server is running on.

    If the output shows that saAmfSUPresenceState is failed, go to Step 3. Otherwise, go to Step 4.

  3. Repair Storage Server.

    # ipw-ctr repaired ss <SC-ID>

    After executing this command, perform Step 2 again to check the status.

    If it still shows failed, continue with Step 4. Otherwise, start the Storage Server on both SCs.

  4. Enable the trace log in ECLI for the Storage Server.

    >dn ManagedElement=1,IpworksFunction=1,IpworksCommonRoot=1,StorageServer=1
    (StorageServer=1)>configure
    (config-StorageServer=1)>level=LOG_LEVEL_TRACE
    (config-StorageServer=1)>commit
    (StorageServer=1)>exit
    

  5. Start the Storage Server by executing the script ipworks.ss directly on the unhealthy SC.
    1. Start the Storage Server by script.

      #cd /opt/ipworks/ss/scripts

      #bash +x ipworks.ss start_debug

      Check the output for any failure information.

    2. Check Storage Server log.

      #cd /storage/no-backup/ipworks/logs/<SC-ID>

      Check the log files ipworks_ss_SC-1.log and ss_amf_wrapper.log for any failure information.

      Check /var/log/<SC-ID>/messages and search for ipworks.ss to find the Storage Server related log entries.
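The decision in Steps 2 and 3 can be scripted; a minimal sketch, where both the helper name needs_repair and the sample status line are assumptions about the ipw-ctr status output format:

```shell
# Sketch: decide from 'ipw-ctr status ss' output whether the SU is in a
# failed presence state and a repair ('ipw-ctr repaired ss') is needed.
needs_repair() {
    printf '%s\n' "$1" | grep -qi 'saAmfSUPresenceState.*FAILED'
}

# Assumed sample status output; the real format may differ.
status='saAmfSUPresenceState=INSTANTIATION_FAILED'
if needs_repair "$status"; then
    echo "run: ipw-ctr repaired ss <SC-ID>"
fi
```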

5.6.1.3   Confirming Solution

From the logs above, you can identify the failure that prevents the Storage Server from starting. If the problem persists, collect all related information and contact the next level of Ericsson support.

5.6.2   Storage Server Not Listening on the Port

5.6.2.1   Trouble Symptoms

The SS started successfully, but the SS function is abnormal.

The following output might be displayed:

SC-1:~ # ipwcli

IPWorks> Login:admin 
IPWorks> Password:********
Unexpected error detected: Could not create connection to database server. Attempted reconnect 3 times. Giving up.

5.6.2.2   Locating Fault

  1. Check Storage Server status.

    # ipw-ctr status ss <SC-ID>

    <SC-ID> is SC-1 or SC-2, whichever the Storage Server is running on.

    For example:

    SC-1:~ # ipw-ctr status ss sc-1

    ss on SC-1 is running, working as an active node.

  2. Check Storage Server process status on the active SC.

    SC-1:~#ps -ef |grep StorageServer |grep -v grep

    	root     13664     1  0 08:35 ?        00:00:08 java -DTCPSTARTPORT=9701 -DTCPENDPORT=9708 -DMULTICASTADDRESS=224.0.0.1 -DMULTICASTPORT=15663 -DBIND_INTERFACE_ADDRESS=169.254.100.23 -Djboss.server.name=ipwss_SC-1 -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/ipworks/jre/java/lib/logging.properties -server -DApp=ipwss -DSysLogin=root -Xmx512m -Xms512m -cp /opt/ipworks/ss/scripts:/opt/ipworks/common/java/AdventNetLogging.jar:/opt/ipworks/common/java/log4j-1.2.15.jar:/opt/ipworks/common/java/ipwcommon.jar:/opt/ipworks/common/java/AdventNetAgentRuntimeUtilities.jar:/opt/ipworks/common/java/dom4j-1.6.1.jar:/opt/ipworks/common/java/ipwse.jar:/opt/ipworks/common/java/AdventNetSnmp.jar:/opt/ipworks/common/java/AdventNetSnmpAgent.jar:/opt/ipworks/ss/java/mysql-connector-java-commercial-5.1.16-bin.jar:/opt/ipworks/ss/java/ipwss.jar:/home/javaoam/lib/jna-4.0.0.jar:/home/javaoam/lib/cglib-2.2.jar:/home/javaoam/lib/javaoam-core-2.2.0-186.jar:/home/javaoam/lib/javaoam-coremw-spi-2.2.0-186.jar ericsson.ipworks.storage.server.StorageServer

    From the output, you can see the process is running.

  3. Check whether the SS port is listening by using the following command.

    # netstat -anp |grep <ss_port>

    The default value of <ss_port> is 17071. For more information, refer to the section Storage Server Initial Configuration in IPWorks Initial Configuration.

    For example:

    SC-1:~ # netstat -anp |grep 17071

    tcp        0      0 0.0.0.0:17071       0.0.0.0:*       LISTEN      13664/java
    

    From the output, you can see that port 17071 is listening.

    If the port is not displayed in the output, check whether there is an alarm related to the SS in FM. The specific problems might be:

    • Storage Server, MySQL Cluster Node Unreachable, for example:

         FmAlarm=40
            activeSeverity=MAJOR
            additionalText="This alarm is issued when the MySQL Cluster [ SC-1:SQL Node ] is down or unreachable from [ SC-1 ] ManageNode;uuid:E02973B0-23DD-418B-9F2C-377734F0B523"
            eventType=COMMUNICATIONSALARM
            lastEventTime="2017-03-17T02:43:57.429+01:00"
            majorType=193
            minorType=860161
            originalAdditionalText="This alarm is issued when the MySQL Cluster [ SC-1:SQL Node ] is down or unreachable from [ SC-1 ] ManageNode;uuid:E02973B0-23DD-418B-9F2C-377734F0B523"
            originalEventTime="2017-03-17T02:43:57.429+01:00"
            originalSeverity=MAJOR
            probableCause=306
            sequenceNumber=90
            source="ManagedElement=ipworks_cba,SystemFunctions=1,Fm=1,FmAlarmModel=ipworksEM,FmAlarmType=ipworksEmMysqlClusterNodeUnreachable,Source=SC-1:ManageNode:SC-1:SQL Node"
            specificProblem="Storage Server, MySQL Cluster Node Unreachable"
      

       

      To clear the alarm, refer to Storage Server, MySQL Cluster Node Unreachable.

    • Storage Server, MySQL Database Unreachable, for example:

         FmAlarm=42
            activeSeverity=CRITICAL
            additionalText="This alarm is issued when Storage Server losts communication with Database;uuid:E02973B0-23DD-418B-9F2C-377734F0B523"
            eventType=COMMUNICATIONSALARM
            lastEventTime="2017-03-17T02:44:01.690+01:00"
            majorType=193
            minorType=860162
            originalAdditionalText="This alarm is issued when Storage Server losts communication with Database;uuid:E02973B0-23DD-418B-9F2C-377734F0B523"
            originalEventTime="2017-03-17T02:44:01.690+01:00"
            originalSeverity=CRITICAL
            probableCause=306
            sequenceNumber=92
            source="ManagedElement=ipworks_cba,SystemFunctions=1,Fm=1,FmAlarmModel=ipworksEM,FmAlarmType=ipworksEmSsDbUnreachable,Source=Storage Server"
            specificProblem="Storage Server, MySQL Database Unreachable"
      

       

      To clear the alarm, refer to Storage Server, MySQL Database Unreachable.

After the MySQL alarm has ceased, wait for 10 seconds and then check again.

5.6.2.3   Confirming Solution

If the problem persists, enable the trace log in ECLI for the Storage Server as described in Step 4 of Section 5.6.1.2, and contact the next level of Ericsson support.

5.7   Server Manager

This section provides information on resolving problems with the IPWorks Server Manager (SM).

For DNS and ASDNS, each has an associated Server Manager component residing on the same machine. The Server Manager serves as the link between the DNS or ASDNS server and the rest of the IPWorks system. All communication between the Storage Server and the DNS or ASDNS server is through the Server Manager.

When the Server Manager starts up, it connects to the SS, logs on, registers as a remote agent for the PS, and reports the status of the PS to the SS. Use the ECLI on the machine on which the Server Manager is running to configure the data that the Server Manager uses to contact the SS.

5.7.1   Server Manager Failed to Start

5.7.1.1   Trouble Symptoms

The Server Manager failed to start.

5.7.1.2   Locating Fault

Use the ECLI to configure a higher logging level for the Server Manager (see Section 3.6). Then restart the Server Manager (see Section 2.1.2) and check the log file to find the specific problem.

Note:  
Use debug logging only to diagnose problems, and turn it off during normal operation: the log file grows rapidly when debugging is enabled, which degrades server performance, especially at higher levels of debug logging.

5.7.1.3   Confirming Solution

5.7.2   Problem in Deleting Server Instance

5.7.2.1   Trouble Symptoms

When an instance is in running status, it cannot be deleted.

5.7.2.2   Locating Fault

To delete a server instance from the IPWorks CLI, ensure that the Server Manager for that server is not running. For instance, to delete a DNS Server from machine 10.0.0.1, stop the DNS Server Manager on 10.0.0.1.

5.7.2.3   Confirming Solution

After stopping the Server Manager, test the behavior again. If the problem persists, contact Ericsson support.

5.7.3   Network Unreachable Exception

5.7.3.1   Trouble Symptoms

If the log of Server Manager reports an exception "Network unreachable" when the Server Manager starts up, the machine is not configured to route packets to the machine on which the Storage Server is running.

5.7.3.2   Locating Fault

Check the interfaces configured for the machine using ifconfig -a and check the routing table using netstat -r.

5.7.3.3   Confirming Solution

When correctly configured, the Storage Server machine can be pinged from the Server Manager machine.

5.7.4   Access Denied Exception

5.7.4.1   Trouble Symptoms

When the IPWorks username and password configured for the Server Manager are not a valid combination in the Storage Server, the logon attempt fails. Also, the alarm DNS, Storage Server Unreachable from Server is raised.

5.7.4.2   Locating Fault

Use the ECLI to check the configuration parameters of Server Manager. Ensure that the Storage Server address is pointing to a Storage Server that is running.

Example 14   Verify DNS SM Configuration

>show -v ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,DnsServer=1,DnsSm=1
DnsSm=1
   dnsSmId="1"
   ssAddress="ipw_ss" <default>
   ssPassword="<Encrypted Password>"
   ssUserName="admin" <default>
   DnsSmLog=1

Then, check the SS status through ipw-ctr. If it is running, try to log on to the ipwcli to verify whether the Storage Server is reachable.

Example 15   Check SS Status

SC-2:~ # ipw-ctr status ss
ss on SC-2 is running, working as an active node

5.7.4.3   Confirming Solution

After correcting the configuration of Server Manager, try the logon attempt again.

Example 16   Verify whether SS is Reachable

# ipwcli
IPWorks> Login: <SS Username>
IPWorks> Password:
Login to server successful.
IPWorks>

5.7.5   Connection Time-out Exception

5.7.5.1   Trouble Symptoms

When the IP address configured for the Primary Storage Server is pointing at a machine that is down or not reachable on the network, the Server Manager tries to contact the Secondary Storage Server. If the Secondary Storage Server is unreachable, then the Server Manager reports a "connect time out" exception. This exception is reported after a delay of 60 seconds by default.

5.7.5.2   Locating Fault

Use the ECLI to verify, and if necessary correct, the address and password of the Storage Server.

5.7.5.3   Confirming Solution

After correcting the configuration of Storage Server, check whether the Storage Server machine is up and reachable using ping.

5.7.6   Failed Attempting to Get Machine Information

5.7.6.1   Trouble Symptoms

If the machine is not properly configured with a DNS name, the Server Manager reports the message "Failed attempting to get machine information".

5.7.6.2   Locating Fault

Check that the domain name parameter is properly configured in the /etc/resolv.conf file.

Ensure that the hostname has a corresponding entry in the file /etc/hosts; otherwise, the Server Manager does not start.
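Both checks can be scripted; a minimal sketch for the /etc/hosts part, which takes the hosts file and hostname as parameters so it can be tried safely (the helper name hostname_in_hosts is illustrative):

```shell
# Sketch: check that a hostname appears in a hosts file (comment lines
# are skipped; names start at the second field of each entry).
hostname_in_hosts() {
    awk -v h="$2" '!/^#/ {for (i = 2; i <= NF; i++) if ($i == h) found=1} END {exit !found}' "$1"
}

# On a live node:
# hostname_in_hosts /etc/hosts "$(hostname)" || echo "add $(hostname) to /etc/hosts"
```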

5.7.6.3   Confirming Solution

After configuring the parameter properly in the configuration file, check whether the exception is raised again.

5.7.7   New or Renamed Object Already Exists Exception

5.7.7.1   Trouble Symptoms

When there is a DNS server object in the Storage Server with the same hostname but different IP address, the Server Manager reports an exception "New or renamed object already exists".

5.7.7.2   Locating Fault

Check whether a DNS server object with the same hostname but a different IP address already exists in the Storage Server, and correct the conflicting hostname or IP address.

5.7.7.3   Confirming Solution

After changing the IP address, check whether the exception is raised again.

5.7.8   Permission Denied Exception

5.7.8.1   Trouble Symptoms

After the Server Manager has connected and registered, it attempts to write the status of the DNS server to the Storage Server. When the user under which the Server Manager is running has no write privileges, the Server Manager reports an exception "Permission to create/change/delete object denied".

5.7.8.2   Locating Fault

Configure the Server Manager using the DNS Server Manager configuration file (see Table 4 or Section 2.3.3) or change the permissions of the user to allow writing.

5.7.8.3   Confirming Solution

After changing the user permissions or correcting the Server Manager configuration, perform some write operations to check whether the exception is raised again.

5.7.9   Cannot Stop the Server Manager

5.7.9.1   Trouble Symptoms

The Server Manager does not stop when the user tries to stop it using ipw-ctr.

5.7.9.2   Locating Fault

Make sure that the root user (or other user under which the Server Manager is running) has a path to /opt/ipworks/common/scripts/. Manually run from a terminal window:

#/opt/ipworks/common/scripts/ipw-ctr stop <type-of-server>sm <hostname>

Where <type-of-server> stands for dns or asdns.

If this fails to stop the Server Manager, make sure that the file /var/run/*sm.port exists and has not been modified. If it is necessary to stop the Server Manager using the kill command, use kill without the -9 parameter. This allows the Server Manager to clean up the file /var/run/*sm.port.

5.7.9.3   Confirming Solution

Not applicable.

5.7.10   Failed Sending Command to the DNS Server

5.7.10.1   Trouble Symptoms

The IPWorks CLI reports "Failed sending the RNDC <cmd> command to <servername> server", where <cmd> is "stop", "reload", and so on. The server name is the machine on which the DNS server and DNS Server Manager are running. This message is also displayed in the Server Manager log file at LOG_LEVEL_INFO, LOG_LEVEL_DEBUG, or LOG_LEVEL_TRACE.

5.7.10.2   Locating Fault

Make sure that the root user (or other user under which the Server Manager is running) has a path to /opt/ipworks/dns/usr/bin/ and that the file /etc/rndc.key exists and contains a valid TSIG key.

5.7.10.3   Confirming Solution

Run rndc from the command line to verify that the server responds correctly.

Example:

rndc status
version: 2.6.32.12-0.7-default
CPUs found: 1
worker threads: 1
UDP listeners per interface: 1
number of zones: 99
debug level: 90
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/0/1000
tcp clients: 0/100
server is up and running

5.7.11   Cannot Find Script

5.7.11.1   Trouble Symptoms

The runscript operation and possibly the update operation cause the Server Manager to execute a script on the DNS server machine. These scripts must be placed in the appropriate scripts directory. If the Server Manager cannot find the script, it reports the message "script not found", where script is the absolute path of the script.

5.7.11.2   Locating Fault

Move the script to the correct scripts directory on the DNS server machine.

5.7.11.3   Confirming Solution

After moving the script file to the correct directory, check whether the update operation raises the message again.

5.7.12   Cannot Execute Message When Running a Script

5.7.12.1   Trouble Symptoms

If the script is in the scripts directory but does not have execute permission for the Server Manager process user (usually root), the Server Manager reports the message "cannot execute".

5.7.12.2   Locating Fault

Change the permissions on the script to allow the Server Manager to execute the script.

Example:

>chmod 555 script_file

5.7.12.3   Confirming Solution

After changing the permissions for the script, run the script through the Server Manager again to check whether the message is still reported.

5.7.13   IPWorks CLI Displays DNS Records Slowly

5.7.13.1   Trouble Symptoms

Dynamic resource records are retrieved from the DNS server to be presented to the IPWorks CLI. If the query requires excessive data, it takes a long time to transfer it from the DNS server to the Server Manager, to the Storage Server, then to the user interface.

5.7.13.2   Locating Fault

Formulate queries for dynamic data using filters that minimize the amount of data that is retrieved.

5.7.13.3   Confirming Solution

Check whether the new query still takes a long time.

5.7.14   Large Data Queries Cause Memory Problems

5.7.14.1   Trouble Symptoms

The machine on which the DNS server and Server Manager are running must have enough physical memory to avoid excessive paging. If there is not enough physical memory, the query takes a long time. If necessary, increase the physical memory of the DNS server machine.

Sufficient memory must also be made available to the Java Virtual Machine for the Server Manager to create enough resource record or lease objects. The Server Manager log may record an "Out of memory" exception in response to a query for a large amount of data.

5.7.14.2   Locating Fault

Configure the Java Virtual Machine to use more of the machine memory. To do this, edit the file /opt/ipworks/IPWsm/scripts/ipwsm. In the last line of this script, the Java Virtual Machine is started with the parameter -mx128m, indicating a maximum memory use of 128 MB. Increasing this value allows the Server Manager to use more of the system memory.
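As a hypothetical illustration (the actual java invocation in the ipwsm script is longer and is abbreviated here), raising the limit to 256 MB would change the last line as follows:

```
# before (abbreviated last line of /opt/ipworks/IPWsm/scripts/ipwsm):
java -mx128m ...
# after, allowing the Server Manager a 256 MB heap:
java -mx256m ...
```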

5.7.14.3   Confirming Solution

After configuring the machine memory, check whether the queries still take a long time.

5.7.15   DNS Server Performance Drops during Queries

5.7.15.1   Trouble Symptoms

Querying a DNS server for a large amount of data can affect the performance of the machine and thus the performance of the DNS server.

5.7.15.2   Locating Fault

Limit the queries that the Server Manager performs in ipworks_*sm.conf to prevent degradation of the DNS services. For more information, see Table 4 and Section 2.3.3.

5.7.15.3   Confirming Solution

Check whether the revised query still affects the performance.

5.7.16   Status of Server in Interface Disagrees with Current Status

5.7.16.1   Trouble Symptoms

The status shown on a DNS server object displayed in the IPWorks CLI can be down when in fact the DNS server is running, or vice versa. The DNS server does not automatically inform the Server Manager (SM) of a change in status. The status field on a DNS server also contains a time stamp, for example, "On 04/30/03 at 09:51:30 server is 'down'". This does not indicate the current status of the server; it only indicates that at that particular time the server had this status.

5.7.16.2   Locating Fault

There is possibly a communication problem between the DNS server and the DNS SM.

For example, another service may be using the same port as the DNS server. If so, follow this procedure to solve the problem:

  1. Check the alarm for more information.
  2. Stop the DNS server, then the DNS SM.
  3. Start the DNS SM, then the DNS server.
  4. If the problem remains, consult the next level of maintenance support.
Note:  
Restarting the DNS SM assigns a new port. For how to start or stop the DNS server, see Section 2.1.2.

5.7.16.3   Confirming Solution

Use ipw-ctr to get server status. For more information, see Section 2.1.2.

Example 17   Check Status of DNS Server on PL-3

ipw-ctr status dns pl-3
dns on PL-3 is running.

5.7.17   RNDC Statistics History Is Lost

5.7.17.1   Trouble Symptoms

The rndcstats and clearrndcstats operations use the RNDC command to display BIND server statistics. This works as a history, appending the results for every rndcstats operation until the clearrndcstats operation is called to delete the previous results. The DNS Server stores all RNDC statistics in a single file. A clearrndcstats operation by one user clears the history for all users.

5.7.17.2   Locating Fault

Multiple users of rndcstats must coordinate their use of this operation for any particular DNS Server.

5.7.17.3   Confirming Solution

After the users have coordinated their use of the operation, check the history by using ipwcli.

Note:  
The rndcstats command is issued from the CLI.

Example 18   Show RNDC Statistics History

# ipwcli
IPWorks> select dnsserver <dns-server>
IPWorks> show rndcstats

5.8   DNS Server

This section provides information on resolving problems with the IPWorks DNS Server.

The DNS Server manages DNS data and responds to queries from DNS clients. For more information on DNS management, refer to the section DNS Management in IPWorks Configuration Management.

5.8.1   Master Server Errors

This section describes some common mistakes in configuring master servers.

5.8.1.1   Forgetting to Reload

After changing a zone, administrators sometimes forget to reload the master server. Thus, although the change was made to the zone configuration, the server is not using the updated information.

5.8.1.2   Forgetting to Update PTR Records

Some applications require a reverse mapping for each name-to-address mapping. This is done using PTR records.

Also, when removing forward entries (A and AAAA records), do not forget to delete the corresponding PTR records.

Note:  
The IPWorks CLI can automatically generate PTR records when adding A or AAAA records.
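As an illustrative pair of records (names and addresses are placeholders), a forward entry and its corresponding reverse entry look like this in zone-file syntax:

```
; forward zone example.com
host1.example.com.        3600 IN A   192.0.2.10

; reverse zone 2.0.192.in-addr.arpa
10.2.0.192.in-addr.arpa.  3600 IN PTR host1.example.com.
```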

5.8.1.3   Forgetting to Set up Delegations

It is important to have the proper delegations set up in both the parent and child zones. While IPWorks normally takes care of the zones it manages, there may be other DNS servers that need delegations to the IPWorks servers and zones, and these must be properly configured.
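As an illustration in zone-file syntax (names and addresses are placeholders), a delegation in the parent zone consists of NS records for the child plus glue address records for name servers inside the delegated zone:

```
; In the parent zone example.com, delegating child.example.com:
child.example.com.       86400 IN NS  ns1.child.example.com.
; Glue: address of the name server that lies inside the delegated zone
ns1.child.example.com.   86400 IN A   192.0.2.53
```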

5.8.2   Slave Server Errors

This section describes some common mistakes in configuring slave servers.

5.8.2.1   Forgetting Slave Files

A filename for a DNS slave zone should generally be configured in the filename field of the slavezone object, so that a backup copy of the zone is kept on disk. This ensures that a copy is available for loading if the slave server reboots while network connectivity with the master servers is lost. Without this, a disconnected slave server that cannot reach a master server has no DNS data to serve.
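In BIND terms, the effect of the filename field corresponds to the file statement of a slave zone; an illustrative named.conf fragment (zone name, address, and path are placeholders):

```
zone "example.com" {
    type slave;
    masters { 192.0.2.1; };
    file "slaves/example.com.zone";   // disk copy survives restarts
};
```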

5.8.2.2   Caching Server Errors

DNS Servers not only serve authoritative data, but they can also be used to answer queries where the answer is not in their authoritative zone. The answers to these queries are then cached for future use.

For this to work, the DNS Servers that are not authoritative for the root "." zone should have a hint zone configured. The hint zone lists the servers authoritative for root. The root zone delegates authority for all top-level domains such as .com, .net, .uk, or .se.

By knowing where the top of the DNS namespace is, the server has a starting point to look for and find the DNS Servers that are authoritative for the name being queried.

Without the hint zone, queries are likely not answered and the DNS Server returns the error code SERVFAIL (server failure).
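A hint zone in BIND-style configuration can be sketched as follows (the file name is a placeholder; the file itself lists the root name servers):

```
zone "." {
    type hint;
    file "root.hints";   // addresses of the root name servers
};
```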

5.8.2.3   Forwarding Server Errors

It is often an error to configure a forwarding server to forward all requests even when the server is authoritative for one or more zones. If the forwarding server is authoritative for a zone, then the administrator should override the default forwarders setting in the DNS Server object by configuring a null forwarders option for the appropriate zones.

For example, a DNS Server configuration that forwards all requests except those for example.com (and any subzones) would have a null forwarders option in the master or slave zone object.
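In BIND terms, the override described above can be sketched as follows (addresses and zone names are placeholders); an empty zone-level forwarders list overrides the global setting:

```
options {
    forwarders { 192.0.2.254; };    // global: forward everything
};

zone "example.com" {
    type master;
    file "example.com.zone";
    forwarders { };                 // empty list: answer locally, do not forward
};
```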

5.8.2.4   Connectivity Errors

If a client cannot connect to one or more DNS Servers, perform the following:

5.8.2.5   Delegation Errors

A DNS delegation is the relationship between a parent and a child zone. It consists of NS and A or AAAA records that allow the parent to tell DNS clients or servers where to send queries that belong to the child. Common delegation errors include the following:

5.8.3   DNS Server Fails to Start after System Boot

5.8.3.1   Trouble Symptoms

After system boot, the DNS server is not running, cannot be started on request, and cannot reload its configuration.

5.8.3.2   Locating Fault

  1. Check the syslog utility for errors. The corresponding path is /var/log/messages. Search for named.
  2. Use the ECLI to enable debug logging for the DNS server (see Example 12), and check the file ipworks_dns.log. Other errors may include:
    • Running the server from the wrong account.
    • Improperly configured network interfaces – use the command ifconfig -a to check.
  3. Use the IPWorks CLI to check the server configuration.

5.8.3.3   Confirming Solution

Use ipw-ctr to check the DNS server status.

5.8.4   Slave Server Fails to Transfer Zone Data from the Master

5.8.4.1   Trouble Symptoms

Slave server fails to transfer zone data from the master server.

5.8.4.2   Locating Fault

  1. Check for errors in the slave zone configuration. See Section 5.8.2.
  2. Check for errors in the master zone configuration. See Section 5.8.1.
  3. Check connectivity between the master and slave servers. For information on Connectivity Errors, see Section 5.8.2.4.
  4. Check the SOA serial numbers of the zone on both the slave and the master. The slave initiates a transfer only if the master's serial number is higher than its own.
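The serial comparison in step 4 uses DNS serial-number arithmetic (RFC 1982), which wraps around at 2^32, so a "higher" serial is defined modulo that space. A minimal sketch of the check, independent of any IPWorks code:

```python
def serial_gt(a: int, b: int) -> bool:
    """True if serial a is 'greater than' b under RFC 1982
    serial-number arithmetic (32-bit, wrap-aware)."""
    return 0 < (a - b) % 2**32 < 2**31

def transfer_needed(master_serial: int, slave_serial: int) -> bool:
    # A slave initiates a zone transfer only when the master's
    # SOA serial is ahead of its own.
    return serial_gt(master_serial, slave_serial)

print(transfer_needed(2024010102, 2024010101))  # master ahead -> True
print(transfer_needed(2024010101, 2024010102))  # slave ahead  -> False
print(transfer_needed(5, 4294967290))           # wrapped      -> True
```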

5.8.4.3   Confirming Solution

Not applicable.

5.8.5   Server Query Problems

If a server does not respond to queries made with the query utility dig, fails to provide an answer for data it should have, or returns an error for queries, perform the following steps. If only specific clients are having problems, run a query utility on one of those systems.

If the utility reports a time-out, check for connectivity problems. Connectivity problems can include the following:

If the query utility reports a status of NXDOMAIN, the server is indicating that the queried domain name does not exist in that class (no resource records of any type). Perform the following to solve a query problem:

If the status is SERVFAIL, the server does not have the answer to the query and may have configuration problems that are preventing it from getting the answer. Check if the server being queried contains a hint file. As DNS is a distributed system, servers need a common connection point. The hint file contains the location of the authoritative servers for the root zone. They provide delegation information for all servers in the namespace.

If the status is REFUSED, the server is configured not to allow the query to proceed. If possible, check the settings of the allow-query option, or check the match-clients and match-destinations options.

Check the internetDNS attribute through (BindService=1)>show -v.

Note:  
The Internet DNS license is introduced only for IPWorks deployed in KVM.

For information about how to check KeyId, refer to View License Information, Reference [16].

If dig returns ANSWER: 0, the requested domain name exists but has no resource record of the requested type. For example, if the user wants PTR records but types dig example.com (omitting the record type, so dig queries for A records), the correct query is dig example.com PTR.
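The cases above can be summarized in a small triage sketch; the function name and messages are illustrative, not part of any IPWorks tool:

```python
def diagnose(status: str, answer_count: int) -> str:
    """Rough triage of a dig result, following the cases above.
    (A sketch; real parsing would read dig's header line.)"""
    if status == "NXDOMAIN":
        return "no records of any type exist for that name"
    if status == "SERVFAIL":
        return "server cannot resolve; check hint zone and configuration"
    if status == "REFUSED":
        return "server policy rejects the query; check allow-query"
    if status == "NOERROR" and answer_count == 0:
        return "name exists, but not for the requested record type"
    return "query answered"

print(diagnose("NOERROR", 0))
```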

If the query utility does not help, look in the ipworks_dns.log server log file (see Section 3.2.5).

If IncludeRecord is used, check for content errors in the related IncludeFile, because the IncludeFile is expanded into the masterzone and its content therefore directly affects the zone.

For more information about the IncludeRecord and IncludeFile objects, refer to the IncludeRecord and IncludeFile sections of IPWorks DNS, ASDNS, ENUM Parameter Description.

5.8.6   Operations Protected by TSIG Fail

If TSIG is used to restrict access to a server, the following is required:

5.8.7   Incorrect Data Returned for Queries

5.8.7.1   Trouble Symptoms

A DNS answer to a query differs from the expected answer.

5.8.7.2   Locating Fault

For more information on Master Server errors, see Section 5.8.1.

Note:  
If ActiveSelect DNS is enabled for the domain name in question, it may also alter the data returned, based on the state of the monitored systems, the source of the query, and the ActiveSelect DNS configuration.

5.8.7.3   Confirming Solution

Not applicable.

5.8.8   Bad Data from a Malicious External DNS Server

5.8.8.1   Trouble Symptoms

When a Cache DNS server sends a request to a malicious external DNS server, the external DNS server may return a negative answer.

5.8.8.2   Locating Fault

  1. Restart the Cache DNS Server.
  2. Stop the communication between the Cache DNS server and the external DNS server if the problem persists.
    Note:  
    How to stop the communication is out of the scope of this document.

5.8.8.3   Confirming Solution

Not applicable.

5.8.9   Bad Data from a Roaming Partner

5.8.9.1   Trouble Symptoms

When a Cache DNS server sends a request to a roaming partner while the roaming partner has updated its NS records without also updating the related glue records, the Cache DNS server probably receives a negative answer.

5.8.9.2   Locating Fault

  1. Control the negative cache TTL locally using the parameter max-ncache-ttl (default value: 10,800 s; recommended value: 60 s).

    IPWorks>modify dnsserver dns1 \
    -add option="max-ncache-ttl 60"

    Working on 1 object(s).
    1 object(s) were updated.

    IPWorks> update dnsserver

  2. Flush the stored local cache.

    #rndc flush

  3. Send the query to the roaming partner again.
    • If the problem persists, contact the roaming partner to update the related glue records; IPWorks DNS resumes the query automatically once the roaming partner updates the glue records.

5.8.9.3   Confirming Solution

Not applicable.

5.8.10   External Clients Are Unable to Query the Server

5.8.10.1   Trouble Symptoms

The external clients cannot query the server.

5.8.10.2   Locating Fault

5.8.10.3   Confirming Solution

Not applicable.

5.8.11   Dynamic DNS Update Failed

5.8.11.1   Trouble Symptoms

When users try to perform a dynamic DNS update, the update fails.

5.8.11.2   Locating Fault

Check that the master zone configuration allows updates. Verify that each dynamic zone (both forward and reverse) includes an allow-update option with an IP address value that includes the IP address of the DHCP server or other DDNS update clients.

It is recommended that users use TSIG for dynamic updates. In this case, make sure that the TSIG keys are the same and that the server security allows updates through the desired TSIG keys.
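In BIND terms, the combination of the two recommendations above corresponds to configuration like the following sketch (key name, secret, and zone names are placeholders):

```
key "ddns-key" {
    algorithm hmac-sha256;
    secret "c2FtcGxlLXNlY3JldA==";      // shared with the update client
};

zone "dyn.example.com" {
    type master;
    file "dyn.example.com.zone";
    allow-update { key "ddns-key"; };   // only TSIG-signed updates accepted
};
```

Without TSIG, the allow-update list would instead contain the IP address of the DHCP server or other DDNS update client.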

5.8.11.3   Confirming Solution

Not applicable.

5.8.12   Authoritative Server for Dynamic Zone Crashes

5.8.12.1   Trouble Symptoms

The DNS Server may crash if a zone file is modified for a dynamic zone while the DNS Server is running.

5.8.12.2   Locating Fault

To change a dynamic zone manually, the user must use the following procedure:

  1. Use the ipw-ctr to stop the DNS Server.
  2. Wait for the server to exit.
  3. Delete the zone .jnl file. The path of the file is /etc/ipworks/dns. Removing the .jnl file is critical because the manual edits are not present in the journal, rendering it inconsistent with the contents of the zone file.
  4. Edit the zone file.
  5. Use the ipw-ctr to start the DNS Server.
Note:  
If the journal file is deleted, all the dynamic data will be lost next time the server is restarted.

5.8.12.3   Confirming Solution

Not applicable.

5.8.13   Rename the DNS Server

5.8.13.1   Trouble Symptoms

If a user tries to rename the DNS Server, an informative message is displayed. For example:

IPWorks> modify dnsserver dns1 -set name=dns2
Working on 1 object(s).
DnsServers name cannot be renamed.
No object(s) were updated.

5.8.13.2   Locating Fault

The DNS Server cannot be renamed because many objects, such as view, key, acl, and masterzone, belong to the dnsserver, and the dnsserver name is an important piece of information maintaining the relationship between these objects. If thousands of records exist in the database, renaming would take a long time to finish. For all related masterzone objects, the zoneid would change, and it is ambiguous whether the records in a zone should be changed as well when its zoneid changes. Renaming is therefore prevented, and an informative error message is given.

Note:  
If the user must change the name of the DNS Server, the user has to delete the dnsserver and all the related objects, then create everything again.

5.8.13.3   Confirming Solution

Not applicable.

5.9   ActiveSelect DNS Server

This section provides information on resolving problems with the IPWorks ActiveSelect DNS (also called ASDNS) and ActiveSelect DNS Monitor.

ActiveSelect DNS is an IPWorks-specific feature that allows redundancy to be defined in the network and allows more complex load balancing than is normally possible within the DNS protocol. ActiveSelect DNS is an extension to the IPWorks DNS Server. This extension makes DNS more dynamic when responding to queries. The IPWorks DNS Server with ActiveSelect DNS uses information sent to it from ActiveSelect DNS Monitors so that it can make more intelligent decisions about what information to include in a response.

5.9.1   Order of Returned Addresses Changes

5.9.1.1   Trouble Symptoms

The order of returned addresses might change with each query.

5.9.1.2   Locating Fault

ActiveSelect DNS results are dynamic and depend on the reported status and load of the resources and on statistics that are used to balance the load across the available resources.

By default, round robin is used to balance the load between resources.

5.9.1.3   Confirming Solution

Not applicable.

5.9.2   Address Is Displayed in Responses When the Resource Is Down

5.9.2.1   Trouble Symptoms

An address can appear in responses when the resource is down.

5.9.2.2   Locating Fault

To avoid this issue, try to avoid the following conditions:

5.9.2.3   Confirming Solution

Not applicable.

5.9.3   Address Does Not Appear in Responses When Resource Is Up

5.9.3.1   Trouble Symptoms

An address may not appear in responses when the resource is up.

5.9.3.2   Locating Fault

To avoid this issue, try to avoid the following conditions:

5.10   ENUM Server

This section provides information on resolving problems with the IPWorks ENUM Server.

The IPWorks ENUM Server provides mapping from telephone numbers to domain names or SIP URIs that can be used to route a call.

For more information on concepts of ENUM management, refer to the Section ENUM Management of IPWorks Configuration Management. For more information on ENUM configuration, refer to the Section Configuring ENUM of Configure DNS and ENUM.

5.10.1   ENUM Server Connectivity Errors

5.10.1.1   Trouble Symptoms

A client failed to connect to one or more ENUM servers.

The ENUM server does not respond to requests.

5.10.1.2   Locating Fault

Use the following methods to locate the fault:

5.10.1.3   Confirming Solution

Queries to the ENUM server receive successful replies.

5.10.2   Failed to Stop/Start/Restart ENUM Server by ipw-ctr

5.10.2.1   Trouble Symptoms

The ENUM Server runs normally, but it cannot be started, stopped, or restarted by using ipw-ctr.

The problem can be demonstrated by the following example procedure:

  1. Check the ENUM Server status.

    # ps -ef|grep enum

    root 7234 1 0 Oct27 ? 01:10:58 /opt/ipworks/enum/bin/ipwenum 
    root 29996 16881 0 16:52 pts/1 00:00:00 grep enum
    

  2. Check the ENUM Server status by using ipw-ctr.

    # ipw-ctr status enum pl-3

    safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVENUM saAmfSUAdminState=UNLOCKED
    (1) saAmfSUOperState=DISABLED(2) saAmfSUPresenceState=UNINSTANTIATED(1) saAmfSUReadinessState=OUT-OF-SERVICE(1)
    

    The output shows that the ENUM Server is stopped or out of service.

  3. Under this condition, execute the command to restart ENUM by using ipw-ctr.

    # ipw-ctr restart enum pl-3

    Stop enum ==> success. 
    Start enum ==> failed!
    

    The output shows that the restart failed.

  4. Check the status of ENUM Server again.

    # ps -elf | grep enum

    root 7234 1 0 Oct27 ? 01:10:58 /opt/ipworks/enum/bin/ipwenum 
    root 29996 16881 0 16:52 pts/1 00:00:00 grep enum
    

5.10.2.2   Locating Fault

Use native AMF commands to repair the fault, as shown in the following example procedure on PL-3:

  1. Repair ENUM AMF status.

    # ipw-ctr repaired enum PL-3

  2. Execute the amf commands.

    # amf-adm lock-in safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVENUM

  3. Check the ENUM Server status.

    # ps -elf | grep enum

    0 S root 3424 16881 0 80 0 - 1433 pipe_w 17:14 pts/1 00:00:00 grep enum 
    4 S root 30293 1 0 80 0 - 154894 futex_ 16:52 ? 00:00:05 /opt/ipworks/enum/bin/ipwenum 
    

    The ENUM process is running.

  4. Check the ENUM Server status again.

    # ps -elf | grep enum

    0 R root 3560 16881 0 80 0 - 1433 - 17:15 pts/1 00:00:00 grep enum

    The ENUM process is stopped.

5.10.2.3   Confirming Solution

  1. Start ENUM Server by using ipw-ctr.

    # ipw-ctr start enum PL-3

    Start enum ==> success

  2. Check the ENUM Server status.

    # ps -elf | grep enum

    4 S root 4402 1 1 80 0 - 180389 futex_ 17:17 ? 00:00:00 /opt/ipworks/enum/bin/ipwenum 
    0 S root 4562 16881 0 80 0 - 1433 pipe_w 17:17 pts/1 00:00:00 grep enum 
    

    The ENUM process is running.

  3. Check the status of ENUM Server by using ipw-ctr.

    # ipw-ctr status enum pl-3

    enum on PL-3 is running

5.10.3   Error Responses to ENUM Requests

The operator can check the Rcode field to determine the cause of the problem when the ENUM server returns an error response.

Rcode 1 (Format Error)
  Possible cause: The ENUM request is incorrect or contains a syntax error.
  Solution: Try sending the request again in case it was corrupted during the transmission.

Rcode 2 (Server Failure)
  Possible cause: The ENUM Server cannot connect to an NDB Cluster.
  Solution: Use the show command in the ndb_mgm tool to check the status of the NDB Cluster.

Rcode 3 (Name Error)
  Possible cause: The query is below an equipped ENUM zone, but the specific domain name is not provisioned in the database.
  Solution: Make sure the specific domain name is provisioned in the database.

Rcode 4 (Not Implemented)
  Possible cause: The ENUM server does not support the Opcode value in the request.

Rcode 16 (Bad Version)
  Possible cause: The request contains an OPT resource record with a non-zero version.

5.10.4   Errors Related to ERH

The following table lists the error messages related to the ERH, together with the possible causes and solutions:

Error message:
  2008/11/27 10:07:46|AIN|stat|SSN 100 UserId 40 Instance 1 Bind confirmed Failure
Possible cause: A wrong SSN or SPC has been configured, or the SS7 stack has the wrong status.
Solution: See Section 5.10.4.1.

Error message:
  2008/11/27 10:34:19|AIN|warning|Received T_NOTICE with SSN 200 userId 40 DID 1, report casue No trans for Addr of such Nature
Possible cause: No translation type has been specified, or a wrong translation type has been specified.
Solution: See Section 5.10.4.2.

Error messages:
  2014/07/29 21:18:27|ENUM+|Debug|not found dn
  2014/07/29 21:18:27|ENUM+|Debug|not found dnrange
  2014/07/29 21:18:27|ENUM+|Debug|sendto in.
Possible cause: The query is not sent to the NPDB by the LDAP protocol.
Solution: See Section 5.10.4.3.

Error message:
  2014/07/29 22:13:02|ENUM+|Warning|Invalid NPHandler has been used.
Possible cause: The ENUM LDAP switch is not open in the ECLI.
Solution: See Section 5.10.4.4.

5.10.4.1   Check SSN and SPC Configuration

To resolve this problem, do the following:

  1. Check the configuration of AINNode, MAPNode, and INAPNode in IPWCLI. The configuration of LocalSPC and LocalSSN must be the same as in the SS7 stack installed on the local machine.

    If the configuration is inconsistent, correct the configuration of the objects either in IPWCLI or Signaling Manager.

  2. Set the NPSwitch field of AINNode, MAPNode, and INAPNode to 0, and wait for ENUM to unload the ERH module.
  3. Set the NPSwitch field of AINNode, MAPNode, and INAPNode to 1, then try again.
  4. Check the status of SS7.

    For information on how to check SS7 stack, refer to the section Verifying Stack Configuration in Configure SS7 for ENUM Number Portability.

5.10.4.2   Check Translation Type and GT

To resolve this problem, do the following:

  1. Check whether the value of translation type in IPWCLI is the same as the configuration of SS7 stack.
  2. Check whether GT has been configured in the SS7 stack. For details, refer to Reconfiguring SS7 Network, Creating and Defining GT Routing.

5.10.4.3   Check EnumDnRange Configuration

To resolve this problem, do the following:

  1. Check the configuration of EnumDnRange.

    # ipwcli

    IPWorks> list enumdnrange

    For example:

    [EnumDNRange 50 8652]
      enumZoneId: 50
      viewId: 0
      enumDnRange: 8652
      scope: 
      destNode: ldap
      updateLevel: 0
    Working on 1 object(s).
    IPWorks>
    

    The destNode must be ldap when this EnumDnRange is configured for NP over LDAP.

    For example:

    IPWorks> modify enumdnrange <2.5.6.8...> -set destnode=ldap
    Working on 1 object(s).
    1 object(s) were updated
    

5.10.4.4   Open ENUM LDAP Switch

To resolve this problem, do the following:

  1. Enter the ECLI.

    # /opt/com/bin/cliss

    > ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,EnumServer=1,Erh=1

  2. Set the parameter ldap to true to open the ENUM LDAP switch.

    (Erh=1>) configure

    (config-Erh=1)> ldap=true

    (config-Erh=1)> commit

5.10.5   NP Traffic Loss

5.10.5.1   Trouble Symptoms

When NP traffic starts, the system drops packets before they reach the ENUM server. The number of dropped packets equals the number of packets lost on the client side.

5.10.5.2   Locating Fault

This may be caused by a small value of net.core.rmem_default, the default receive buffer size (in bytes) for sockets. The problem can be solved by raising net.core.rmem_default to the value of net.core.rmem_max.

The maximum value can be retrieved by issuing the following command:

# sysctl -a|grep net.core

...
net.core.rmem_max = <max value>
...

The default value can be increased by issuing the following command:

# sysctl -w net.core.rmem_default=<max value>
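As a hedged illustration (Linux procfs layout assumed; this is not an IPWorks tool), the current values can also be read directly from /proc, which is useful when scripting the check:

```python
from pathlib import Path

def rmem_settings(procfs="/proc/sys/net/core"):
    """Return (rmem_default, rmem_max) in bytes, or None if unavailable."""
    base = Path(procfs)
    try:
        default = int((base / "rmem_default").read_text())
        maximum = int((base / "rmem_max").read_text())
    except (OSError, ValueError):
        return None
    return default, maximum

settings = rmem_settings()
if settings:
    default, maximum = settings
    print(f"rmem_default={default}, rmem_max={maximum}")
    if default < maximum:
        print(f"Consider: sysctl -w net.core.rmem_default={maximum}")
```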

5.10.5.3   Confirming Solution

After configuring the value, check whether any packets are still lost on the client side.

5.11   ENUM Front End

IPWorks ENUM Front End (FE) is a component of the data layered architecture (DLA), in which application data and user data are separated into different layers implemented in different network functional entities. The role of the ENUM FE is to provide the application logic and enable the ENUM server to access the CUDB instead of the local NDB. CUDB is an extensible, high-performance, subscriber-centric database system that communicates with IPWorks through the LDAP and SOAP protocols.

Figure 5 illustrates the architecture of ENUM FE:

Figure 5   ENUM Front End Generic Architecture

ENUM Server implements a business logic layer. The data of IPWorks ENUM FE is on the CUDB. For traffic handling, ENUM FE queries the user data from the CUDB by LDAP protocol.

ENUM FE Sync implements the cache mechanism for ENUMDnRange and ENUMDnSched. The following list describes how the cache mechanism functions:

  1. ENUM FE Sync acts as a SOAP server to handle SOAP notifications from CUDB when ENUMDnRange and ENUMDnSched are provisioned.
  2. ENUM FE Sync caches ENUMDnRange and ENUMDnSched locally and re-caches them when they expire.
  3. ENUM FE Sync provides a method to manually refresh the cached ENUMDnRange and ENUMDnSched.
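The cache-with-expiry-and-manual-refresh pattern above can be sketched as follows; the class, names, and fetch callback are illustrative only, not the real ENUM FE Sync implementation:

```python
import time

class TtlCache:
    """Minimal sketch of the ENUM FE Sync caching pattern: entries
    expire after a TTL and can also be refreshed manually."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}   # key -> (value, cached_at)

    def get(self, key, fetch):
        """Return the cached value, re-fetching (re-caching) on expiry."""
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is None or now - entry[1] > self.ttl:
            value = fetch(key)            # e.g. query CUDB over LDAP
            self._store[key] = (value, now)
            return value
        return entry[0]

    def refresh(self, key, fetch):
        """Manual refresh, analogous to the manual refresh method."""
        self._store[key] = (fetch(key), time.monotonic())

cache = TtlCache(ttl_seconds=60)
print(cache.get("8652", fetch=lambda k: {"destNode": "ldap"}))
```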

ENUM FE Configuration Pre-Check

Enable ENUM FE Function in ECLI:

Before the ENUM FE can function, it must be enabled:

  1. Log on to the ECLI interface on the SC.

    # ssh <username>@<SC MIP Address> -t -s cli

  2. Configure the MO EnumFE.

    >configure

    (config)>dn ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,EnumFE=1

    (config-EnumFE=1)>enableEnumFE=true

    (config-EnumFE=1)>commit

    (config-EnumFE=1)>exit

  3. Restart the ENUM server and ENUM FE Sync to make the change take effect.

    # ipw-ctr restart enum <PL hostname>

    # ipw-ctr restart fesync <PL hostname>

Make Connection Available:

Make sure that all connections are available with databases (CUDB and local MySQL DB cluster).

If any DB connection related alarms are raised, follow the procedures described in the following alarm OPIs:

License is Valid:

Make sure that the license for the ENUM FE function is valid.

For details, see Section 5.14.1.

5.11.1   No LDAP Connection

5.11.1.1   Trouble Symptoms

The following error message is logged in the log file ipworks_enum.log:

"LDAPProvider::find ldapConnection is null!"

However, no related alarms are raised.

5.11.1.2   Locating Fault

This issue occurs when the connection configuration for the ENUM server has not been set up. The following example shows how to check the configuration:

SC-X:~ # /opt/com/bin/cliss

>ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1,DataBaseInfo=1,CudbManager=1,CudbServiceSite=ENUM,CudbSiteManager=1,CudbSite=<CudbSite Name>, CudbNode=<CudbNode Name>

(CudbNode=<CudbNode Name>)>show -v

CudbNode=<CudbNode Name>
   address="192.168.20.14"
   cudbNodeId="<CudbNode Name>"    
   distinguishedName=""
   password=""
   poolSize=16
   port=389 <default>

5.11.1.3   Confirming Solution

Check whether the same issue occurs after the configuration.

5.11.2   Server Fail in ENUM Response

5.11.2.1   Trouble Symptoms

The Rcode field of ENUM response is Server Fail.

5.11.2.2   Locating Fault

  1. The ENUM Zone in IPWorks may not match the ENUM record in CUDB. Make sure that there is an ENUM Zone matching the query.

    Each ENUM record in CUDB must match an ENUM Zone; otherwise, it is unavailable to the ENUM Server. For NAPTR queries, only queries that find a valid zone in the ENUM Server continue with the subsequent ENUM processing. For example, consider two NAPTRs in CUDB:

    fqdn=1.2.3.4.5.6.7.8.9.0.3.3.1.e164.iptelco.com 
    fqdn=1.2.3.4.5.6.7.8.9.0.3.3.2.e164.iptelco.com
    

    An EnumZone object e164.iptelco.com must be created:

    IPWorks>create enumzone 1 -set enumzonename="e164.iptelco.com"

    IPWorks>exit

    Note:  
    When the ENUM FE is running, the configuration of EnumZone impacts the performance.

  2. Make sure that DB connection is available, including CUDB and NDB. For details, see Make Connection Available.
  3. Make sure that the ENUM server process is running by executing the command ps -ef|grep ipwenum.

5.11.2.3   Confirming Solution

Check whether the Rcode is still Server Fail after the configuration.

5.11.3   Failed to Cache ENUMDnSched to Local MySQL Cluster (for ENUM)

5.11.3.1   Trouble Symptoms

The following warning message is logged in ipwenum.log:

"Tuple already existed when attempting to insert"

5.11.3.2   Locating Fault

Both ENUM servers receive the same ENUM query at the same time, search CUDB, and try to cache the fetched record to the IPWorks MySQL Cluster. As a result, the same ENUMDnSched is inserted into the MySQL Cluster twice at the same time: one insert succeeds and the other fails because the record already exists.

5.11.3.3   Confirming Solution

This is only a warning message; it has no side effect on any ENUM FE function, so no action is required.

5.11.4   Failed to Cache ENUMDnSched to Local MySQL Cluster (for ENUM FE Sync)

5.11.4.1   Trouble Symptoms

The following error message is logged in the ipworks_fesync.log:

"enumDnSchedCache is disable"

5.11.4.2   Locating Fault

ENUM FE Sync receives an EnumDnSched SOAP message, but the switch enableEnumDnSchedCache is disabled.

  1. Log on to the ECLI.

    # ssh <username>@<OAM IP Address> -t -s cli

  2. Enable EnumDnSched cache by configuring MO EnumFE.

    >configure

    (config)>dn ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,EnumFE=1

    (config-EnumFE=1)>enableEnumDnSchedCache=true

    (config-EnumFE=1)>commit

    (config-EnumFE=1)>exit

    Note:  
    When the value of enableEnumDnSchedCache is set to false, all locally cached EnumDnSched entries are removed.

  3. Restart the ENUM server and ENUM FE Sync to make the changes take effect.

    # ipw-ctr restart enum <PL hostname>

    # ipw-ctr start fesync <PL hostname>

5.11.4.3   Confirming Solution

Check whether the same issue occurs after the configuration.

5.11.5   Failed to Refresh EnumDnRange

This section covers four typical cases.

5.11.5.1   Case 1

5.11.5.1.1   Trouble Symptoms

When you perform a manual refresh on EnumDnRange by using /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange, you receive the following error message:

"EnumDnRange is initialing."

5.11.5.1.2   Locating Fault

When ENUM FE Sync starts and there is no EnumDnRange in the local MySQL Cluster, ENUM FE Sync fetches the EnumDnRange from CUDB and stores it in the local MySQL Cluster. If ENUM FE Sync receives the command "Manual refresh EnumDnRange" during this initialization, it reports the message "EnumDnRange is initialing.".

5.11.5.1.3   Confirming Solution

Perform the manual refresh again after the initialization completes.

5.11.5.2   Case 2

5.11.5.2.1   Trouble Symptoms

When you perform a manual refresh on EnumDnRange by using /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange, you receive an error message. To inspect it, execute the following steps:

  1. Execute the following command:

    PL-3:~ # /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   891    0   295  100   596   3908   7897 --:--:-- --:--:-- --:--:--  7946
    
     keep old enum dnrange in cache,enum dnrange refresh fail, detial reason refer to output.log
    

     
  2. Execute the following command:

    PL-3:~ # vi output.log

    The error message is displayed as follows:

    <?xml version='1.0' encoding='UTF-8'?><soapenv:Envelope xmlns:soapenv=
    "http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Body><soapenv:Fault>
    <faultcode>soapenv:Server</faultcode><faultstring>Can't call rollback when 
    autocommit=true</faultstring><detail /></soapenv:Fault></soapenv:Body></soapenv:Envelope>
    ~                                                   
    

5.11.5.2.2   Locating Fault

The record in the DNRANGEEVENTHANDLE table must be deleted manually, because the failed transaction was not rolled back. The steps are as follows:

  1. Log in to the database:

    SC-1:~ # /usr/local/mysql/bin/mysql -P 3307 --protocol=tcp

  2. Choose the ipworks database:

    mysql> use ipworks;

  3. Query all the record(s) of the DNRANGEEVENTHANDLE table:

    mysql> select * from DNRANGEEVENTHANDLE;

    +----+----------------+
    | id | eventhandletag |
    +----+----------------+
    |  1 |              0 |
    +----+----------------+
    1 row in set (0.00 sec)
    

  4. Delete the record:

    mysql> delete from DNRANGEEVENTHANDLE;

    Query OK, 1 row affected (0.00 sec)

  5. Check if there is any record left in the DNRANGEEVENTHANDLE table:

    mysql> select * from DNRANGEEVENTHANDLE;

    Empty set (0.00 sec)

  6. Exit:

    mysql> quit

5.11.5.2.3   Confirming Solution

Perform the manual refresh again after the above actions are completed.

5.11.5.3   Case 3

5.11.5.3.1   Trouble Symptoms

When you perform a manual refresh on EnumDnRange by using /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange, the following command returns an error message:

PL-3:~ # /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange

The error message is displayed as follows:

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: 
(7) Failed to connect to 127.0.0.1 port 8080: Connection refused

 keep old enum dnrange in cache,enum dnrange refresh fail, detial reason refer to output.log

5.11.5.3.2   Locating Fault

Execute the following command:

PL-3:~ # ipw-ctr status fesync

If the output shows that fesync on PL-3 is stopped, out of service, or working as a standby node, that is the cause of the failure.

5.11.5.3.3   Confirming Solution

The command manual_refresh must be executed only on the PL with active fesync.
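
A guard like the following can prevent running the script on the wrong PL. This is a sketch only: the status line is simulated, and the exact wording of the ipw-ctr status fesync output may differ in your release.

```shell
# Sketch: proceed only when fesync on this PL reports "active".
# In practice, capture the real output first: status=$(ipw-ctr status fesync)
status='fesync on PL-3 is active'
if echo "$status" | grep -qw 'active'; then
  echo "safe to run manual_refresh on this PL"
else
  echo "run manual_refresh on the PL with active fesync instead"
fi
```

Note that grep -qw requires the whole word "active", so a status such as "inactive" is not matched by mistake.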

5.11.5.4   Case 4

5.11.5.4.1   Trouble Symptoms

When you perform a manual refresh on EnumDnRange by using /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange, you receive an error message. To inspect it, execute the following steps:

  1. Execute the following command:

    PL-3:~ # /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange

    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   891    0   295  100   596   1067   2157 --:--:-- --:--:-- --:--:--  2167
    
     keep old enum dnrange in cache,enum dnrange refresh fail, detial reason refer to output.log
    

  2. Execute the following command:

    PL-3:~ # vi output.log

    The error message is displayed as follows:

    <?xml version='1.0' encoding='UTF-8'?><soapenv:Envelope xmlns:soapenv=
    "http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Body><soapenv:Fault>
    <faultcode>soapenv:Server</faultcode><faultstring>no available ldap connection
    </faultstring><detail /></soapenv:Fault></soapenv:Body></soapenv:Envelope>

5.11.5.4.2   Locating Fault

Refer to Section 5.11.1.2.

5.11.5.4.3   Confirming Solution

The command manual_refresh must be executed after the problem is solved.

5.11.6   Cannot Find ENUM Zone

5.11.6.1   Trouble Symptoms

The following error message is logged in ipworks_fesync.log:

"can't find enum zone!"

5.11.6.2   Locating Fault

ENUM FE Sync received an ENUMDnSched SOAP message, but no ENUM zone matches the record.

Refer to the section Configuring EnumZone according to CUDB ENUM records in Configure DNS and ENUM.

5.11.6.3   Confirming Solution

Check whether the same issue occurs after the configuration.

5.12   Radius AAA Server

This section provides information on resolving problems with the IPWorks Radius AAA Server.

5.12.1   Radius AAA Server Process Not Running

5.12.1.1   Trouble Symptoms

Radius AAA server processes cannot be started.

5.12.1.2   Locating Fault

Use any of the following methods to locate the fault:

5.12.1.3   Confirming Solution

Contact support to fix the issue that is reported in the Radius AAA error log.

5.12.2   Unreachable Radius Traffic

5.12.2.1   Trouble Symptoms

The Radius AAA server cannot receive traffic from the client while the Radius AAA server is in operation.

5.12.2.2   Locating Fault

Use any of the following methods to locate the fault:

5.12.2.3   Confirming Solution

Correct the configuration of eVIP policy flow in ECLI.

5.12.3   AAA Rejects Authentication or Authorization Request

5.12.3.1   Trouble Symptoms

Authentication or authorization request from Radius client is rejected by IPWorks AAA server.

Use tcpdump to capture the packets; you receive the following error message:

Reply-Message : fail to verify user password

For how to capture the packet, refer to Section 7 Appendix A: Example of PM, FM, LM, and AMF Logs.

5.12.3.2   Locating Faults

This issue occurs when the ShareSecret configuration is not synchronized between ECLI and the client.

  1. Check the value of ShareSecret in a Radius client.

    The actual procedure depends on the customer's environment. Details are out of the scope of IPWorks documents.

  2. Check the value of ClientSharedSecret in ECLI.
    # ssh <Username>@<MIP_OAM_IP>
    Password: <Password>
    dn ManagedElement=1,IpworksFunction=1,IPWorksAAARoot=1,IPWorksRadiusAAARoot=1,⇒
    RadiusStack=1,SharedSecretMgr=1,ClientSharedSecretMgr=1,ClientSharedSecret=1
    (ClientSharedSecret=1)>show -v
    ClientSharedSecret=1
       clientIPAddr=<Client IP>
       clientSharedSecretId="1" <default>
       sharedSecretValue=<Shared Secret Value>
       type=ALL <default>
    
    Note:  
    To get OAM IP address, check the oam in /etc/hosts.

  3. Ensure that the value of ShareSecret fetched in Step 1 is the same as the value of sharedSecretValue fetched in Step 2.
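
The comparison in Step 3 is easy to get wrong by eye (trailing spaces, letter case), so a strict string comparison helps. The secret values below are hypothetical placeholders; substitute the values fetched in Steps 1 and 2.

```shell
# Sketch: compare the client-side ShareSecret (Step 1) with the
# sharedSecretValue read from ECLI (Step 2). Values are placeholders.
client_secret='example-secret'
ecli_secret='example-secret'
if [ "$client_secret" = "$ecli_secret" ]; then
  echo "shared secrets match"
else
  echo "shared secrets DIFFER - update one side"
fi
```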

5.12.3.3   Confirming Solution

Check whether requests from the Radius client are now handled successfully.

5.12.4   AAA Does Not Proxy Radius Message

5.12.4.1   Trouble Symptoms

Authentication or authorization request from Radius client is not forwarded to target server by IPWorks AAA server.

Use tcpdump to capture the packets; you find that the Radius message received by the IPWorks AAA server is not forwarded to other servers.

For how to capture the packet, refer to Section 7 Appendix A: Example of PM, FM, LM, and AMF Logs.

5.12.4.2   Locating Faults

This issue occurs when the proxy rule configuration has not been updated from IPWCLI to PL-X.

  1. Check that the configuration file /etc/ipworks/<AAA Server host, PL-x>/aaa_radius/aaa_realm.conf exists on each blade on which AAA is running, for example:

    cat /etc/ipworks/PL-3/aaa_radius/aaa_realm.conf

    Example output:

    [REALM]
    name=Ericsson.com
    striprealm=false
    access
    {
    destination={192.168.10.1}
    requestchecklist={( Service-Type = 1 || Service-Type = 2 ) && User-Password ? 1}
    replychecklist={( Service-Type = 1 || Service-Type = 2 )}
    requestchangelist={add:Framed-Protocol="1",delete:Service-Type="2"}
    replychangelist={add:User-Name="AAA-Test@Ericsson.com",add:Framed-Protocol="2",delete:Service-Type="2",replace:Reply-Message="'PAP authenticate successfully.':'Hello,user!'"}
    }
    
    accounting
    {
    destination={192.168.10.1}
    }
    

    The actual content depends on the customer's environment. Details are out of the scope of IPWorks documents.

  2. Check the AAA server configuration in IPWCLI. Make sure an AAA server is created for each PL on which AAA will run.

    #ipwcli

    #list aaaserver

    [AAAServer aaasrv1]
    Name: aaasrv1
    Address: 169.254.100.3
    
    [AAAServer aaasrv2]
    Name: aaasrv2
    Address: 169.254.100.4
    

  3. Update the configured proxy and realm information to each blade on which the AAA server will run.

    #ipwcli

    #update aaaserver

    Result of performing an export is:
    Exported aaa realm Ericsson.com
    Updated the configuration
    Reload proxy realm configuration successfully
    Reload proxy realm configuration container successfully
    

5.12.4.3   Confirming Solution

Check whether AAA server can proxy the requests to the target server.

5.12.5   AAA Rejects EAP-AKA/SIM Authentication Request

5.12.5.1   Trouble Symptoms

The IPWorks AAA server rejects the authentication request from the Radius client.

Use tcpdump to capture the packets; you can find the following flow:

| ----- Access Request --> |
| <---Access Challenge --- |
| ------Access Request --> |
| <------Access Reject --- | 

For how to capture the packet, refer to Section 7 Appendix A: Example of PM, FM, LM, and AMF Logs.

5.12.5.2   Locating Faults

This issue occurs when the AAA server cannot connect to the HLR. Do the following:

  1. Check the SS7 Stack in IPWorks AAA Server.

    The actual output depends on the environment of customer. Details are out of the scope of IPWorks documents.

    1. Check the SS7 stack configuration by signal manager.

      #/opt/sign/EABss7050/bin/signmgui -own.conf /opt/sign/etc/signmgr.cnf &

      For more details, refer to section Configuring SS7 for Wi-Fi AAA in Configure SS7 for AAA.

    2. Check the SS7 configuration in Radius AAA by COMCLI.

      >ManagedElement=ipworks_cba,IpworksFunction=1,IPWorksAAARoot=1,IPWorksRadiusAAARoot=1,RadiusAAAService=1,IWLANService=1,RadiusSS7Stack=1
      (RadiusSS7Stack=1)>show -v
      RadiusSS7Stack=1
      cpmAddress="ss7cafcpmaddress:6669"
      isdnNumber="1234567"
      isdnNumberNature=NOA_NATIONAL_SIGNIFICANT <default>
      nodeType=1 <default>
      numberOfAAAProcess=10 <default>
      numberOfBEInstance=10
      originalSignalingPointCode=100 <default>
      radiusSs7StackId="1"
      sgsnAddress="192.168.10.13"
      useGT4CallingPartyAddress=false <default>
      
      

  2. Ensure that the Radius AAA Server is connected to the SS7 Stack successfully.

5.12.5.3   Confirming Solution

Check whether you can receive Access Accept from AAA Server.

5.13   EPC AAA Server

This section provides information on resolving problems with the IPWorks EPC AAA Server.

5.13.1   EPC AAA Server Process Not Running

5.13.1.1   Trouble Symptoms

EPC AAA server processes cannot be started.

5.13.1.2   Locating Fault

Use any of the following methods to locate the fault:

5.13.1.3   Confirming Solution

Contact support to fix the issue that is reported in the EPC AAA error log.

5.13.2   C-diameter Stack Not Running

For details, see Section 5.17 C-Diameter.

5.13.3   Ineffective Diameter over SCTP

5.13.3.1   Trouble Symptoms

The traffic of SCTP is down.

5.13.3.2   Locating Fault

Use the following methods to locate the fault:

5.13.3.3   Confirming Solution

Correct the SS7 Stack configuration. Refer to the Section Configuring SS7 for Diameter over SCTP in Configure SS7 for AAA.

Restart the C-Diameter Stack:

  1. Restart the C-Diameter Stack:
    1. List installed CDIA Service Unit (SU).

      SC-X # cmw-status -v su|grep CDIA

      safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter

      safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter

    2. Restart CDIA SU one by one.

      SC-X # amf-adm restart safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter

      SC-X # amf-state su all safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter

      SC-X # amf-adm restart safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter

      SC-X # amf-state su all safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter

  2. Restart the EPC AAA Server:

    SC-X:~ #ipw-ctr restart aaa_diameter PL-3

5.13.4   High Failure Ratio Caused by Discarding DERs

5.13.4.1   Trouble Symptoms

A high failure ratio is caused by discarding DERs.

5.13.4.2   Locating Fault

Normally, the AAA server handles DERs from the UE, gets the response from the HSS, and then replies with DEAs to the UE correctly. The AAA server can handle DERs from the UE as long as they carry different session IDs.

When the UE sends DERs with the same session ID to IPWorks, the AAA server handles only the first DER, discards the remaining requests, and answers them with DEAs carrying "DIAMETER_UNABLE_TO_COMPLY". This is the normal behavior of the AAA server.

In this case, if the AAA server cannot handle even the first DER from the UE, report the problem to maintenance support through a CSR.
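
The per-session deduplication described above can be illustrated with a simulated request list: only the first DER per session ID is kept, and the rest would be answered with DIAMETER_UNABLE_TO_COMPLY. The session IDs and request labels below are invented for illustration.

```shell
# Sketch: keep only the first request per session id (column 1),
# mirroring how the AAA server processes only the first DER.
requests='sess-1 DER-a
sess-1 DER-b
sess-2 DER-c'
echo "$requests" | awk '!seen[$1]++'
```

The awk filter prints a line only the first time its session ID (field 1) is seen, so DER-b is dropped while DER-a and DER-c pass through.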

5.14   License Problems

This section describes licenses related troubleshooting cases.

5.14.1   License Control Problem

5.14.1.1   Trouble Symptoms

When the user creates an ENUMDNSCHED object, the operation is rejected.

5.14.1.2   Locating Fault

The ENUMDNSCHED object is controlled by the ENUMDNSCHED Capacity license. When any problem in license control happens, the output "License exception detected: <Fault Reason>" is shown in the ipwcli. See Table 11 for details.

When the problem happens, the specific server might receive license-related alarms, such as "License Management, License Key Not Available" and "License Management, Capacity Usage Threshold Reached".

Table 11    Fault Reason and How to Locate the Fault

ID 1

Fault Reason: The license key file used by LM is not available. For details, refer to License Management, Key File Fault.

Locating Fault:

>ManagedElement=<Node Name>,SystemFunctions=1,Lm=1
>show lmState

Check whether the output is LOCKED. If so, refer to License Management, Key File Fault.

ID 2

Fault Reason: The operation mode in the current version of License Manager is not supported by JavaOaM.

Locating Fault: Collect the output of cmw-repository-list and contact Ericsson support.

ID 3

Fault Reason: The license is expired and must be updated. For details, refer to License Management, License Key Not Available.

Locating Fault:

>ManagedElement=<Node Name>,SystemFunctions=1,Lm=1
>show all
CapacityKey=<Id>
   expiration="<yyyy-mm-dd>"
   keyId="FAT1023219/2"

Check whether the license has expired through the expiration attribute. If the license has expired, refer to License Management, License Key Not Available.

ID 4

Fault Reason: The requested license is not yet available for use. It will become valid in the future.

Locating Fault:

>ManagedElement=<Node Name>,SystemFunctions=1,Lm=1
>show all
CapacityKey=<Id>
   keyId="FAT1023219/2"
   validFrom="<yyyy-mm-dd>"

Check whether the date in validFrom has been reached. If the day and time have not been reached, refer to License Management, License Key Not Available, or wait for the license to become available.

ID 5

Fault Reason: The requested licensed capacity cannot be used because the corresponding license keys are unavailable.

Locating Fault:

>ManagedElement=<Node Name>,SystemFunctions=1,Lm=1
>show all

Check whether there is a CapacityKey=<Id> with keyId="FAT1023219/2" in the output. If not, refer to License Management, License Key Not Available.

ID 6

Fault Reason (one of the following):

  • License capacity limitation has been exceeded. For details, refer to License Management, Capacity Usage Threshold Reached.

  • License capacity limitation has been reached or exceeded. For details, refer to License Management, Capacity Usage Threshold Reached.

  • License capacity limitation will be exceeded. For details, refer to License Management, Capacity Usage Threshold Reached.

Locating Fault:

>ManagedElement=<Node Name>,SystemFunctions=1,Lm=1
>show all
CapacityKey=<Id>
      licensedCapacityLimit
         value=<LmCapacityValue>

Check whether the number of provisioned ENUMDNSCHED objects is equal to or larger than the value. If so, refer to License Management, Capacity Usage Threshold Reached.

ID 7

Fault Reason (one of the following):

  • The requested license key is not installed. Check that the requested license is installed, that /etc/ipworks/root_cert.cfg exists, and that the connection between the License Management server and client is good, then inspect the Storage Server logs.

  • Requesting the license failed. Check that the requested license is installed, that /etc/ipworks/root_cert.cfg exists, and that the connection between the License Management server and client is good, then inspect the Storage Server logs.

Locating Fault:

Step 1:

>ManagedElement=<Node Name>,SystemFunctions=1,Lm=1
>show all

Check whether there is a CapacityKey=<Id> with keyId="FAT1023219/2" in the output.

Step 2:

Check whether the file /etc/ipworks/root_cert.cfg is missing or corrupted. If so, contact Ericsson support.

Step 3:

>ManagedElement=<Node Name>,SystemFunctions=1,Lm=1
>publishLicenseInventory

If the result is "ERROR: Call command failed, error code: ComNotExist", use the following commands:

amf-adm unlock safSu=SC-1,safSg=2N,safApp=ERIC-lm.server.aggregation

amf-adm unlock safSu=SC-2,safSg=2N,safApp=ERIC-lm.server.aggregation

Step 4:

If the fault cannot be located by the above methods, collect the logs /cluster/storage/no-backup/ipworks/logs/SC-1/ipworks_ss_SC-1.log and /cluster/storage/no-backup/ipworks/logs/SC-2/ipworks_ss_SC-2.log.

ID 8

Fault Reason: Software issue; restart the Storage Server to fix the issue.

Locating Fault: Software fault; restart the Storage Server (SS):

ipw-ctr stop ss <active SC>
ipw-ctr start ss <active SC>
5.14.1.3   Confirming Solution

After applying corresponding methods to resolve the issues, check whether the license is available. For details, refer to View License Information.

5.14.2   Clear the Emergency Unlock Alarm

5.14.2.1   Trouble Symptoms

"Emergency Unlock Reset Key Required" alarm is raised by IPWorks.

5.14.2.2   Locating Fault

Emergency Unlock mode is NOT supported by IPWorks LM component. If Emergency Unlock mode is activated by mistake, the "Emergency Unlock Reset Key Required" alarm will be raised by LM.

5.14.2.3   Confirming Solution

Make sure that the license key file exists.

To clear the alarm "Emergency Unlock Reset Key Required", run the following command:

SC-1:~ # ntfsend -s 0 -c 193,6,0 -n "lmId=1" -N "lmId=1" -a "" -p 74 -e 16384

5.15   MySQL NDB Cluster

This section describes NDB Cluster troubleshooting cases.

5.15.1   SQL Node Not Started

5.15.1.1   Trouble Symptoms

The following example shows an error message after executing the command /etc/init.d/ipworks.mysql show-status. This output indicates that one of the SQL Nodes is not started.

[...]
[mysqld(API)]	24 node(s)
id=3	(not connected, accepting connect from any host)
[...]

5.15.1.2   Locating Fault

This issue occurs because the Data Nodes have not started completely. Check the Data Node status by using /etc/init.d/ipworks.mysql show-status. The following is a sample output:

[ndbd(NDB)]	2 node(s)
id=27	@169.254.100.1  (mysql-5.6.27 ndb-7.4.8, starting, Nodegroup: 0, *)
id=28	@169.254.100.2  (mysql-5.6.27 ndb-7.4.8, starting, Nodegroup: 0)
[...]

The lines for id=27 and id=28 show that the Data Nodes are still in the starting state. When no Data Node is in the starting state any longer, which means the Data Nodes have started completely, the SQL Node can be started successfully.
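
This check can be scripted with a small filter. The sample below mirrors the show-status output above; in practice you would pipe /etc/init.d/ipworks.mysql show-status into the same pipeline.

```shell
# Sketch: count Data Nodes still in the "starting" state.
# SQL Nodes can be started once this count drops to zero.
status='id=27   @169.254.100.1  (mysql-5.6.27 ndb-7.4.8, starting, Nodegroup: 0, *)
id=28   @169.254.100.2  (mysql-5.6.27 ndb-7.4.8, starting, Nodegroup: 0)'
echo "$status" | grep -c 'starting'
```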

5.15.1.3   Confirming Solution

After the Data Nodes are started completely, start the SQL Node and check whether SQL Node is started successfully.

The following output indicates that the SQL Node (id=3) is started.

[...]
[mysqld(API)]	24 node(s)
id=3	@169.254.101.1  (mysql-5.6.27 ndb-7.4.8)
id=4 (not connected, accepting connect from SC-2)
[...]

5.15.2   Management Node Down

5.15.2.1   Trouble Symptoms

The Storage Server, MySQL Cluster Node Unreachable alarm might be raised when the Management Node is down.

5.15.2.2   Locating Fault

To check if the Management Node is down, use either of the following ways:

To fix the issue, perform the following steps based on the status of the Data Node and SQL Node:

5.15.2.3   Confirming Solution

After performing the solution, check whether the Management Nodes are started through ps -ef | grep ndb_mgmd or /etc/init.d/ipworks.mysql show-status.

5.15.3   Data Node Down

5.15.3.1   Trouble Symptoms

The Storage Server, MySQL Cluster Node Unreachable alarm might be raised when the Data Node is down.

5.15.3.2   Locating Fault

In some situations, the Data Node is down. Users need to start the Data Node manually by using the ipworks.mysql script, or adjust the configuration to prevent the Data Node from going down.

Check whether there is any error logged in /local/ipworks/mysql-cluster/datanode/ndb_<id>_out.log.

To start the Data Node manually by using the ipworks.mysql script, see Section 5.15.3.2.1.

5.15.3.2.1   Starting Data Node

To troubleshoot the issues caused by the Data Node down, perform one or all the following steps:

  1. Check whether the Data Node is down and start the Data Node by using ipworks.mysql script.

    # /etc/init.d/ipworks.mysql show-status

    If the status of Data Node is displayed like the following, it means that the Data Node (id=27) is down:

    [ndbd(NDB)]	2 node(s)
    id=27 (not connected, accepting connect from SC-1)
    

    If the Data Node (id=27) is down, use the following command to start it.

    # /etc/init.d/ipworks.mysql start-ndbd

  2. Check whether the issue is caused by a Data Node memory size problem and fix it according to Section 5.15.3.2.2.

5.15.3.2.2   Data Node Memory Size Problem

The memory size of the Data Node depends on the needs of the IPWorks application: more data requires a larger Data Node memory size.

A memory size that is too small also causes several problems; for example, the ENUM Server or a Data Node cannot start successfully, or the machine responds slowly.

Users can adjust the Data Node memory in /home/ipworks/mysql/confs/ipworks_mgm.conf.

5.15.4   SQL Node Down

5.15.4.1   Trouble Symptoms

The Storage Server, MySQL Cluster Node Unreachable alarm might be raised when the SQL Node is down.

5.15.4.2   Locating Fault

To troubleshoot the issues caused by the SQL Node down, do the following:

  1. Check whether the SQL Nodes on SC-1 and SC-2 are started and fix the issue described in Section 5.15.1.
  2. Log on to SC-1.

    # ssh <Username>@<SC-1 or SC-2 IP Address>

  3. Check whether the accessing privilege is granted to the NDB.

    #/usr/local/mysql/bin/mysql \
    -P 3307 -h localhost --protocol=tcp

    mysql>select user, host from mysql.user;

    Check the output to see if <SS OAM IP Address> is assigned to the user. For example,

    +------+-----------+
    | user | host      |
    +------+-----------+
    | root | 127.0.0.1 |
    | root | ::1       |
    |      | SC-1      |
    | root | SC-1      |
    |      | ipw_ss    |
    |      | localhost |
    | root | localhost |
    +------+-----------+
    7 rows in set (0.01 sec)
    

    The example shows that the privilege is assigned.

    If an output shows that the privilege is not assigned, use the following commands to grant the privilege on the NDB side:

    # /usr/local/mysql/bin/mysql \
    -P 3307 -h localhost --protocol=tcp

    mysql> grant all privileges on *.* to ''@'ipw_ss';

  4. Repeat Step 2 to Step 3 to check the accessing privilege on SC-2.
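
The host check in Step 3 can also be scripted. The sample rows below mirror the mysql.user output above; in practice you would pipe the real query result through the same grep.

```shell
# Sketch: verify that an entry for host 'ipw_ss' exists in the
# privilege listing (sample mirrors the mysql.user output above).
rows='| root | 127.0.0.1 |
|      | ipw_ss    |
| root | localhost |'
if echo "$rows" | grep -q 'ipw_ss'; then
  echo "ipw_ss privilege entry present"
else
  echo "ipw_ss privilege entry MISSING - grant it as described in Step 3"
fi
```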

5.15.4.3   Confirming Solution

After performing the solution, check whether the SQL Nodes are started through /etc/init.d/ipworks.mysql show-status.

5.15.5   MySQL NDB Cluster Status Abnormal

This section describes how to troubleshoot the issues caused by the abnormal status of MySQL NDB Cluster.

5.15.5.1   Trouble Symptoms

Table 12 lists the figures showing the abnormal status of MySQL NDB Cluster.

Table 12    Abnormal Status of MySQL NDB Cluster

Figure 1   Abnormal Status of NDB Cluster (1)

Figure 2   Abnormal Status of NDB Cluster (2)

Figure 3   Abnormal Status of NDB Cluster (3)

Figure 4   Abnormal Status of NDB Cluster (4)

5.15.5.2   Locating Fault

Table 13 lists the situations causing the abnormal status of MySQL NDB Cluster and provides the solutions to the issues.

Table 13    Abnormal Status of MySQL NDB Cluster

Situation: The Management Node is stopped, and both of the Data Nodes are running (as shown in Figure 1).

Solution: Start the Management Node.

# /etc/init.d/ipworks.mysql start-mgmd

Situation: The Management Node is stopped, and only one of the Data Nodes is stopped or in the starting state (as shown in Figure 2).

Solution: Start the Data Node.

# /etc/init.d/ipworks.mysql start-ndbd

Situation: The Management Node and one of the Data Nodes are stopped (as shown in Figure 3).

Solution: Start the Management Node and the Data Node.

# /etc/init.d/ipworks.mysql start-mgmd

# /etc/init.d/ipworks.mysql start-ndbd

Situation: Both of the Data Nodes are stopped or in the starting state (as shown in Figure 4).

Solution: Start the MySQL NDB cluster.

# /etc/init.d/ipworks.mysql start-ndbcluster

5.15.5.3   Confirming Solution

Users can check the status of the MySQL NDB Cluster nodes by running /etc/init.d/ipworks.mysql show-status. Figure 6 shows the normal status of the NDB Cluster: the Management Node and both of the Data Nodes are running.

Figure 6   Normal Status of MySQL NDB Cluster

5.15.6   MySQL NDB Cluster Cannot Work Normally

This section describes how to recover the NDB Cluster by performing the initial start of the cluster.

5.15.6.1   Trouble Symptoms

The MySQL Cluster cannot work normally, and some serious errors might occur; for example, a mysql table is missing or a Data Node cannot start.

5.15.6.2   Locating Fault

To recover the NDB Cluster, do the following:

  1. Stop Storage Server.

    SC-1:~# ipw-ctr stop ss SC-1

    SC-1:~# ipw-ctr stop ss SC-2

  2. Stop all the running ENUM servers and AAA servers.

    SC-1:~# ipw-ctr stop enum <PL of running enum>

    SC-1:~# ipw-ctr stop aaa-diameter <PL of running aaa diameter>

  3. Perform an initial start of the NDB Cluster.

    SC-1:~# /opt/ipworks/ss/scripts/init_ndb.sh

  4. Initialize the Storage Server.

    SC-1:~# /opt/ipworks/ss/scripts/init_ss.sh

5.15.6.3   Confirming Solution

This issue is fixed when the operator can log on to the IPWCLI successfully.

5.15.7   SQL Node Start Failure with Wrong Folder Permission

5.15.7.1   Trouble Symptoms

When you try to start the SQL Node, it fails, and you receive the following error message in the error log file /local/ipworks/mysql-cluster/sqlnode/sqlnode.err:

"Fatal error: Can't open and lock privilege tables: Table 'host' is read only"

This issue occurs because of the wrong permission for the SQL Node data related folders.

Note:  
Under normal conditions, the folder permission must not be changed. However, if the folder permission has been changed and this change causes the issue, the operator should follow this method to recover the SQL Node startup.

5.15.7.2   Locating Fault

Check the permission of the following folder and make sure that the permission of each level of the folder is 755:

/local/ipworks/mysql-cluster/sqlnode

Use the following command to change the folder permission:

# chmod 755 <folder name>
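
The result can be verified with stat. The sketch below uses a temporary directory in place of /local/ipworks/mysql-cluster/sqlnode so it is safe to run anywhere; the stat -c format assumes GNU coreutils.

```shell
# Sketch: set and then print the octal permission of each level of a path.
# A temporary directory stands in for /local/ipworks/mysql-cluster/sqlnode.
base=$(mktemp -d)
mkdir -p "$base/mysql-cluster/sqlnode"
chmod 755 "$base" "$base/mysql-cluster" "$base/mysql-cluster/sqlnode"
for d in "$base" "$base/mysql-cluster" "$base/mysql-cluster/sqlnode"; do
  stat -c '%a' "$d"      # prints 755 for each level
done
```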

5.15.7.3   Confirming Solution

After the folder permission is changed to 755, the SQL Node can start successfully.

For more information on how to start SQL Node, refer to Configure MySQL NDB Cluster.

5.15.8   MySQL Data Lost on an SC

5.15.8.1   Trouble Symptoms

MySQL data is lost on an SC. This issue results in the abnormal work of the MySQL nodes.

5.15.8.2   Locating Fault

Log on to SC-1 and SC-2 respectively, and check whether MySQL data (located in /local/ipworks) is lost.

If the MySQL data on one of the SCs is lost (for example, SC-1), use the following way to recover the data:

  1. Stop all the MySQL Nodes on the SC that have the problem.

    SC-1:~ # /etc/init.d/ipworks.mysql stop

  2. Recover the lost data on the SC.

    SC-1:~ # /etc/init.d/ipworks.mysql recover

5.15.8.3   Confirming Solution

After the recovery operation is performed successfully, you can log in to MySQL successfully, and all the data in /local/ipworks is restored.

5.16   Backup and Restore

This section provides information on resolving problems in backing up or restoring IPWorks data.

Backup handling enables the operator to schedule backups at periodic intervals, at a fixed time, or at one point in time. It provides either a complete backup of all configured and provisioned data or a partial backup of only the configured data for IPWorks. It is possible to restore the system fully to the point in time when the backup was taken, or to restore the system partially, without the provisioned data.

Several problems can cause backup and restore handling to fail. If the process has not finished successfully, first check the log file. The detailed error information is recorded in the log file /cluster/storage/no-backup/ipworks/logs/<hostname>/ipwbrf.log.

5.16.1   Not Enough Disk Space

5.16.1.1   Trouble Symptoms

If there is not enough disk space, the backup or restore handling fails.

5.16.1.2   Locating Fault

Before a backup or restore operation is started, ensure that there is enough space on the disk, especially for the directory /cluster/ipwbrf, which stores the backup archive file. The df tool can be used to check the available disk space. See Section 2.1.11 for details.
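The space check can be scripted before starting the operation; a minimal sketch, assuming a required minimum of about 1 GiB (adjust to the expected archive size):

```shell
# Sketch: compare the available space under a directory (df -P prints POSIX
# output in 1 kB blocks; field 4 of line 2 is the available space) against a
# minimum. 1048576 kB (~1 GiB) is an assumed threshold; size it to the backup.
check_space() {
  avail_kb=$(df -P "$1" | awk 'NR==2 {print $4}')
  if [ "$avail_kb" -ge "$2" ]; then
    echo "OK: ${avail_kb} kB available under $1"
  else
    echo "LOW: only ${avail_kb} kB available under $1"
  fi
}
# Example: on the node, point this at /cluster/ipwbrf instead of ".".
check_space . 1048576
```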

5.16.1.3   Confirming Solution

Try the backup or restore operation again and check whether it is successful. For details, refer to Create Backup and Restore Backup.

5.16.2   Complete Backup or Restore Failed due to MySQL NDB Process Not Started

5.16.2.1   Trouble Symptoms

The backup or restore operation fails if the MySQL NDB process is not started.

5.16.2.2   Locating Fault

When the complete backup or restore operation is started, MySQL NDB is used to dump or restore the database. This issue occurs if the MySQL NDB process is not started, or if it is killed during the backup or restore phase.

To fix this issue, do the following:

  1. Stop the MySQL NDB cluster.

    # /etc/init.d/ipworks.mysql stop-ndbcluster

  2. Start the MySQL NDB cluster.

    # /etc/init.d/ipworks.mysql start-ndbcluster

  3. Check the MySQL NDB cluster status.

    # /etc/init.d/ipworks.mysql show-status

  4. Try to perform complete backup or restore again.

    Refer to Create Backup and Restore Backup.

5.16.2.3   Confirming Solution

The complete backup or restore operation is successful.

5.16.3   Restart Server Failed

5.16.3.1   Trouble Symptoms

Even though the restore operation is completed successfully, certain processes do not start automatically.

5.16.3.2   Locating Fault

Restore handling stops all running IPWorks processes except the MySQL process. The stopped processes start automatically after the restore operation is completed. Sometimes, certain processes do not restart successfully. Try to start the process manually. If the process still cannot start, refer to the service-specific troubleshooting (for example, Section 5.8).

5.16.3.3   Confirming Solution

The process starts.

5.16.4   Slow Backup or Restore Operation

5.16.4.1   Trouble Symptoms

The backup or restore operation takes more time than expected (about 10 minutes).

5.16.4.2   Locating Fault

This issue occurs when one System Controller (SC) is down in an abnormal way (such as a power outage).

This is a limitation of the CBA common component BRF-C.

To resolve the issue, start the SC that is down, and make sure that it starts up successfully and works normally.

Note:  
If the SC is down in a normal way (such as by using the poweroff command), the backup or restore is not affected.

5.16.4.3   Confirming Solution

Check whether the operation is complete in a normal time.

5.17   C-Diameter

This section provides information on resolving problems with C-Diameter.

5.17.1   C-Diameter OperState is DISABLED

5.17.1.1   Trouble Symptoms

C-Diameter OperState is DISABLED and C-Diameter processes cannot be started.

5.17.1.2   Locating Fault

Check the C-Diameter status with the cmw-status command. If the output shows "OperState=DISABLED", the C-Diameter status is abnormal.

Repair the C-Diameter stack on any SC or PL:

# amf-adm repaired safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter

# amf-adm repaired safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter

5.17.1.3   Confirming Solution

If the C-Diameter stack is repaired successfully, the output of cmw-status shows:

safSu=ERIC-CDIA-Runtime-1,safSg=ERIC-CDIA-SG,safApp=ERIC-CDIA-Runtime
AdminState=UNLOCKED(1)
OperState=ENABLED(1)
PresenceState=UNINSTANTIATED(1)
ReadinessState=IN-SERVICE(2)

5.17.2   C-Diameter Stack Cannot Listen on the Listening Port (3868)

5.17.2.1   Trouble Symptoms

The C-Diameter stack cannot listen on the listening port (3868).

5.17.2.2   Locating Fault

Check the C-Diameter process information and whether the Diameter dictionaries are installed.

5.17.2.3   Confirming Solution

If any process information or dictionary is not found, use the following method to repair the environment.

  1. If the dictionaries of Diameter are not installed, use the DiaDictManager command to install them on all PLs.

    PL-X:~ # /opt/diacc/bin/DiaDictManager add /etc/ipworks/aaa_diameter/dict/dictionary_ts29273

    PL-X:~ # /opt/diacc/bin/DiaDictManager add /etc/ipworks/aaa_diameter/dict/*

    After the dictionaries are installed successfully, the command output is shown as below:

    PL-X:~ # /opt/diacc/bin/DiaDictManager list

    dictionary_sh
    dictionary_s13
    dictionary_s6b
    dictionary_sta
    dictionary_swm
    dictionary_swx
    dictionary_ts29273
    

  2. Restart the C-Diameter Stack.
    1. List installed CDIA Service Unit (SU).

      SC-X # cmw-status -v su|grep -i CDIA

      safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter

      safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter

    2. Restart CDIA SU one by one.

      SC-X # amf-adm restart safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter

      SC-X # amf-state su all safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter

      SC-X # amf-adm restart safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter

      SC-X # amf-state su all safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter

    3. Restart the EPC AAA Server.

      PL-X:~ #ipw-ctr restart aaa_diameter
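The dictionary check from step 1 can be scripted against the DiaDictManager list output; a minimal sketch (the required set is taken from the example output above):

```shell
# Sketch: given the text output of 'DiaDictManager list', report any required
# dictionary that is missing. The required set is the example list above.
check_dicts() {
  listed="$1"
  for d in dictionary_sh dictionary_s13 dictionary_s6b dictionary_sta \
           dictionary_swm dictionary_swx dictionary_ts29273; do
    printf '%s\n' "$listed" | grep -qx "$d" || echo "missing: $d"
  done
}
# On a PL: check_dicts "$(/opt/diacc/bin/DiaDictManager list)"
```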

5.18   Geographic Redundancy

This section provides information on resolving problems with Geographic Redundancy.

5.18.1   MySQL Replication for Geographic Redundancy Failed on One Site

5.18.1.1   Trouble Symptoms

When the alarm MySQL Replication for Geographic Redundancy Failed appears on only one site, it means that MySQL replication has a problem on that node.

5.18.1.2   Locating Fault

5.18.1.2.1   Checking the AAANSDUser Data (For Non-SIM service)

The replicated AAA user data comprises the tables aaansduser, aaapolicy, aaauser, aaauser_policy, aaauser_groupname, and aaausergroup_policy. Check whether any of them differ between the two sites.

Note:  
All other AAA user data is not replicated automatically; it must also be the same on both sites.

Take AAANSDUser as an example. Check whether the AAANSDUser data on the two sites differs:

  1. Perform checksum on AAANSDUser on SC-1 or SC-2 of Site A:

    # mysql -P3307 -h ipw_sql --protocol=tcp -e "select sum(crc32(concat_ws(',', name, password, imsi, msisdn, apn, userstatus, certificateissuername, certificateid))) from ipw_prov_aaa.aaansduser;"

    Record the output integer value as [CHECKSUM_A].

    This command takes about 30 seconds to produce the output.

  2. Perform checksum on SC-1 or SC-2 of Site B:

    # mysql -P3307 -h ipw_sql --protocol=tcp -e "select sum(crc32(concat_ws(',', name, password, imsi, msisdn, apn, userstatus, certificateissuername, certificateid))) from ipw_prov_aaa.aaansduser;"

    Record the output integer value as [CHECKSUM_B].

If [CHECKSUM_A] equals [CHECKSUM_B], it is almost certain that the tables are the same, and there is no need to recover data synchronization. Refer to Storage Server, The MySQL Replication for Geographic Redundancy Failed.

If [CHECKSUM_A] does not equal [CHECKSUM_B], follow the steps in Section 5.18.1.2.3.
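When this check is repeated, the query and the comparison can be scripted; a minimal sketch (the query, host name ipw_sql, and port 3307 are taken from the procedure above; -N -s make the mysql client print only the bare integer):

```shell
# Sketch: fetch the AAANSDUser checksum from one site and compare the two
# recorded values. -N -s make the mysql client print only the bare integer;
# host ipw_sql and port 3307 are the values used in the procedure above.
site_checksum() {
  mysql -P3307 -h "$1" --protocol=tcp -N -s -e \
    "select sum(crc32(concat_ws(',', name, password, imsi, msisdn, apn, userstatus, certificateissuername, certificateid))) from ipw_prov_aaa.aaansduser;"
}
compare_checksums() {
  if [ "$1" = "$2" ]; then
    echo "match: resynchronization not needed"
  else
    echo "differ: follow Section 5.18.1.2.3"
  fi
}
```

Run site_checksum on each site, then feed the two recorded integers to compare_checksums.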

5.18.1.2.2   Checking ENUM User Data (For ENUM Service)

The replicated ENUM user data comprises the tables enumzone, enumview, enumzvrel, enumacl, destnode, enumdnrange, and enumdnsched. Check whether any of them differ between the two sites.

Note:  
All other ENUM user data is not replicated automatically; it should also be the same on both sites.

Take ENUMZONE as an example:

  1. Perform checksum on ENUMZONE on SC-1 or SC-2 of Site A:

    # mysql -P3307 -h ipw_sql --protocol=tcp -e "select sum(crc32(concat_ws(',', id, enumzoneid, enumzonename, indefaultview, defaultttl))) from ipw_enum.ENUMZONE;"

    Record the output integer value as [CHECKSUM_A].

    This command takes about 30 seconds to produce the output.

  2. Perform checksum on SC-1 or SC-2 of Site B:

    # mysql -P3307 -h ipw_sql --protocol=tcp -e "select sum(crc32(concat_ws(',', id, enumzoneid, enumzonename, indefaultview, defaultttl))) from ipw_enum.ENUMZONE;"

    Record the output integer value as [CHECKSUM_B].

If [CHECKSUM_A] equals [CHECKSUM_B], it is almost certain that the tables are the same, and there is no need to recover data synchronization. Refer to Storage Server, The MySQL Replication for Geographic Redundancy Failed.

If [CHECKSUM_A] does not equal [CHECKSUM_B], follow the steps in Section 5.18.1.2.3.

5.18.1.2.3   Recovering Data Synchronization

All the following steps are performed on either SC-1 or SC-2. Take AAANSDUser as an example:

Note:  
  • All AAANSDUser data on Site B will be erased and resynchronized from Site A.
  • The AAA user data mentioned above is stored in the database ipw_prov_aaa, and the ENUM user data is stored in the database ipw_enum. The mysql commands should therefore be run with the corresponding database and table names in each scenario.

  1. Stop AAANSDUser provision on both Site A and Site B.
  2. On both Site A and Site B, stop MySQL slave:

    # mysql -P3307 -h ipw_sql --protocol=tcp -e "stop slave;"

  3. On Site A, dump AAANSDUser data:

    # mysqldump -P3307 -h ipw_sql --protocol=tcp --no-create-info --opt ipw_prov_aaa.aaansduser > ~/aaansduser_dump.sql

  4. On Site A, transfer the SQL dump file to Site B.

    # scp ~/aaansduser_dump.sql root@[OAM IP of SiteB]:~

  5. On Site A, reset MySQL slave:

    # mysql -P3307 -h ipw_sql --protocol=tcp

    mysql> reset slave;

  6. On Site B, delete aaansduser:

    # mysql -P3307 -h ipw_sql --protocol=tcp -e "delete from ipw_prov_aaa.aaansduser;"

  7. On Site B, restore AAANSDuser data:

    # mysql -P 3307 -h ipw_sql --protocol=tcp -f ipw_prov_aaa < ~/aaansduser_dump.sql

  8. On Site B, record File and Position in the output of the following command as [BINLOG_NAME_SITEB] and [BINLOG_POS_SITEB]:

    # mysql -P 3307 -h ipw_sql --protocol=tcp -e "show master status;"

  9. On Site A, configure and start MySQL slave:

    mysql> change master to master_host='<MIP of MySQL Cluster SQL Node in Site B>', master_log_file='<BINLOG_NAME_SITEB>', master_log_pos=<BINLOG_POS_SITEB>,master_user='ipworks',master_password='ipworks',master_port=3307, master_retry_count=86400,master_connect_retry=5;
    mysql> start slave;
    mysql> exit;
    

  10. On Site B, start MySQL slave:

    # mysql -P 3307 -h ipw_sql --protocol=tcp -e "start slave;"
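For step 9, the CHANGE MASTER statement can be assembled from the values recorded in step 8; a minimal sketch (the three arguments are the placeholders used above):

```shell
# Sketch: build the step-9 CHANGE MASTER statement from the File and Position
# recorded in step 8. Arguments: Site B SQL Node address, [BINLOG_NAME_SITEB],
# [BINLOG_POS_SITEB]; user, password, and port are the fixed values above.
build_change_master() {
  printf "change master to master_host='%s', master_log_file='%s', master_log_pos=%s, master_user='ipworks', master_password='ipworks', master_port=3307, master_retry_count=86400, master_connect_retry=5;\n" \
    "$1" "$2" "$3"
}
```

The resulting statement is then executed at the mysql> prompt on Site A.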

5.18.1.3   Confirming Solution

After the MySQL Replication for Geographic Redundancy Failed alarm is cleared, use the checking steps in Section 5.18.1.2.1 to verify that [CHECKSUM_A] equals [CHECKSUM_B].

5.18.2   MySQL Replication for Geographic Redundancy Failed on All Sites

5.18.2.1   Trouble Symptoms

When the alarm MySQL Replication for Geographic Redundancy Failed appears on all sites, it means that MySQL replication has a problem on all sites.

5.18.2.2   Locating Fault

5.18.2.2.1   Checking the AAANSDUser Data (For Non-SIM service)

Before the recovery steps, you must check whether the AAANSDUser data on the two sites differs:

  1. Confirm that the AAANSDUser data on the two sites differs.
    1. Perform checksum on AAANSDUser on SC-1 or SC-2 of Site A:

      # mysql -P3307 -h ipw_sql --protocol=tcp -e "select sum(crc32(concat_ws(',', name, password, imsi, msisdn, apn, userstatus, certificateissuername, certificateid))) from ipw_prov_aaa.aaansduser;"

      Record the output integer value as [CHECKSUM_A].

      This command takes about 30 seconds to produce the output.

    2. Perform checksum on AAANSDUser on SC-1 or SC-2 of Site B:

      # mysql -P3307 -h ipw_sql --protocol=tcp -e "select sum(crc32(concat_ws(',', name, password, imsi, msisdn, apn, userstatus, certificateissuername, certificateid))) from ipw_prov_aaa.aaansduser;"

      Record the output integer value as [CHECKSUM_B].

    If [CHECKSUM_A] equals [CHECKSUM_B], the AAANSDUser data on the two sites is the same, and it is almost certain that the tables are identical. There is no need to recover data synchronization. For more detail, refer to Storage Server, The MySQL Replication for Geographic Redundancy Failed.

  2. Make sure perl DBI is installed.

    # perl -e "use DBI;"

    The output should contain no error messages. If there are errors, install perl-DBI and perl-DBD-mysql:

    # cd /opt/ipworks/sqlnodemgr/scripts/

    # rpm -i libmysqlclient18-10.0.11-6.4.x86_64.rpm perl-DBI-1.628-3.214.x86_64.rpm perl-DBD-mysql-4.021-7.178.x86_64.rpm
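Both required modules can be probed in one pass; a minimal sketch:

```shell
# Sketch: probe each required Perl module; a non-zero exit from perl means the
# module (or perl itself) is not available on this node.
check_perl_modules() {
  for m in DBI DBD::mysql; do
    if perl -e "use $m;" 2>/dev/null; then
      echo "$m: ok"
    else
      echo "$m: missing"
    fi
  done
}
check_perl_modules
```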

5.18.2.2.2   Recovering Data Synchronization for AAA User Data

If replication in both directions is down, you can recover data synchronization with the following steps.

  1. Check AAANSDUser consistency on SC-1 or SC-2 of Site A.

    # cd /opt/ipworks/sqlnodemgr/scripts/

    # ./ipw-db-checker --mysqld1 h=ipw_sql:P=3307:u=root --mysqld2 h=[MIP prv of Site B]:P=3307:u=ipworks:p=ipworks --database ipw_prov_aaa --tables aaansduser

  2. Check the output.

    If the output reports the data as consistent, no further operation is needed.

    If the output reports the data as inconsistent, continue with the next step.

  3. If the result is inconsistent, two cli scripts are generated under /tmp:

    /tmp/sync_commands_for_sqlnode1_aaansduser.cli

    The script contains commands that make the data on Site A the same as on Site B.

    /tmp/sync_commands_for_sqlnode2_aaansduser.cli

    The script contains commands that make the data on Site B the same as on Site A.

  4. Review and modify the scripts mentioned above as needed.
  5. Execute the modified script on SC-1 or SC-2 of Site A.

    #ipwcli -user=[ipwcli User Name] -password=[ipwcli Password] /tmp/sync_commands_for_sqlnode1_aaansduser.cli

  6. Transfer the second script to Site B.

    # scp /tmp/sync_commands_for_sqlnode2_aaansduser.cli root@[OAM IP of Site B]:/tmp/

  7. Execute the modified script on SC-1 or SC-2 of Site B.

    #ipwcli -user=[ipwcli User Name] -password=[ipwcli Password] /tmp/sync_commands_for_sqlnode2_aaansduser.cli

  8. Go back to step 2.

5.18.2.2.3   Checking the ENUM User Data and Radius User Data

  1. Stop the provisioning of user data.
  2. Stop the MySQL Slave on both sites.

    # mysql -P 3307 --protocol=tcp -h ipw_sql

    mysql> stop slave;

    mysql> exit;

  3. Check the data consistency.

    # mkdir /tmp/db_checker
    # cp /opt/ipworks/common/bin/ipw-db-checker /tmp/db_checker
    # cp /opt/ipworks/common/etc/DbChecker.conf /tmp/db_checker
    # cd /tmp/db_checker
    # ./ipw-db-checker <MIP_PROV_IP of the other Site>  <Database name needed to be checked>
    

    Note:  
    ENUM user data is stored in database ipw_enum while Radius user data is stored in database ipw_prov_aaa.

    For example:

    ./ipw-db-checker "10.175.171.76" "ipw_enum"
    Tables in ipw_enum is: 
    DESTNODE;ENUMACL;ENUMDNRANGE;ENUMDNSCHED;ENUMVIEW;ENUMZONE;ENUMZVREL;
    
    
    Checking table DESTNODE start.
    connect to sqlnode1 ipw_sql:::ipw_enum:DESTNODE
    connect to sqlnode2 10.175.171.76:ipworks:ipworks:ipw_enum:DESTNODE
    reading data...please wait...finished
    comparing data...please wait...finished
    Checking table DESTNODE ------------------------------------------------ Consistent
    
    Checking table ENUMACL start.
    connect to sqlnode1 ipw_sql:::ipw_enum:ENUMACL
    connect to sqlnode2 10.175.171.76:ipworks:ipworks:ipw_enum:ENUMACL
    reading data...please wait...finished
    comparing data...please wait...finished
    Checking table ENUMACL ------------------------------------------------- Consistent
    
    ...

5.18.2.2.4   Recovering Data Synchronization for ENUM User Data and Radius User Data

If the checking result is inconsistent, SQL files are generated in /tmp/db_checker.

For example:

SC-1:/#ls -l /tmp/db_checker

total 4935052
-rw-r--r-- 1 root root       1504 Jul 17 11:50 DbChecker.conf
-rw-r--r-- 1 root root      14900 Jul 18 15:28 dbchecker.log
-rwxr-xr-x 1 root root    7898108 Jul 17 12:56 ipw-db-checker
-rwxr-xr-x 1 root root    7893923 Jul 17 11:48 ipw-db-checker_back
-rw-r--r-- 1 root root 5032718815 Jul 18 15:28 sync_commands_for_sqlnode1.sql
-rw-r--r-- 1 root root 5032718815 Jul 18 15:28 sync_commands_for_sqlnode2.sql

To synchronize the data between the two sites, load sync_commands_for_sqlnode1.sql on Site A and sync_commands_for_sqlnode2.sql on Site B.

  1. Synchronize the data in Site A.

    If sync_commands_for_sqlnode1.sql is not generated, skip this step.

    1. Log in to SC-1 or SC-2 in Site A.
    2. Log in to the SQL Node.

      # mysql -P 3307 --protocol=tcp -h ipw_sql
      mysql> use ipw_enum;
      mysql> source /tmp/db_checker/sync_commands_for_sqlnode1.sql;
      mysql> exit;
      

      Note:  
      If Radius user data is to be synchronized, execute use ipw_prov_aaa.

  2. Synchronize the data in Site B.

    If sync_commands_for_sqlnode2.sql is not generated, skip this step.

    1. Log in to SC-1 or SC-2 in Site B.
    2. Log in to the SQL Node.

      # mysql -P 3307 --protocol=tcp -h ipw_sql
      mysql> use ipw_enum;
      mysql> source /tmp/db_checker/sync_commands_for_sqlnode2.sql;
      mysql> exit;
      
      

      Note:  
      If Radius user data is to be synchronized, execute use ipw_prov_aaa.

  3. Changing Master-Host and Setting Binlog.

    Refer to section Change Master-Host and Setting Binlog in IPWorks Geographic Redundancy.

5.18.2.3   Confirming Solution

After the alarm MySQL Replication for Geographic Redundancy Failed is cleared on all sites, use the checking steps in Section 5.18.2.2.1 to verify if [CHECKSUM_A] equals to [CHECKSUM_B].

5.19   Data Migration

This section is a quick troubleshooting guide for the data migration from HP to IPWorks 1.

5.19.1   Backup Failed

5.19.1.1   Trouble Symptoms

"Error Copying Configuration files:…" is displayed.

5.19.1.2   Locating Fault

This issue occurs when the configuration file or folder specified in the rule file does not exist in the current environment.

Redo the backup.

5.19.2   Required Configuration Files Did Not Migrate from HP to IPWorks 1

5.19.2.1   Trouble Symptoms

5.19.2.2   Locating Fault

If configuration files are not backed up, add them into ipw_service_backup_rule.csv.

Redo the backup.

5.19.3   Files Missing in the Migration Process

5.19.3.1   Trouble Symptoms

"Dest file … does not exists." is displayed.

5.19.3.2   Locating Fault

This issue occurs when the destination file is not configured correctly in the corresponding rule file.

Check the file name in the rule file, correct it, and redo the migration steps.

5.19.4   Failed to Import the NETCONF XML File to ECIM with the netconf Command

5.19.4.1   Trouble Symptoms

This issue occurs when importing the NETCONF XML file into ECIM using the netconf command.

5.19.4.2   Locating Fault

For details on using NETCONF to import the netconf configuration, refer to 5.3 Operation <edit-config> in Ericsson NETCONF Interface.

5.20   IPWorks Scaling

5.20.1   Unable to Scale In PL in ECLI

5.20.1.1   Trouble Symptoms

Take PL-5 as an example.

When scaling in IPWorks in ECLI, the error No scale operation possible is reported.

>ManagedElement=1,SystemFunctions=1,SysM=1,CrM=1,ComputeResourceRole=PL-5
	(ComputeResourceRole=PL-5)>configure
	(config-ComputeResourceRole=PL-5)>no provides
	(config-ComputeResourceRole=PL-5)>up
	(config-CrM=1)>commit
	ERROR: Transaction not committed due to validation errors
	Transaction validation failed!
	No scale operation possible, maintenance lock not available

5.20.1.2   Locating Fault

  1. Check PL-5 status.

    >ManagedElement=1,SystemFunctions=1,SysM=1,CrM=1
    (CrM=1)>ComputeResourceRole=PL-5
    (ComputeResourceRole=PL-5)>show -v
    ComputeResourceRole=PL-5
       adminState=UNLOCKED
       computeResourceRoleId="PL-5"
       instantiationState=INSTANTIATING <read-only>
       operationalState=DISABLED <read-only>
       provides="ManagedElement=1,SystemFunctions=1,SysM=1,CrM=1,Role=Default-Role"
       uses="ManagedElement=1,Equipment=1,ComputeResource=PL-5" <read-only>
    

  2. Remove PL-5 after the value of instantiationState changes to INSTANTIATED.

    >ManagedElement=1,SystemFunctions=1,SysM=1,CrM=1,ComputeResourceRole=PL-5
    	(ComputeResourceRole=PL-5)>configure
    	(config-ComputeResourceRole=PL-5)>no provides
    	(config-ComputeResourceRole=PL-5)>up
    	(config-CrM=1)>commit
    

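When scripting the retry, the instantiationState value can be extracted from captured show -v output, so a loop can wait for INSTANTIATED before committing again; a minimal sketch:

```shell
# Sketch: extract instantiationState from captured 'show -v' text, so a retry
# loop can wait for INSTANTIATED before re-running the scale-in commit.
get_instantiation_state() {
  printf '%s\n' "$1" | sed -n 's/.*instantiationState=\([A-Z]*\).*/\1/p'
}
```

With the step-1 output captured in a variable, get_instantiation_state prints INSTANTIATING or INSTANTIATED.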
5.20.1.3   Confirming Solution

If the problem still remains, contact the next level of Ericsson support.

5.20.2   Failed to Start Scale-Out VM on KVM

5.20.2.1   Trouble Symptoms

On the KVM platform, starting the PL fails with the following errors:

cluster1-b-2:~ # virsh start Scale1 
		error: Failed to start domain Scale1
		error: monitor socket did not show up: No such file or directory

5.20.2.2   Locating Fault

  1. Restart the libvirtd service to fix the error.
  2. If the issue persists, check the libvirtd service log, and find the problem in /etc/hosts.

    cluster1-b-2:~ # service libvirtd status
    	* libvirtd.service - Virtualization daemon
    	   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
    	   Active: active (running) since Sun 2017-08-27 01:38:58 EDT; 3min 9s ago
    	     Docs: man:libvirtd(8)
    	           http://libvirt.org
    	 Main PID: 19509 (libvirtd)
    	    Tasks: 16 (limit: 512)
    	   CGroup: /system.slice/libvirtd.service
    	           `-19509 /usr/sbin/libvirtd --listen
    	
    	Aug 27 01:38:58 cluster1-b-2 libvirtd[19509]: 2017-08-27 05:38:58.102+0000: 19509: warning : virGetHostnameImpl:707 : getaddrinfo failed for 'cluster1-b-2': Name or service not known
    	Aug 27 01:38:58 cluster1-b-2 systemd[1]: Started Virtualization daemon.
    	Aug 27 01:39:05 cluster1-b-2 libvirtd[19509]: libvirt version: 2.0.0
    	Aug 27 01:39:05 cluster1-b-2 libvirtd[19509]: hostname: cluster1-b-2
    	Aug 27 01:39:05 cluster1-b-2 libvirtd[19509]: getaddrinfo failed for 'cluster1-b-2': Name or service not known
    

  3. Add the host name cluster1-b-2 to /etc/hosts and restart the libvirtd service.
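Step 3 can be made idempotent so repeated runs do not duplicate the entry; a minimal sketch (the IP address below is a documentation placeholder; use the node's real address):

```shell
# Sketch: append a hosts entry only if the name is not already present, so the
# fix can be re-run safely. 192.0.2.10 is a documentation placeholder address.
ensure_host_entry() {
  file="$1"; ip="$2"; name="$3"
  grep -qw "$name" "$file" || echo "$ip $name" >> "$file"
}
# On the host: ensure_host_entry /etc/hosts <real address> cluster1-b-2
```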

5.20.2.3   Confirming Solution

If the problem still remains, contact the next level of Ericsson support.

5.20.3   Unable to Scale Out PL for Core Middleware

5.20.3.1   Trouble Symptoms

After the scale-out operation is performed by heat stack-update, the new PL-5 cannot be scaled out and the compute resource cannot be found under the ECLI DN:

>ManagedElement=1,SystemFunctions=1,SysM=1,CrM=1

5.20.3.2   Locating Fault

Check the failure reason with the following steps:

  1. Check /var/log/messages on both SCs.

    #grep -E "CSM|clustermonitor" /var/log/messages

    You can find that SC-1 has the PL-5 scale-out log and related CSM log entries, as below:

    Sep  7 07:41:11 SC-1 clustermonitor: cmw-node-up received
    Sep  7 07:41:11 SC-1 clustermonitor: ClusterMonitorTimer::stop()
    Sep  7 07:41:11 SC-1 clustermonitor: ClusterMonitorTimer::start() called with timeout 140
    Sep  7 07:43:31 SC-1 clustermonitor: ClusterMonitorTimer::stop()
    Sep  7 07:43:31 SC-1 clustermonitor: searchObjectNames error 12
    Sep  7 07:43:31 SC-1 clustermonitor: send signal to thread-Scaleout, vector.size 1
    Sep  7 07:43:31 SC-1 clustermonitor: got scaleout signal, starting scale out...
    Sep  7 07:43:31 SC-1 clustermonitor: ElasticEngine_Impl::scaleOut : node<PL-5> : not in cluster
    Sep  7 07:43:31 SC-1 clustermonitor: ElasticEngine_Impl::scaleOut : node<PL-5> : continue scale out operation
    Sep  7 07:43:31 SC-1 clustermonitor: scaleOut: maint_lock_cnt =1
    Sep  7 07:43:31 SC-1 clustermonitor: successful to set state <1> for EE
    Sep  7 07:43:32 SC-1 clustermonitor: Create ComputeResourceRole object request
    Sep  7 07:43:32 SC-1 clustermonitor: ComputeResourceRole object Successfully created
    Sep  7 07:43:32 SC-1 clustermonitor: addNodeToScalingList <PL-5>
    Sep  7 07:43:32 SC-1 clustermonitor: CSM job started, EE-state=<1>.
    Sep  7 07:43:33 SC-1 clustermonitor: successful to set state <2> for EE
    Sep  7 07:43:33 SC-1 clustermonitor: successful to set state <3> for EE
    Sep  7 07:43:33 SC-1 clustermonitor: error, csm-apply, err <89>
    Sep  7 07:43:34 SC-1 clustermonitor: Calling /opt/csm/bin/csm-repair after /opt/csm/bin/csm-apply failure
    Sep  7 07:43:34 SC-1 clustermonitor: successful to set state <4> for EE
    Sep  7 07:43:34 SC-1 clustermonitor: error, /opt/csm/bin/csm-repair failed, rc <89>
    Sep  7 07:43:35 SC-1 clustermonitor: Delete ComputeResourceRole object request
    Sep  7 07:43:35 SC-1 clustermonitor: ComputeResourceRole object Successfully deleted
    Sep  7 07:43:35 SC-1 clustermonitor: clearScalingList
    Sep  7 07:43:35 SC-1 clustermonitor: successful to set state <0> for EE
    

    If the error log does not provide enough information, go to the next step to check the clustermonitor log.

  2. Check the SC clustermonitor log.

    #cd /var/opt/coremw/clustermonitor

    #cat clustermonitor.log

    Setting CDF_CONFIGPATH to /tmp/tmp.a456B4Ng47
    Updated unit SH/IPWRAD in directory /usr/lib/ericsson/cba/csm/plugin/SH-IPWRADStuff-SH_IPWRAD
    Updated unit SH/SS7CAF2 in directory /usr/lib/ericsson/cba/csm/plugin/SH-SS7CAF2
    Updated unit SH/IPWDIA in directory /usr/lib/ericsson/cba/csm/plugin/SH-IPWDIAStuff-SH_IPWDIA
    Updated unit SH/CoreMW1 in directory /usr/lib/ericsson/cba/csm/plugin/SH-CoreMW1-CXC12345
    Updated unit SH/CoreMW2 in directory /usr/lib/ericsson/cba/csm/plugin/SH-CoreMW2-CXC12345
    Updated unit SH/IPWENUM in directory /usr/lib/ericsson/cba/csm/plugin/SH-IPWENUMStuff-SH_IPWENUM
    Updated unit SH/LDE in directory /usr/lib/ericsson/cba/csm/plugin/LDE_SH
    Updated unit SH/SS7CAF1 in directory /usr/lib/ericsson/cba/csm/plugin/SH-SS7CAF1
    Updated unit SH/IPWDNS in directory /usr/lib/ericsson/cba/csm/plugin/SH-IPWDNSStuff-SH_IPWDNS
    ERROR exception caught
    <type 'exceptions.IndentationError'>
      File "/usr/share/ericsson/csm/repo/DT-CSM-DT_CSM/lib/python2.7/csm/csmapply.py", line 203, in <module>
        environments = CSMEnvironments)
      File "/usr/share/ericsson/csm/repo/DT-Cdf-DT_Cdf/lib/python2.7/cdf/clicommon.py", line 350, in loadPlugins
        for module in getPluginsInDirectory(pythonDir, filter, verbose):
      File "/usr/share/ericsson/csm/repo/DT-Cdf-DT_Cdf/lib/python2.7/cdf/clicommon.py", line 305, in getPluginsInDirectory
        module = imp.load_source("plugin%s" % (postfix), file)
    unexpected indent (csmplugin.py, line 57)
    

5.20.3.3   Confirming Solution

For this kind of issue, collect the logs and then contact the next level of Ericsson support.

5.20.4   Unable to Scale Out PL for SS7CAF

5.20.4.1   Trouble Symptoms

After the scale-out operation is performed by heat stack-update, the new PL-6 cannot be scaled out and the compute resource cannot be found under the ECLI DN:

>ManagedElement=1,SystemFunctions=1,SysM=1,CrM=1

5.20.4.2   Locating Fault

Check the failure reason with the following steps:

  1. Check /var/log/messages on both SCs.

    #grep -E "CSM|clustermonitor" /var/log/messages

    May 18 13:56:00 SC-1 CSM: ss7caf_csm_plugin_scale_out: EXCEPTION: Command:"sudo /opt/sign/EABss7077/ss7caf_scaling.sh  -t OUT -s  PL-6" returned non zero exit code 1
    May 18 13:56:00 SC-1 osafimmnd[6594]: NO Ccb 1006 COMMITTED (EquipmentOwner)
    May 18 13:56:00 SC-1 clustermonitor: successful to set state <3> for EE
    May 18 13:56:00 SC-1 clustermonitor: error, csm-apply, err <1>
    May 18 13:56:00 SC-1 osafimmnd[6594]: NO Ccb 1007 COMMITTED (EquipmentOwner)
    May 18 13:56:00 SC-1 clustermonitor: Calling /opt/csm/bin/csm-repair after /opt/csm/bin/csm-apply failure
    May 18 13:56:01 SC-1 CSM: scale in started
    May 18 13:56:01 SC-1 CSM: SH/SS7CAF1 started prepare step
    May 18 13:56:01 SC-1 CSM: ss7caf_csm_plugin_scale_in: SS7CAF Scale In plugin - prepare() called for use case ScaleIn
    May 18 13:56:01 SC-1 CSM: SH/SS7CAF1 finished prepare step
    May 18 13:56:01 SC-1 CSM: SH/IPW1 started prepare step
    May 18 13:56:01 SC-1 CSM: IPWorks-ScaleIn: IPWorks plugin - prepare() called for use case ScaleIn
    May 18 13:56:01 SC-1 CSM: SH/IPW1 finished prepare step
    May 18 13:56:01 SC-1 CSM: SH/EVIP started prepare step
    May 18 13:56:01 SC-1 CSM: SH/EVIP finished prepare step
    May 18 13:56:01 SC-1 CSM: SH/CoreMW1 started prepare step
    May 18 13:56:01 SC-1 CSM: CMW-scale_in: prepare uc: ScaleIn
    May 18 13:56:02 SC-1 CSM: CMW-scale_in: scale in node: PL-6
    May 18 13:56:02 SC-1 clustermonitor: Received cluster update, Number of members in cluster=4
    May 18 13:56:02 SC-1 clustermonitor: EE update node leave <PL-6>.
    May 18 13:56:02 SC-1 clustermonitor: searchObjectNames error 12
    May 18 13:56:02 SC-1 clustermonitor: Failure seraching for <CmwMonitorImmCkptId=PL-6,CmwMonitorId=1,CmwSysConfigId=1> object
    May 18 13:56:02 SC-1 clustermonitor: searchObjectNames error 12
    May 18 13:56:02 SC-1 clustermonitor: Successfully write 'downTime' for rdn <CmwMonitorImmCkptId=PL-6> : 1495086962
    May 18 13:56:02 SC-1 clustermonitor: Node "safNode=PL-6,safCluster=myClmCluster" is no longer a member of cluster
    May 18 13:56:04 SC-1 CSM: CMW-scale_in: Clm node already locked
    May 18 13:56:07 SC-1 CSM: CMW-scale_in: exec: sudo /opt/coremw/lib/cmwmdf_gcc cleanup PL-6
    May 18 13:56:07 SC-1 CSM: CMW-scale_in: scale-in node: PL-6 done
    May 18 13:56:07 SC-1 CSM: SH/CoreMW1 finished prepare step
    May 18 13:56:07 SC-1 CSM: SH/LDE started prepare step
    May 18 13:56:07 SC-1 CSM: LDE OS plugin - prepare called for use case ScaleIn (repair: True)
    May 18 13:56:07 SC-1 CSM: SH/LDE finished prepare step
    May 18 13:56:07 SC-1 CSM: SH/SS7CAF1 started perform step
    May 18 13:56:07 SC-1 CSM: ss7caf_csm_plugin_scale_in: SS7CAF Scale In plugin - perform() called for use case ScaleIn
    May 18 13:56:07 SC-1 CSM: ss7caf_csm_plugin_scale_in: model.yml file not found. Will run check based on SwM 1.0.
    May 18 13:56:07 SC-1 CSM: ss7caf_csm_plugin_scale_in: Checking that at least one SS7CAF payload is included in Scaling Domain...
    May 18 13:56:07 SC-1 CSM: ss7caf_csm_plugin_scale_in: PL-3 is in Scaling Domain([u'PL-6', u'PL-4', u'PL-3'])
    May 18 13:56:07 SC-1 CSM: ss7caf_csm_plugin_scale_in: Call /opt/sign/EABss7077/ss7caf_scaling.sh with the following args:  -t IN -s  PL-6
    May 18 13:56:07 SC-1 systemd[1]: Starting Session c128 of user root.
    

5.20.4.3   Confirming Solution

For this kind of issue, collect the logs and then contact the next level of Ericsson support.

  1. Collect the SS7CAF scaling log on PL-6:

    #/opt/sign/log/ss7caf_scaling.log[<log number>]

  2. Collect the SS7CAF log by using the SS7CAF tool. Execute the command on PL-6.

    #/opt/sign/EABss7049/bin/sysCollTool.sh

  3. Collect core middleware log.

    Collect the clustermonitor log on the SC that reports many CSM and clustermonitor entries in /var/log/messages. The core middleware log is:

    #/var/opt/coremw/clustermonitor/clustermonitor.log

5.20.5   AAA Cannot Start in Scale-Out PL

5.20.5.1   Trouble Symptoms

On the scaled-out PL, the AAA service cannot start. Take PL-5, a scaled-out PL, as an example.

SC-1:/cluster # ipw-ctr status all | grep PL-5 -A20

on PL-5:

        
        aaa_diameter         need repair.
        aaa_radius_stack     need repair.
        aaa_radius_backend   need repair.
        aaasm                is running.

5.20.5.2   Locating Fault

Check the failure reason with the following steps:

  1. Check the serviceType and ensure that it includes "AAA".

    SC-X:~ #/opt/com/bin/cliss

    >ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1

    (IpworksCommonRoot=1)>show -v

    IpworksCommonRoot=1
       ipworksCommonRootId="1"
       serviceType="AAA" 
       DataBaseInfo=1
       StorageServer=1
    

  2. Ensure that AAAServer=PL-5 exists under IPWorksAAACommonRoot.

    >ManagedElement=<Node Name>,IpworksFunction=1,IPWorksAAARoot=1,IPWorksAAACommonRoot=1

    (IPWorksAAACommonRoot=1)>show -v

    IPWorksAAACommonRoot=1
       ipworksAAACommonRootId="1"
       AAAServer=PL-3
       AAAServer=PL-4
       AAAServer=PL-5
       AAAServerManager=1
       GTConvertManager=1
    

    If AAAServer=PL-5 does not exist, perform the following procedure on the SC:

    1. Open a new file.

      #vi /tmp/addAAAServer.sh

    2. Insert the following content into /tmp/addAAAServer.sh. Change aaaServer to the corresponding PL name.

      #!/bin/bash
      aaaServer=PL-5
      immcfg -u -c AAAServer aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
      immcfg -u -c LogManagement logManagementId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
      immcfg -u -c ThreadControlManager threadControlManagerId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
      immcfg -u -c IPWorksLog logId=AAA_DIAMETER_SERVER,logManagementId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
      immcfg -u -c IPWorksLog logId=AAA_RADIUS_BACKEND,logManagementId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
      immcfg -u -c IPWorksLog logId=AAA_RADIUS_STACK,logManagementId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
      immcfg -u -c ThreadControl processId=AAA_DIAMETER_SERVER,threadControlManagerId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
      immcfg -u -c ThreadControl processId=AAA_RADIUS_BACKEND,threadControlManagerId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
      immcfg -u -c ThreadControl processId=AAA_RADIUS_STACK,threadControlManagerId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
      

    3. Execute the script.

      #bash /tmp/addAAAServer.sh

  3. Repair the AAA.

    #ipw-ctr repaired aaa_diameter PL-5

    #ipw-ctr repaired aaa_radius_stack PL-5

    #ipw-ctr repaired aaa_radius_backend PL-5

  4. Start the AAA.

    #ipw-ctr start aaa_diameter PL-5

    #ipw-ctr start aaa_radius_stack PL-5

    #ipw-ctr start aaa_radius_backend PL-5
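Steps 3 and 4 can be combined into a small loop. The sketch below is a dry run: echo prints each ipw-ctr command instead of executing it; on the node, drop the echo to run the commands for real.

```shell
# Dry-run sketch of the repair/start sequence for the three AAA services.
# "echo" prints the ipw-ctr commands; remove it on the node to execute them.
node=PL-5
plan=$(for svc in aaa_diameter aaa_radius_stack aaa_radius_backend; do
    echo "ipw-ctr repaired $svc $node"
    echo "ipw-ctr start $svc $node"
done)
printf '%s\n' "$plan"
```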

5.20.5.3   Confirming Solution

Use ipw-ctr to get server status. The AAA services should be running.

SC-1:/cluster # ipw-ctr status all | grep PL-5 -A20

on PL-5:

        
        aaa_diameter         is running.
        aaa_radius_stack     is running.        
        aaa_radius_backend   is running.
        aaasm                is running.

5.20.6   Restore User Backup in Superset Cluster

5.20.6.1   Trouble Symptoms

Restore in a superset cluster refers to the scenario where the backup was taken in a cluster smaller than the current cluster size, that is, the cluster has been scaled out after the backup was taken.

In this situation, the restore operation fails.

5.20.6.2   Locating Fault

Do the following:

  1. Scale in IPWorks to remove the PLs that are not included in the backup package.
  2. Restore user data with the backup package.
  3. Scale out to the desired PLs.

5.20.6.3   Confirming Solution

If the problem still remains, contact the next level of Ericsson support.

5.20.7   Scale-Out Failure Triggers Scale-Out/Scale-In Cyclically

5.20.7.1   Trouble Symptoms

When scale-out of PL-X fails because of incorrect configuration, CMW triggers an automatic scale-in, but CMW does not shut down the VM resource of PL-X. IPWorks then continues with "DHCP recovery" and triggers scale-out/scale-in cyclically. During a scale-in operation, LDE attempts to power off the node(s) being scaled in. This operation relies on SSH connectivity to the payload node; should the shutdown -h now remote command not succeed, there is a risk that the node remains alive, with active TIPC and IP configuration, but is no longer reachable by LDE or the middleware. This is a limitation of LDE; for details, refer to the section "Fencing during a scale in" in LDE Scaling User’s Guide.

For cyclic scale-out/scale-in, check the /var/opt/coremw/clustermonitor/clustermonitor.log file on SC-X.

For example:

SC-1:~ #grep -E 'addNodeToScalingList|hostname "PL-5"' /var/opt/coremw/clustermonitor/clustermonitor.log
Dec 28 10:04:47.105663 clustermonitor [9869][../../../src/clmon/ClusterMonitorImm.cc:0633] IN addNodeToScalingList <PL-5>
Deleting ComputeResource node with hostname "PL-5"
Dec 28 10:10:23.318797 clustermonitor [9869][../../../src/clmon/ClusterMonitorImm.cc:0633] IN addNodeToScalingList <PL-5>
Deleting ComputeResource node with hostname "PL-5"
Dec 28 10:15:59.651078 clustermonitor [9869][../../../src/clmon/ClusterMonitorImm.cc:0633] IN addNodeToScalingList <PL-5>
Deleting ComputeResource node with hostname "PL-5"
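A quick way to confirm the cyclic behavior is to count the addNodeToScalingList entries for the PL. The here-document below stands in for the real clustermonitor.log; on the SC, grep the real file instead.

```shell
# Sketch: count how often PL-5 was re-added to the scaling list. Repeated
# entries in a short time window indicate cyclic scale-out/scale-in.
log=$(cat <<'EOF'
Dec 28 10:04:47.105663 clustermonitor [9869] IN addNodeToScalingList <PL-5>
Dec 28 10:10:23.318797 clustermonitor [9869] IN addNodeToScalingList <PL-5>
Dec 28 10:15:59.651078 clustermonitor [9869] IN addNodeToScalingList <PL-5>
EOF
)
attempts=$(printf '%s\n' "$log" | grep -c 'addNodeToScalingList <PL-5>')
echo "scale attempts for PL-5: $attempts"
```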

5.20.7.2   Locating Fault

After the scale-out fails, remove the VM instance of PL-X to fix the issue.

5.20.7.3   Confirming Solution

If the problem still remains, contact the next level of Ericsson support.

5.21   IPWorks Deployment for KVM

5.21.1   Both SCs Cyclic Reboot after Deployment

5.21.1.1   Trouble Symptoms

On the KVM platform, both SCs reboot cyclically after deployment. The console log is as below:

[ 1302.862068] drbd drbd0: meta connection shut down by peer.
	[ 1449.584045] drbd drbd0: PingAck did not arrive in time.
	         Starting NFS Mount Daemon...
	[  OK  ] Started NFS Mount Daemon.
	         Starting NFS Server...
	[  OK  ] Started NFS Server.
	[  OK  ] Created slice system-lde\x2dtftpd.slice.
	         Starting LDE tftpd...
	[  OK  ] Started LDE tftpd.
	         Stopping ISC DHCPv4 Server...
	[  OK  ] Stopped ISC DHCPv4 Server.
	         Starting ISC DHCPv4 Server...
	[  OK  ] Started ISC DHCPv4 Server.
	         Starting LDE dumpd...
	[  OK  ] Started LDE dumpd.
	[  OK  ] Stopped LDE CSM update service.
	         Starting LDE CSM update service...
	[  OK  ] Started LDE CSM update service.
	[FAILED] Failed to start NTP Daemon.
	See "systemctl status lde-ntp.service" for details.
	         Stopping NTP Daemon...
	[  OK  ] Stopped NTP Daemon.
	         Starting NTP Daemon...
	[FAILED] Failed to start NTP Daemon.
	See "systemctl status lde-ntp.service" for details.
	         Stopping NTP Daemon...
	[  OK  ] Stopped NTP Daemon.
	         Starting NTP Daemon...
	[  OK  ] Reached target Network is Online.

5.21.1.2   Locating Fault

This issue is mostly caused by poor disk performance. Try suspending SC-2 and starting SC-1 first:

  1. Suspend SC-2.

    # virsh suspend SC-2

  2. Wait until SC-1 starts up successfully and login on SC-1 can be launched.
  3. Resume SC-2.

    #virsh resume SC-2

  4. Check the DRBD status.

    #cat /proc/drbd
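After resuming SC-2, the DRBD connection state in /proc/drbd should return to Connected. A sketch of the check follows; the here-document stands in for a healthy /proc/drbd, so read the real file on the SC instead.

```shell
# Sketch: extract the DRBD connection state (cs:) from /proc/drbd.
# The here-document mimics a healthy /proc/drbd on the SC.
drbd_status=$(cat <<'EOF'
version: 8.4.x
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
EOF
)
state=$(printf '%s\n' "$drbd_status" | sed -n 's/.*cs:\([A-Za-z]*\).*/\1/p')
echo "$state"
```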

5.21.2   Failed to Execute Scripts ipwInit.sh after a Re-deployment of IPWorks for KVM

5.21.2.1   Trouble Symptoms

The following warning message is logged when the user executes the script ipwInit.sh:

" CMW: ERROR (cmw-sdp-import): Already imported [ERIC-LmClientLibrary-CXP9022092_5-R2B30] (/cluster/lm/lm/ERIC-LmClientLibrary-CXP9022092_5-R2B30.sdp), Failed to import '/cluster/lm/lm/ERIC-LmClientLibrary-CXP9022092_5-R2B30.sdp'
 cmw-sdp-import /cluster/lm/lm/*.sdp execute failed, exit"

5.21.2.2   Locating Fault

The issue occurs when the qcow2 image on Host1 is not replaced by the original qcow2 image from the image package.

The following procedure is an example to fix this issue:

  1. Check the parameter QCOW2_DIR configured in ipwenv.conf.

    # grep -r "QCOW2_DIR" /root/auto_deployment/kvm_deployment/config/ipwenv.conf

    Example output:

    #QCOW2_DIR
    QCOW2_DIR=/root/auto_deployment/images
    

  2. Stop VMs and remove image files on both Host1 and Host2.
    • On Host1:

      # virsh destroy SC-1 2>/dev/null

      # rm /root/auto_deployment/images/ipw-sc-22.qcow2

    • On Host2:

      # virsh destroy SC-2 2>/dev/null

      # rm /root/auto_deployment/images/ipw-sc-22.qcow2

  3. Unzip the image package into /root/auto_deployment to get the qcow2 image on Host1.

    # cd /root/auto_deployment

    #tar -zxvf /root/19010-CXP9023809_2_Ux_<Revision Number>.tar.gz

    Example output:

    images/
    images/pxeboot.qcow2
    images/ipw-sc-22.qcow2
    temp/
    temp/mode22/
    temp/mode22/ipw-vnf-22-zone.yaml
    temp/mode22/ipw-vnf-22.yaml
    

  4. Clean up IPWorks.

    #./ipwdeploy.sh -a cleanup

  5. Re-execute the script ipwdeploy.sh on Host1 to re-deploy IPWorks VNF.

    # ./ipwdeploy.sh -a deploy
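The QCOW2_DIR lookup from step 1 can be scripted. The here-document below stands in for the real ipwenv.conf; on Host1, grep the real file instead.

```shell
# Sketch: extract the configured QCOW2_DIR from ipwenv.conf. The
# here-document stands in for the real configuration file on Host1.
conf=$(cat <<'EOF'
#QCOW2_DIR
QCOW2_DIR=/root/auto_deployment/images
EOF
)
# Keep only the value of the (uncommented) QCOW2_DIR line.
qcow2_dir=$(printf '%s\n' "$conf" | sed -n 's/^QCOW2_DIR=//p')
echo "$qcow2_dir"
```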

5.21.2.3   Confirming Solution

Check whether the same issue occurs when running the script ipwInit.sh.

5.22   IPWorks Deployment for CEE

5.22.1   Fault Symptoms

After IPWorks is deployed successfully, the host timezone may mismatch the timezone in /cluster/etc/cluster.conf. In that case, you must manually synchronize the timezone.

5.22.2   Locating Fault

Execute the steps below to check whether the timezone needs to be synchronized manually.

  1. Log on to the host, for example, log on to SC-1.

    #ssh root@<SC-1_IP_Address>

  2. Open the cluster.conf file to check the timezone information.

    SC-1:~# vi /cluster/etc/cluster.conf

    For example, the timezone in /cluster/etc/cluster.conf is as below:

    #Define time zone
    #See /usr/share/zoneinfo/ for supported time zones
    timezone Asia/Shanghai
    ...

  3. Check the timezone link and the host time.

    SC-1:~# ll /etc/localtime

    For example, execute the command ll /etc/localtime and the output is as below:

    lrwxrwxrwx 1 root root 38 Mar 6 2017 /etc/localtime -> ../usr/share/zoneinfo/Europe/Stockholm

    SC-1:~# date

  4. Check whether the timezone information from Step 2 and Step 3 matches.
  5. If it does not match, manually synchronize the timezone.

    #lde-config -r
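Steps 2 through 4 boil down to comparing two strings. A minimal sketch follows, with sample values standing in for the real cluster.conf entry and the real /etc/localtime link target; on the host, read both from the commands shown above.

```shell
# Sketch: compare the cluster.conf timezone with the /etc/localtime target.
# Sample values stand in for the real cluster.conf and readlink output.
conf_tz="Asia/Shanghai"                               # from cluster.conf
link_target="../usr/share/zoneinfo/Europe/Stockholm"  # from ll /etc/localtime
# Strip everything up to and including "zoneinfo/" to get the host timezone.
host_tz=${link_target#*zoneinfo/}
if [ "$conf_tz" = "$host_tz" ]; then
    result="match"
else
    result="mismatch: run lde-config -r"
fi
echo "$result"
```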

5.22.3   Confirming Solution

Not applicable.

5.23   "COM SA, AMF Component Instantiation Failed" on SC-1

5.23.1   Trouble Symptoms

An alarm “COM SA, AMF Component Instantiation Failed” is issued on the SC-1 node, and SC-1 fails to take ownership of the Management VIP (MIP_OAM_IP) when SC-2 is rebooting.

5.23.2   Locating Fault

The RPM com-comsa-cxp*.sle12 is installed on SC-1. However, the folder /opt/com/lib/comp and the files under this folder are missing. This causes the COM process to hang before invoking the AMF API.

Check the alarm by using ECLI:

>show ManagedElement=1,SystemFunctions=1,Fm=1 -m FmAlarm
...
FmAlarm=148
activeSeverity=MAJOR
additionalText="Instantiation of Component safComp=Cmw,safSu=SC-1,safSg=2N,safApp=ERIC-com.oam.access.aggregation failed"
eventType=PROCESSINGERRORALARM
lastEventTime="2017-07-10T04:19:08.168+00:00"
majorType=18568
minorType=131074
originalAdditionalText="Instantiation of Component safComp=Cmw,safSu=SC-1,safSg=2N,safApp=ERIC-com.oam.access.aggregation failed"
originalEventTime="2017-07-10T04:19:08.168+00:00"
originalSeverity=MAJOR
probableCause=418
sequenceNumber=325
source="ManagedElement=UVIW-DEFRA-03-0001,SaAmfApplication.safApp=ERIC-ComSa,SaAmfSG.safSg=2N,SaAmfSU.safSu=Cmw1,SaAmfComp.safComp=Cmw"
specificProblem="COM SA, AMF Component Instantiation Failed"
additionalInfo
name=""
value="ManagedElement=1,SaAmfCluster.safAmfCluster=myAmfCluster,SaAmfNode.safAmfNode=SC-1"
...

Check the alarm by using CMW command:

SC-1:~ # cmw-status si |grep -A2 -i "comsa"
...
safSi=2N,safApp=ERIC-ComSa AdminState=UNLOCKED(1) AssignmentState=PARTIALLY_ASSIGNED(3)
...

The following procedure is an example to fix this issue:

  1. Run "cluster rootfs -c -o -n 1" on SC-1, then reboot SC-1. The COMSA RPM is re-installed, and the directory /opt/com/lib/comp/ and its files are created automatically.

    SC-1:~ # cluster rootfs -c -o -n 1

    SC-1:~ # reboot
  2. Check if the alarm still exists.

    SC-1:~ # cmw-status si |grep -A2 -i "comsa"

  3. If the alarm exists, remove it.

    SC-1:~ # amf-adm -t 200 repaired safSu=SC-1,safSg=2N,safApp=ERIC-com.oam.access.aggregation
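The cmw-status check in step 2 can be scripted to flag a partially assigned ComSa SI. The sample line below mimics the faulty output shown earlier in this section; on the SC, pipe the real `cmw-status si` output instead.

```shell
# Sketch: flag the ComSa SI when its assignment state is not fully assigned.
# The sample line stands in for real "cmw-status si" output on a faulty SC.
si_status='safSi=2N,safApp=ERIC-ComSa AdminState=UNLOCKED(1) AssignmentState=PARTIALLY_ASSIGNED(3)'
if printf '%s\n' "$si_status" | grep -q 'ERIC-ComSa.*PARTIALLY_ASSIGNED'; then
    msg="ComSa SI needs repair"
else
    msg="ComSa SI fully assigned"
fi
echo "$msg"
```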

5.23.3   Confirming Solution

Check the alarm again by using ECLI:

>show ManagedElement=1,SystemFunctions=1,Fm=1 -m FmAlarm

The previous alarm information will be removed when the issue is fixed.

Check the alarm by using CMW command:

SC-1:~ # cmw-status si |grep -A2 -i "comsa"

The previous alarm information will be removed when the issue is fixed.

If the alarm remains or the folder and files are still missing, contact the next level of Ericsson support.

5.24   IPWorks Workflows Problems

This section provides information on resolving problems on IPWorks workflows.

The status of all tasks is shown on the workflow application GUI. In the Workflow Diagram, tasks with a blue frame have passed, tasks with a yellow frame are in process, and tasks with a red frame have failed.

You can find detailed information about each task in the Workflow Log. The logs are also recorded in /ericsson/3pp/jboss/standalone/log/server.log.

For more information about IPWorks Workflow, refer to IPWorks VNF Life Cycle Management.

5.24.1   Authentication Failed

5.24.1.1   Trouble Symptoms

The termination workflow failed at the "Collect User Data" task.

The status of the workflow is failed.

5.24.1.2   Locating Fault

Log on to the VNF-LCM services VM:

#vi /ericsson/3pp/jboss/standalone/log/server.log

Search "Authentication Failed" to view the detailed log.
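As an alternative to searching inside vi, the same check can be scripted with grep. The log entry below is a made-up sample, not real server.log content; on the VM, grep the real log path instead.

```shell
# Sketch: grep the workflow log for the failure string with line numbers.
# The sample entry is illustrative only, not a real server.log line.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
2017-07-10 04:19:08 ERROR [workflow] Collect User Data: Authentication Failed
EOF
hits=$(grep -n "Authentication Failed" "$sample_log")
echo "$hits"
rm -f "$sample_log"
```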

5.24.1.3   Confirming Solution

Ensure the cloud VIM configuration properties (such as cloudUserName, cloudUserPassword, cloudBaseURL, and cloudTenantId) are configured correctly. For how to check the VIM details, refer to the document VNF-Lifecycle Manager System Administration Guide, Reference [33].

If the issue remains, collect the log and then contact the next level of Ericsson support.

5.24.2   Parameter Value Is Wrong

5.24.2.1   Trouble Symptoms

The instantiation workflow failed at the "Perform Stack Create" task.

The status of the workflow is failed.

The workflow log on the GUI shows "Instance cancelled".

5.24.2.2   Locating Fault

Log on to the VNF-LCM services VM:

#vi /ericsson/3pp/jboss/standalone/log/server.log

Search "is invalid: Error validating value" to locate the invalid parameter.

5.24.2.3   Confirming Solution

Ensure the parameter value is correct in env.yaml.

If the issue remains, collect the log and then contact the next level of Ericsson support.

5.24.3   Missing File in Configuration Directory

5.24.3.1   Trouble Symptoms

The instantiation workflow failed at the "Post Instantiation" task, but the "Perform Stack Create" task succeeded.

The status of the workflow is failed.

The workflow log on the GUI shows "No such file or directory".

5.24.3.2   Locating Fault

Log on to the VNF-LCM services VM:

#vi /ericsson/3pp/jboss/standalone/log/server.log

Search "No such file or directory" to locate the missing file.

5.24.3.3   Confirming Solution

Check onboarding steps. Refer to the section Onboarding in IPWorks VNF Life Cycle Management.

Ensure the configuration file is placed under the configuration path.

5.24.4   Environment Has Been Used

5.24.4.1   Trouble Symptoms

The instantiation workflow failed at the "Perform Stack Create" task.

The status of the workflow is failed.

The workflow log on the GUI shows "Instance cancelled".

5.24.4.2   Locating Fault

Log on to the VNF-LCM services VM:

#vi /ericsson/3pp/jboss/standalone/log/server.log

Search "In Used" to locate the network or environment resources (such as a VLAN) that have been used.

5.24.4.3   Confirming Solution

Delete the server that is using the environment, or start a new available environment.

If the issue remains, collect the log and then contact the next level of Ericsson support.

5.24.5   IPWorks lm or sql init Failed

5.24.5.1   Trouble Symptoms

The instantiation workflow failed at the "Post Instantiation" task, but the "Perform Stack Create" task succeeded.

The status of the workflow is failed.

The workflow log on the GUI shows "Instance Failed".

5.24.5.2   Locating Fault

Log on to the VNF-LCM services VM:

#vi /ericsson/3pp/jboss/standalone/log/server.log

Search "ipw_init_phase_one failed" or "ipw_init_phase_two failed" to view the details of the failed init phase.

5.24.5.3   Confirming Solution

Terminate the IPWorks, then instantiate it again.

If the issue remains, collect the log and then contact the next level of Ericsson support.

5.24.6   Missing Parameter Value

5.24.6.1   Trouble Symptoms

The instantiation workflow failed at the "Perform Stack Create" task.

The status of the workflow is failed.

The workflow log on the GUI shows "Instance cancelled".

5.24.6.2   Locating Fault

Log on to the VNF-LCM services VM:

#vi /ericsson/3pp/jboss/standalone/log/server.log

Search "is not configured" to see what parameter is not configured, such as “EMERGENCY_USER”.

5.24.6.3   Confirming Solution

Ensure the parameter value is correct in env.yaml. Then run instantiation steps, which will regenerate new env.yaml and main.yaml files. For more information about env.yaml and main.yaml, refer to the section Instantiate VNF in IPWorks VNF Life Cycle Management.

If the issue remains, collect the log and then contact the next level of Ericsson support.

5.24.7   Termination Script Missed in IPWorks

5.24.7.1   Trouble Symptoms

The termination workflow failed at the "Pre Termination" task.

The status of the workflow is failed.

The workflow log on the GUI shows "Instance Failed".

5.24.7.2   Locating Fault

Log on to the VNF-LCM services VM:

#vi /ericsson/3pp/jboss/standalone/log/server.log

Search "No such file or directory" to locate which file is missing.

5.24.7.3   Confirming Solution

Collect the log and then contact the next level of Ericsson support.

5.24.8   Workflow Gets no Stacks

5.24.8.1   Trouble Symptoms

The termination workflow failed at the "Collect Stack Details" task.

The status of the workflow is failed.

The workflow log on the GUI shows "Instance Failed".

5.24.8.2   Locating Fault

Log on to the VNF-LCM services VM:

#vi /ericsson/3pp/jboss/standalone/log/server.log

You can find detailed information about this problem, such as "stacklist is none".

IPWorks Workflows can only manage stacks with tags.

5.24.8.3   Confirming Solution

Workflows can only manage stacks with tags. Use the OpenStack command to delete the untagged stack.

#heat stack-delete <stack-name or stack-id>

If the issue remains, collect the log and then contact the next level of Ericsson support.

6   Trouble Reporting

Problems identified that cannot be solved by using this document must be reported to the next level of maintenance support through a Customer Service Report (CSR).

The details of the trouble reporting process are outside the scope of this document.

When collecting information for further support, ensure that all current logs are recorded. See time and date for the logs.

For more information on how to collect information, refer to Data Collection Guideline for IPWorks.

When sending crash dumps, ensure that the dump is of the actual scenario. See time and date for the dump.

7   Appendix A: Example of PM, FM, LM, and AMF Logs

This section gives examples of the Common Component logs.

Example 19   Performance Management Logs

==================
2015/04/29 10:30:31|DNS|Error|PM_Adaptor|system 140548769142528 - 
 /vobs/ims/ipworks/src/common/coremw_adaptor/pm_adaptor_scc/src/PmObserver
 .cpp:27 initialize. saPmInitialize FAILED: 4
2015/04/29 10:30:42|DNS|Error|PM_Adaptor|system 140548733282064 - 
 /vobs/ims/ipworks/src/dns/dnspm_scc/src/PmObserverImpl.cpp:374 uploadPmData. 
 PM re-initialize FAILED: 4
2015/04/29 10:30:42|DNS|Error|PM_Adaptor|system 140548733282064 - 
 /vobs/ims/ipworks/src/dns/dnspm_scc/src/PmObserverImpl.cpp:671 uploadPmData. 
 pm not intialized!
2015/04/29 10:30:43|DNS|Error|PM_Adaptor|system 140548769142528 - 
 /vobs/ims/ipworks/src/dns/dnspm_scc/src/PmObserverImpl.cpp:144 initialize. 
 saPmPGaugeRefGet FAILED: 9
2015/04/29 10:30:53|DNS|Error|PM_Adaptor|system 140548733282064 - 
 /vobs/ims/ipworks/src/dns/dnspm_scc/src/PmObserverImpl.cpp:404 uploadPmData. 
 saPmPGaugeIntegerSet FAILED: 9

Example 20   Fault Management Logs

==================
2015/04/23 14:27:14|DNS|Info|DNSFM|user 140542940722944 - 
 /vobs/ims/ipworks/src/dns/dnsfm_ou/src/IpworksFmInterfaceImpl.cpp:74 finalize. 
 /vobs/ims/ipworks/src/dns/dnsfm_ou/src/IpworksFmInterfaceImpl.cpp74 finalize -finalize successfully!
2015/04/23 14:27:14|DNS|Debug|DNSFM|user 140542940722944 - 
 /vobs/ims/ipworks/src/dns/dnsfm_ou/src/IpworksFmService.cpp:364 run. 
 /vobs/ims/ipworks/src/dns/dnsfm_ou/src/IpworksFmService.cpp:364 run exit the thread.

Example 21   License Management Logs

==================
2015/04/09 00:00:18|DNS|Info|LM|user 139788399494912 - 
 /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmCallbacks.cpp:24 
 operationalModeNotificationCallback. 
 /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmCallbacks.cpp:24 
 operationalModeNotificationCallback >> currentMode:0
2015/04/09 00:00:18|DNS|Warning|LM|user 139788399494912 - 
 /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmService.cpp:212 
 notifyLmChangeToApp. /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/
 IpworksLmService.cpp:212 notifyLmChangeToApp. ⇒
 Local license info is not in a good status! currentLicenseStatus = 5
2015/04/09 00:00:18|DNS|Warning|LM|user 139788399494912 - 
 /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmService.cpp:226 ⇒
 notifyLmChangeToApp. /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmService.cpp:226 
 notifyLmChangeToApp. License Expired! No Service provided! 
2015/04/09 00:00:18|DNS|Info|LM|user 139788399494912 - 
 /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmCallbacks.cpp:72 
 operationalModeNotificationCallback. /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmCallbacks.cpp:72 
 operationalModeNotificationCallback Update License Done!

Example 22   AMF Logs

-------------

2015/04/09 00:15:51|amfwrapper|Trace|AMF_Adaptor|system 140376849086208 - 
 /vobs/ims/ipworks/src/common/coremw_adaptor/amf_adaptor_scc/src/AmfMonitorThread.cpp:251 
 amfHealthCheck. Healthcheck successful
2015/04/09 00:15:51|amfwrapper|Trace|AMF_Adaptor|system 140376849086208 - 
 /vobs/ims/ipworks/src/common/coremw_adaptor/amf_adaptor_scc/src/AmfMonitorThread.cpp:267 
 amfHealthCheck. << saAmfResponse aisRet = 1
2015/04/09 00:16:02|amfwrapper|Trace|AMF_Adaptor|system 140376849086208 - 
 /vobs/ims/ipworks/src/common/coremw_adaptor/amf_adaptor_scc/src/AmfMonitorThread.cpp:242 
 amfHealthCheck. >>
2015/04/09 00:16:02|amfwrapper|Trace|AMF_WRAPPER|TRACE 140376849086208 - 
 /vobs/ims/ipworks/src/common/amfwrapper/amfwrapper_scc/src/AmfObserverImpl.
 cpp:85 doHealthCheck. >>

8   Appendix B: Capturing and Tracing the Messages

8.1   Capturing and Tracing the Access-Request Messages

To capture and analyze the Access-Request messages between the GGSN node and the IPWorks Radius node, do the following:

Note:  
Type the string you want to filter or search for directly in the Filter area.

  1. Capture the authentication/authorization traces between the GGSN node and the IPWorks Radius node.

    #tcpdump -i sig_data_sp -s 0 port 1812 -w trace20130104_PS1.cap

    trace20130104_PS1.cap is the name of the trace file used to save the captured messages.

  2. Download the trace file trace20130104_PS1.cap and open it with the packet analyzer Wireshark.
  3. In Wireshark, analyze the captured messages by the following steps:
    1. Filter the string radius.code == 1 to get the number of Access-request messages.
    2. Filter the string radius.code == 2 to get the number of Access-accept messages.
    3. Filter the string radius.code == 3 to get the number of Access-reject messages.

Compare the message counts based on the filter output.
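Given the three counts from the filters, a quick success-rate calculation can be done on the node or a workstation. The numbers below are example values, not from a real trace.

```shell
# Sketch: compute the Access-Accept rate from the Wireshark filter counts.
# Example counts only; substitute the numbers from your own trace.
requests=1000   # radius.code == 1 (Access-Request)
accepts=950     # radius.code == 2 (Access-Accept)
rejects=50      # radius.code == 3 (Access-Reject)
# awk handles the floating-point division.
accept_rate=$(awk -v a="$accepts" -v r="$requests" 'BEGIN { printf "%.1f", 100*a/r }')
echo "Access-Accept rate: ${accept_rate}% (rejects: $rejects)"
```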

8.2   Capturing and Tracing the Accounting-request Messages

To capture and analyze the Accounting-Request messages between the GGSN node and the IPWorks Radius node, do the following:

Note:  
Type the string you want to filter or search for directly in the Filter area.

  1. Capture the accounting traces between the GGSN node and the IPWorks Radius node.

    # tcpdump -i bond0 -s 0 port 1813 -w trace20130104_PS1.cap

    trace20130104_PS1.cap is the name of the trace file used to save the captured messages.

  2. Download the trace file and open it with the packet analyzer Wireshark.
  3. In Wireshark, analyze the captured messages by the following steps:

    Prerequisite: The proxy function is enabled, the interim update function is enabled, and the Disconnection message (DM) is disabled. For how to enable and disable the previous functions, see the following subsections.

    1. Filter the string radius.Acct_Status_Type == 1 to get the number of accounting-start messages.
    2. Filter the string radius.Acct_Status_Type == 2 to get the number of accounting-stop messages.
    3. Filter the string radius.Acct_Status_Type == 3 to get the number of accounting-update messages.
    4. Filter the string radius.code == 5 to get the number of accounting-response messages.

Reference List

Ericsson Documents
[1] IPWorks Manual Health Check.
[2] Glossary of Terms and Acronyms.
[3] Trademark Information.
[4] Typographic Conventions.
[5] Check Alarm Status.
[6] Fault Management.
[7] Data Collection Guideline for IPWorks.
[8] IPWorks Alarm List.
[9] IPWorks Measurement List.
[10] IPWorks Performance Measurements.
[11] Performance Management Report File Format.
[12] View Software Information.
[13] IPWorks DNS, ASDNS, ENUM Parameter Description.
[14] Configure MySQL NDB Cluster.
[15] IPWorks Configuration Management.
[16] View License Information.
[17] Storage Server, MySQL Cluster Node Unreachable.
[18] Create Backup.
[19] Restore Backup.
[20] Managed Object Model (MOM).
[21] Storage Server, MySQL Cluster Node Unreachable.
[22] Storage Server, MySQL Database Unreachable.
[23] Storage Server, The MySQL Replication for Geographic Redundancy Failed.
[24] IPWorks Initial Configuration, 5/1553-AVA 901 33/3 Uen.
[25] IPWorks VNF Life Cycle Management, 31/1553-AVA 901 33/3 Uen.
[26] CEE Troubleshooting Guideline, 2/1553-AZE 102 01 Uen.
[27] COM Advanced Troubleshooting Guideline, 3/154 51-CAA 901 2587/7.
[28] Core MW Troubleshooting Guideline, 6/154 51-CAA 901 2624/4.
[29] eVIP Advanced Troubleshooting Guideline, 1/154 51-APR 901 0467/3.
[30] JavaOaM Troubleshooting Guideline, 1/154 51-APR 901 0487/2.
[31] LM Troubleshooting Guideline, 1/154 51-APR 901 0503/5.
[32] SS7 CAF Troubleshooting Guideline, 154 51-ANA 901 37.
[33] VNF-Lifecycle Manager System Administration Guide, 1543-APR 901 0578 Uen.
[34] LDE Scaling User’s Guide, 3/1553-ANA 901 39/4 Uen.
Online References
[35] MySQL 5.5 Reference Manual.


Copyright

© Ericsson AB 2017, 2018. All rights reserved. No part of this document may be reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to continued progress in methodology, design and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

Trademark List
All trademarks mentioned herein are the property of their respective owners. These are shown in the document Trademark Information.
