1 Introduction
This document describes the troubleshooting procedure for the Ericsson IPWorks product.
Its purpose is to provide information on how to troubleshoot and diagnose problems found in IPWorks. It also describes the available troubleshooting tools and how to use them.
The following procedures are NOT covered in this document:
- Installation and initial configuration instructions.
- Periodic maintenance tasks. For more information, refer to IPWorks Manual Health Check.
- Parameter configurations.
1.1 Prerequisites
This section describes the prerequisites for this document.
This guide is intended for system and network administrators working with Ericsson IPWorks. It is assumed that users of this document are familiar with performing operations within Operation and Maintenance (O&M) in general. The following prior knowledge is required:
- Intermediate Linux skills
- Ericsson Command-Line Interface (ECLI)
- Managed Object Model (MOM) related concepts
- Concepts, terminologies, and telecommunication abbreviations, such as TCP/IP, public data networks, and processor system (SC and PL)
1.1.1 Tools
This section lists the tools that can be used to troubleshoot IPWorks.
For more information about these tools, see Section 2 Tools.
1.1.2 Conditions
The following conditions must apply:
- An Ericsson Command-Line Interface (ECLI) session in Exec mode is in progress.
- Certain troubleshooting activities can have an impact on node performance. For example, trace or log activation can affect traffic throughput and is not recommended without first consulting Ericsson.
1.2 Related Information
Definition and explanation of acronyms and terminology, trademark information, and typographic conventions can be found in the following documents:
2 Tools
This section describes the tools that can be used to troubleshoot IPWorks.
2.1 Toolbox
2.1.1 ps
Use the ps command to obtain information about a process:
# ps -ef | grep <name>
Table 1 lists the corresponding name for each IPWorks component. Select the appropriate name from the table. The "Node" column indicates on which node the command is executed.
| Component | Name | Node |
|---|---|---|
| DNS Server | named | Payload |
| * DNS Server Manager | ipwdnssm | Payload |
| ASDNS Monitor | asdnsmon | Payload |
| * ASDNS Monitor Server Manager | ipwasdnsmonsm | Payload |
| ENUM Server | ipwenum | Payload |
| | ipwfesync | Payload |
| | ipwa3d | Payload |
| * AAA Server Manager | aaasm | Payload |
| * Storage Server | ipwss | System Controller |
| MySQL NDB Cluster Management Node | ndb_mgmd | System Controller |
| MySQL NDB Cluster Data Node | ndbmtd | System Controller |
| | mysqld | System Controller |
| DHCP Server | dhcpd | Payload |
| * DHCP Server Manager | ipwdhcpv4sm | Payload |
- Note:
- * denotes a Java process
In the ps output, the line for the process shows a command (on the right) that either starts with the name shown in Table 1 or, for Java processes, starts with java followed by -DApp=<process name> among the Java arguments.
For example, to find the pid for the DNS Server Manager:
# ps -ef | grep ipwdnssm | grep -v grep
root 32479 1 0 Mar13 ? 00:53:51 java
-DApp=ipwdnssm -mx128m
-DTCPSTARTPORT=9701 -DTCPENDPORT=9708
-Djboss.server.name=DNS15 -DMULTICASTAD
DRESS=224.0.0.1 -DMULTICASTPORT=15663
-DBIND_INTERFACE_ADDRESS=169.254.43.15
-Djava.net.preferIPv4Stack=true
-classpath /opt/ipworks/sm/scripts:/opt/ipworks
/common/java/ipwcommon.jar:/opt/ipworks/sm/java/ipwsm.jar:/opt/ipworks/common/ja
va/log4j-1.2.15.jar:/opt/ipworks/common/java/ipwse.jar:/opt/ipworks/common/java
/dom4j-1.6.1.jar:/home/mmas/javaoam/lib/shoal-gms-impl-1.5.29.ericsson.7.jar:/
home/mmas/javaoam/lib/javaoam-coremw-spi-R3E05.jar:/home/mmas/javaoam/lib/javaoam
-core-R3E05.jar:/home/mmas/javaoam/lib/grizzly-utils-1.9.24.jar:/home/mmas/javaoam
/lib/grizzly-framework-1.9.24.jar:/opt/ipworks/common/java/AdventNetSnmp.jar:/opt
/ipworks/common/java/AdventNetSnmpAgent.jar ericsson.ipworks.sm.ServerManager
ServerType=DNS
The desired pid is 32479.
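The pid lookup above can be done in one step. The following is a minimal sketch run against a captured ps line (the sample string and pid come from the example above); the bracket in the grep pattern stops grep from matching its own command line, replacing the grep -v grep filter:

```shell
# A captured ps line from the example above (pid 32479):
sample='root 32479 1 0 Mar13 ? 00:53:51 java -DApp=ipwdnssm -mx128m'
# The [i] bracket prevents grep from matching its own command line,
# so no "grep -v grep" filter is needed:
pid=$(printf '%s\n' "$sample" | grep '[i]pwdnssm' | awk '{print $2}')
echo "$pid"   # prints 32479
```

On a live node, replace the printf with ps -ef to obtain the pid directly.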
2.1.2 ipw-ctr
Users can use ipw-ctr to start, stop, or check the status of IPWorks services (such as SS, DNS, ASDNS, ENUM).
Usage:
ipw-ctr <option> <component> [<hostname>]
For more information about this tool, refer to the section Service Life Cycle Management in IPWorks Configuration Management.
If certain services cannot be stopped by ipw-ctr, use the kill command to terminate the processes.
2.1.3 kill
For services that cannot be stopped cleanly by ipw-ctr, use the kill command to terminate the processes.
- Note:
- After the kill command is executed, use ipw-ctr to stop the service, because AMF automatically restarts services whose processes are terminated by the kill command.
Users can stop the process using the kill command as follows:
- Use the ps command as described in Section 2.1.1 to identify the pid of the process.
- Use the kill command to send a SIGTERM signal to the process as follows:
# kill <pid>
or:
# kill -15 <pid>
or:
# kill -TERM <pid>
Each of these commands has the same effect, giving the process an opportunity to terminate gracefully.
- Use the ps command again to check if the process has gone away.
- If the process is still running, use the kill command to send a SIGKILL signal to the process as follows:
# kill -9 <pid>
or:
# kill -KILL <pid>
Each of these commands has the same effect, forcing the process to terminate.
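The steps above can be sketched as one snippet: SIGTERM first, then SIGKILL only if the process survives. A throwaway sleep process stands in here for the pid found with ps:

```shell
# Sketch: graceful termination with a SIGKILL fallback.
sleep 100 &
pid=$!
kill "$pid"                          # SIGTERM: allow graceful shutdown
sleep 1
if kill -0 "$pid" 2>/dev/null; then  # process still present?
    kill -9 "$pid"                   # SIGKILL: force termination
fi
wait "$pid" 2>/dev/null || true
kill -0 "$pid" 2>/dev/null || echo "process terminated"
```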
2.1.4 rndc
The following table lists the rndc commands for the DNS service. The commands are executed on the PL nodes on which the DNS service is running.
| Operation | Shell Command |
|---|---|
| Reload DNS configuration | rndc -s 0 reload |
| Dump database | rndc -s 0 dumpdb (1) |
| Dump statistics | rndc -s 0 stats |
| Toggle query logging | rndc -s 0 querylog |
| Set debugging level | rndc -s 0 notrace (reset level to 0), rndc -s 0 trace <debug-level> (2) |
(1) If the amount of cached data is very large, the named process can crash after running "rndc -s 0 dumpdb". This is a known BIND bug. Until the bug is fixed, restart the DNS process if it crashes.
(2) Where: <debug-level> is an integer ranging from 1 to 99.
2.1.5 named-checkconf
named-checkconf validates the zone configuration files in the path /etc/ipworks/<host_name>/dns on each PL node, where <host_name> is the host name of the PL node, for example, PL-3.
The following example shows how to use named-checkconf to validate the zone configuration files on the PL-3 node:
- Go to the location of the DNS configuration files:
#cd /etc/ipworks/PL-3/dns
- Generate the test report:
#named-checkconf -z named.conf > /tmp/report
- Extract the error messages:
#grep -i -e 'error' -e 'unexpected' -e 'unknown option' /tmp/report
| Error Message | Actions | Description |
|---|---|---|
| named.conf:22: unknown option '.' | 1. Fix the syntax error on line 22 of the file named.conf. 2. Run named-checkconf again to check whether the error still exists. 3. Reload the DNS configuration: #rndc reload | The named.conf file is located in /etc/ipworks/PL-3/dns. Fix the syntax error and verify that it is gone; if so, reload the DNS configuration dynamically. |
| dns_rdata_fromtext: db.ims.etisalat.ae.Site1_NNIView:27: syntax error zone ims.etisalat.ae/IN: loading from master file db.ims.etisalat.ae.Site1_NNIView failed: syntax error zone ims.etisalat.ae/IN: not loaded due to errors. | 1. Fix the syntax error on line 27 of the db file db.ims.etisalat.ae.Site1_NNIView. 2. Run named-checkconf again to check whether the error still exists. 3. Reload the DNS configuration: #rndc reload | There are syntax errors in the db file. Fix them and verify that they are gone; if so, reload the DNS configuration dynamically. |
The command returns nothing if there is no error.
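The step-3 filter can be tried against a small sample report; the contents below are hypothetical lines modelled on the error table above, showing which lines the filter keeps:

```shell
# Build a sample report (hypothetical contents) and apply the filter:
cat > /tmp/report <<'EOF'
zone example.com/IN: loaded serial 42
named.conf:22: unknown option '.'
zone ims.example.com/IN: not loaded due to errors.
EOF
# Only the two problem lines are printed; the clean line is dropped:
grep -i -e 'error' -e 'unexpected' -e 'unknown option' /tmp/report
```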
2.1.6 MySQL Benchmark Tool
MySQL Benchmark Tool is used to test the Storage Server provisioning rate. For example, see its use in Section 5.3.3.
2.1.7 ifconfig
ifconfig is used to check the status of configured interfaces. For example, see its use in Section 5.7.3.
2.1.8 netstat
netstat is used to check routing and router settings. For example, see its use in Section 5.7.3.
2.1.9 dig
Do not run dig from an SC to the VIP traffic address of a PL to verify the DNS/ENUM function: the SC is in the OAM subnet, the PL is in the signaling subnet, and these two subnets are completely separated.
The Domain Information Groper (dig) is a tool for interrogating DNS servers. It performs DNS queries and displays the answers returned from the DNS servers queried. dig is useful to troubleshoot DNS problems because of its flexibility, ease of use and clarity of output. Other lookup tools tend to have less functionality than dig. Although dig is normally used with command line arguments, it also has a batch mode of operation for reading lookup requests from a file.
For more information, use the dig -h command or see the dig man page: http://linux.die.net/man/1/dig.
The dig utility is commonly used to diagnose DNS problems.
- Note:
- The IPWorks dig utility is installed in /opt/ipworks/dns/usr/bin. The OS provides a native dig utility in /usr/bin.
If this has not already been done, it is recommended to move the native utility aside as follows:
# cd /usr/bin
# mv dig dig.orig
Example:
dig @10.0.0.3 rec1.example.com
The resulting dig output is as follows:
1 ; <<>> DiG 9.9.8-P2 <<>> @10.0.0.3 rec1.example.com
2 ;; global options: printcmd
3 ;; Got answer:
4 ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 175
5 ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0
6
7 ;; QUESTION SECTION:
8 ;rec1.example.com. IN A
9
10 ;; ANSWER SECTION:
11 rec1.example.com. 300 IN A 10.2.3.4
12
13 ;; AUTHORITY SECTION:
14 example.com. 86400 IN NS mydns.example.com.
15
16 ;; Query time: 13 msec
17 ;; SERVER: 10.0.0.3#53(10.0.0.3)
18 ;; WHEN: Thu Dec 29 22:37:43 2005
19 ;; MSG SIZE rcvd: 69
Starting with line 1, dig shows its version and the command arguments given.
Line 4 contains the following header information of the DNS packet that answers our query:
- opcode is the DNS operation. Generally with dig this is "QUERY".
- status is the status of the answer to our query. This can be one of the following:
- NOERROR – The DNS Server found no errors and was able to return an answer.
- FORMERR – The DNS Server found an error in the format of the DNS query packet.
- SERVFAIL – The DNS Server was unable to answer the query. This usually means that there is a configuration error. Most often this is because the DNS Server does not have a list of root servers.
- NXDOMAIN – The DNS Server accepted the query but does not recognize the domain name given.
- NOTIMP – The DNS Server does not implement the operation code in the DNS query.
- REFUSED – The DNS Server received the packet but the client is not allowed query access.
- NOTAUTH – The DNS Server found an error in the TSIG (Transaction Signature) section and refused to process the packet.
- id is the pseudo-random identification number assigned to the packet. This ID is sometimes helpful in tracking queries and their answers.
- flags in line 5 indicates the state of the flag bits in the response; the section counts that follow on the same line are as follows:
- QUERY – Indicates how many queries were in the query section. This should always be 1.
- ANSWER – Indicates how many answers were returned for the domain name and query type in the query. This number might be zero if the domain name exists but there are no matching records of the record type given. If there is a match of domain name and query type, this number should be 1 or more depending on the number of records matching the domain name and query type.
- AUTHORITY – The authority section (if the AA bit was set) lists the DNS servers that are authoritative for the zone that contains the answer to the query.
- ADDITIONAL – The additional section contains extra information that may be useful to the query client. In the example, the additional section contains the addresses of the DNS servers listed in the authority section. This avoids clients needing to make a second query if they need more information.
Lines 7 through 14 contain the data in the DNS sections as outlined in line 5.
Line 16 shows the round trip time for processing the query.
Line 17 shows the address of the DNS Server that was queried.
Line 18 shows the date and time of the query.
Line 19 shows the packet size of the DNS response.
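When scripting checks over many queries, the status field explained above can be pulled out of saved dig output. A minimal sketch, using the header line from the example (no live DNS query is made):

```shell
# Header line captured from the dig output above:
header=';; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 175'
# Extract the word after "status: " (NOERROR, NXDOMAIN, SERVFAIL, ...):
status=$(printf '%s\n' "$header" | sed 's/.*status: \([A-Z]*\).*/\1/')
echo "$status"   # prints NOERROR
```

On a live node, the header variable would be filled from the output of an actual dig query.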
2.1.10 mysql
The mysql utility is a command line utility that provides direct access to MySQL databases.
The full pathname of the utility is /usr/local/mysql/bin/mysql.
The user can use mysql to inspect the status and content of the IPWorks databases.
- Note:
- Do NOT use unfamiliar commands or attempt to modify anything unless you fully understand the consequences.
Use the following command to start mysql:
# /usr/local/mysql/bin/mysql -P 3307
--protocol=tcp
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 5.6.31-ndb-7.4.12-cluster-commercial-advanced-log \
MySQL Cluster Server - Advanced Edition (Commercial)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>
Use the following command to select a database:
mysql> use <database-name>
For example, to select the Storage Server database:
mysql> use ipworks
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql>
Use the following command to close mysql and return to the shell prompt:
mysql> exit
Bye
#
2.1.11 df
Completed Backup and Restore handling requires a large amount of space in the directory /cluster/ipwbrf on the disk. The df tool can be used to check the disk space. It displays the amount of disk space occupied by mounted or unmounted file systems, the amount of used and available space, and how much of each file system's total capacity has been used.
For example:
SC-1:~ # df -hl
Filesystem                                     Size  Used Avail Use% Mounted on
/dev/sdb2                                       20G  2.3G   17G  13% /
devtmpfs                                        32G  8.0K   32G   1% /dev
tmpfs                                           32G  728K   32G   1% /dev/shm
tmpfs                                           32G  339M   32G   2% /run
tmpfs                                           32G     0   32G   0% /sys/fs/cgroup
/dev/sdb1                                      2.0G  125M  1.7G   7% /boot
/dev/mapper/lde--cluster--vg-lde--cluster--lv  148G   24G  117G  17% /.cluster
/dev/md0p3                                      99G  1.4G   92G   2% /local/ipworks
com_fuse_module                                148G   24G  117G  17% /var/filem/nbi_root
SC-1:~ #
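A space check like the one above can be scripted before starting a backup. A minimal sketch, assuming a 10 GB threshold (an illustrative value, not an IPWorks requirement); "/" is used here so the snippet runs anywhere, in practice point it at the filesystem holding /cluster/ipwbrf:

```shell
# Available space in 1K blocks on the filesystem holding "/":
avail_kb=$(df -P / | awk 'NR==2 {print $4}')
if [ "$avail_kb" -lt 10485760 ]; then   # 10 GB expressed in 1K blocks
    echo "insufficient space for backup"
else
    echo "space OK"
fi
```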
2.1.12 trace
Trace provides the ability to perform subscriber tracing, which helps troubleshoot issues in the IPWorks system.
For how to use trace in IPWorks, refer to IPWorks Trace User Guide.
2.2 Alarm and Notification Viewer
- The operators can check active alarms by using ECLI.
For example:
SC-1:~ # /opt/com/bin/cliss
ManagedElement=<Node Name>,SystemFunctions=1,Fm=1
(Fm-1)show
For more information about how to check the active alarms, refer to Check Alarm Status.
- All alarms, including active and cleared alarms, are recorded in the alarm log files. The file location is /cluster/storage/no-backup/nbi_root/AlarmLogs on the active SC. To check alarms, search for keywords related to the specific alarm.
For more information about alarm and notification, refer to Fault Management and IPWorks Alarm List.
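The keyword search mentioned above can be sketched as follows. The sample log line is hypothetical (modelled on the alarm fields used in this document) and stands in for a real file under /cluster/storage/no-backup/nbi_root/AlarmLogs:

```shell
# Build a sample alarm-log file (hypothetical format) and search it:
cat > /tmp/AlarmLog.sample <<'EOF'
2015-03-03T01:54:22 MINOR ipworksDnsServASDNSNodeDown DNS, ASDNS Node down
EOF
# Search for a keyword related to the alarm of interest:
grep 'ASDNSNodeDown' /tmp/AlarmLog.sample
```

On a live node, replace the sample file with grep -r over the AlarmLogs directory.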
2.3 CM Attribute Viewer
There are two methods to view and modify the configuration parameters.
- For the configuration parameters in ECLI, the related MOs are displayed in the ECLI DN column. The operator can navigate to a specific MO to check the configuration parameters.
- For the configuration parameters that cannot be configured in ECLI, internal support or operator can configure them in the related configuration files. The files are listed in the Configuration Files Directories column.
- Note:
- Since /etc/ipworks is a link to /cluster/home/ipworks/etc, you can view all the files in /etc/ipworks on any node.
| Name | ECLI DN | Configuration Files Directories |
|---|---|---|
| Storage Server | ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1,StorageServer=1 | /etc/ipworks/ipworks_ss.conf |
| Server Manager | ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,DnsServer=1,DnsSm=1; ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,AsdnsServer=1,AsdnsSm=1; ManagedElement=<Node Name>,IpworksFunction=1,IPWorksAAARoot=1,IPWorksAAACommonRoot=1,AAAServerManager=1 | /etc/ipworks/ipworks_dnssm.conf; /etc/ipworks/ipworks_asdnsmonsm.conf; /etc/ipworks/ipworks_aaasm.conf |
| DNS Server | ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,DnsServer=1,BindService=1 | /etc/ipworks/<hostname>/ipworks_dns.conf |
| ActiveSelect DNS Server | ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,AsdnsServer=1 | /etc/ipworks/<hostname>/ipworks_asdnsmon.conf |
| ENUM Server | | /etc/ipworks/ldapschema/ldap_dictionary.xml |
| | ManagedElement=<Node Name>,IpworksFunction=1,IPWorksAAARoot=1 | /etc/ipworks/aaa_diameter/* |
| MySQL NDB Cluster | Not Applicable | |
(1) For the ERH configuration in the SS7 signaling manager, refer to Configure SS7 for ENUM Number Portability.
2.3.1 Storage Server
The following example shows how to check the configuration parameters of the Storage Server by ECLI:
Example 1 Check Configuration Parameters of Storage Server
>show -v ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1,StorageServer=1
StorageServer=1
directory="/cluster/storage/no-backup/ipworks/logs" <default>
fileSize=1 <default>
filesNumber=3 <default>
level=LOG_LEVEL_DISABLE <default>
passwordExpiryDays=45 <default>
port=17071 <default>
securityLog=false <default>
storageServerId="1"
timelyRotate=DISABLE <default>
The other Storage Server configuration parameters are stored in the file /etc/ipworks/ipworks_ss.conf.
The Storage Server AMF wrapper configuration parameters are stored in the file /opt/ipworks/ss/etc/ss_wrapper.conf. The Storage Server AMF log directory, log name, and log level can be configured there.
2.3.2 DNS Server Manager
The following example shows how to check the configuration parameters of the DNS Server Manager by ECLI:
Example 2 Check Configuration Parameters of DNS Server Manager
>show -v ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,DnsServer=1,DnsSm=1
DnsSm=1
dnsSmId="1"
ssAddress="ipw_ss" <default>
ssPassword="<Encrypted Password>"
ssUserName="admin" <default>
DnsSmLog=1
The other Server Manager configuration parameters are stored in the files /etc/ipworks/ipworks_*sm.conf, where * stands for dns or asdnsmon. These files contain the most frequently used Server Manager properties.
The file /opt/ipworks/sm/confs/ipworks_sm_defaults.conf contains the default values for properties used by all Server Managers installed on a machine. It is stored on the board where the DNS is installed and is changed only rarely.
2.3.3 DNS Server
The following example shows how to check the configuration parameters of the DNS Server by ECLI:
Example 3 Check Configuration Parameters of DNS Server
>show -v ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,DnsServer=1,BindService=1
BindService=1
asdnsGrpDiff=BIND_ASDNS_GRP_ENABLE_DIFF_1 <default>
bindServiceId="1"
debugLogLevel=1 <default>
queryLogging=false <default>
securityLog=false <default>
DnsLog=1
DnsTransLog=1
The other DNS Server configuration parameters are stored in the file /etc/ipworks/<hostname>/ipworks_dns.conf.
2.3.4 ActiveSelect DNS Server
Verify that the ActiveSelect DNS Server configuration files have been properly exported and are in the correct location. The default path of the ActiveSelect DNS Server configuration file is /etc/ipworks/<hostname>/ipworks_asdnsmon.conf.
Check the ActiveSelect DNS configuration file, ipworks_asdnsmon.conf for the DNS Server to ensure that the return counts for the ActiveSelect DNS Sites are not limiting the number of returned addresses. Also, confirm that the Prefer Statements are properly configured.
2.3.5 ENUM Server
The following example shows how to check the configuration parameters of the ENUM Server by ECLI:
Example 4 Check Configuration Parameters of ENUM Server
>ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,EnumServer=1
(EnumServer=1)>show -v
EnumServer=1
dbConnectString="SC-1:1186" <default>
dbConnectStringSecondary="SC-2:1186" <default>
dnsResolver=true <default>
dnsResolverIPAddress="127.0.0.1" <default> <read-only>
dnsResolverPort=5300 <default>
enumServerId="1"
ipv4Address="0.0.0.0" <default>
ipv6Address="::" <default>
port=53 <default>
securitylog=false <default>
threadCount=50 <default>
Erh=1
Log=1
The following example shows how to check the configuration parameters of ERH by ECLI:
Example 5 Check Configuration Parameters of ERH
>ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,EnumServer=1,Erh=1
(Erh=1)>show -v
Erh=1
discardErhFailure=false <default>
erhId="1"
ldap=true
MAPRespNumberFormat=COUNTRYCODEWITHDASHSEC <default>
nxdomainForNonPortedNumber=true <default>
rcseInterConnect=false <default>
teTimer=30 <default>
ErhLdap=1
ErhSs7=1
The following example shows how to check the configuration parameters of the ENUM FE by ECLI:
Example 6 Check Configuration Parameters of ENUM FE
>ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,EnumFE=1
(EnumFE=1)>show -v
EnumFE=1
enableEnumDnSchedCache=false <default>
enableEnumFE=true
enumDnRangeExpiration=7 <default>
enumDnSchedExpiration=7 <default>
enumFEId="1"
handleLDAPFailure=NXDOMAIN <default>
EnumFELog=1
The following example shows how to check the CUDB connection with the ENUM Server by ECLI:
Example 7 Check CUDB Connection with ENUM Server
>ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1,DataBaseInfo=1,CudbManager=1,CudbServiceSite=ENUM,CudbSiteManager=1,CudbSite=<CudbSite Name>,CudbNode=<CudbNode Name>
(CudbNode=1)>show -v
CudbNode=<CudbNode Name>
address="192.168.20.14"
cudbNodeId="1" <default>
distinguishedName="cudbUser=ENUMUser,ou=admin,dc=ericsson,dc=com"
password="1:gliG5ALpb/AiV+hl2cd89uNRnnnCZCR7"
poolSize=400 <default>
port=389 <default>
The following example shows how to check the CUDB connection with the ERH module by ECLI:
Example 8 Check CUDB Connection with ERH Module
>ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1,DataBaseInfo=1,CudbManager=1,CudbServiceSite=NP,CudbSiteManager=1,CudbSite=<CudbSite Name>,CudbNode=<CudbNode Name>
(CudbNode=1)>show -v
CudbNode=<CudbNode Name>
address="192.168.20.14"
cudbNodeId="1" <default>
distinguishedName="cudbUser=ERHUser,ou=admin,dc=ericsson,dc=com"
password="1:gliG5ALpb/AiV+hl2cd89uNRnnnCZCR7"
poolSize=400 <default>
port=389 <default>
2.3.6 AAA Server
The following example shows how to check the configuration parameters of the AAA Server by ECLI:
Example 9 Check Configuration Parameters of AAA Server
>show -v -r ManagedElement=<Node Name>,IpworksFunction=1,IPWorksAAARoot=1
IPWorksAAARoot=1
ipworksAAARootId="1" <default>
IPWorksAAACommonRoot=1
ipworksAAACommonRootId="1" <default>
AAAServer=PL-3
aaaServerId="PL-3"
...
2.3.7 AAA Server Manager
The following example shows how to check the configuration parameters of the AAA Server Manager by ECLI:
Example 10 Check Configuration Parameters of AAA Server Manager
>show -v ManagedElement=<Node Name>,IpworksFunction=1,IPWorksAAARoot=1,IPWorksAAACommonRoot=1,AAAServerManager=1
AAAServerManager=1
aaaServerManagerId="1"
directory="/cluster/storage/no-backup/ipworks/logs" <default>
fileSize=10 <default>
filesNumber=10 <default>
level=LOG_LEVEL_DEBUG
timelyRotate=DISABLE <default>
2.3.8 AAA Load Unbalanced in eVIP Scenario
Under normal conditions, eVIP distributes connections equally across the Payloads.
If one of the Payloads goes down, all connections are automatically distributed to the other Payload. When the failed Payload recovers, the connections are not redistributed automatically. You must manually disconnect and re-establish connections so that the number of connections on each Payload is nearly equal.
Check whether the connection counts on PL-3 and PL-4 are nearly equal.
- For connections over TCP, use the following command to check the connection count:
#netstat -apn|grep 3868
Example output:
tcp 0 0 10.175.161.115:50439 10.170.19.49:3868 ESTABLISHED 57284/titansim
tcp 0 0 10.175.161.115:50428 10.170.19.49:3868 ESTABLISHED 57262/titansim
tcp 0 0 10.175.161.115:50425 10.170.19.49:3868 ESTABLISHED 57256/titansim
tcp 0 0 10.175.161.115:50437 10.170.19.49:3868 ESTABLISHED 57280/titansim
tcp 0 0 10.175.161.115:50426 10.170.19.49:3868 ESTABLISHED 57258/titansim
tcp 0 0 10.175.161.115:50435 10.170.19.49:3868 ESTABLISHED 57276/titansim
If the connection counts on PL-3 and PL-4 are not close to equal, rebalance by disconnecting some or all connections on the Payload that has more connections.
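The comparison above can be scripted by counting established Diameter (port 3868) connections per address. A minimal sketch, run here against captured netstat lines (sample data modelled on the output above) rather than a live socket table:

```shell
# Captured netstat lines (field 5 is the peer address:port):
netstat_out='tcp 0 0 10.175.161.115:50439 10.170.19.49:3868 ESTABLISHED
tcp 0 0 10.175.161.115:50428 10.170.19.49:3868 ESTABLISHED
tcp 0 0 10.175.161.115:50425 10.170.19.49:3868 ESTABLISHED'
# Count ESTABLISHED connections per peer address:
printf '%s\n' "$netstat_out" | awk '$6 == "ESTABLISHED" {print $5}' | sort | uniq -c
```

On a live node, feed the pipeline from netstat -apn | grep 3868 and compare the counts for PL-3 and PL-4.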
2.3.9 MySQL NDB Cluster
Management Node
Configuration parameters for the MySQL NDB Cluster Management Node are stored in the file /etc/ipworks/mysql/confs/ipworks_mgm_conf. Both NDB cluster Active-Active Management Nodes share the same .conf file.
Data Node
Configuration parameters for the MySQL NDB Cluster Data Node are stored in file /etc/ipworks/mysql/confs/ipworks_datanode_my.conf.
SQL Node
Configuration parameters for the MySQL NDB Cluster SQL Node are stored in file /etc/ipworks/mysql/confs/ipworks_sqlnode.conf. All SQL Nodes share the same .conf file.
2.4 Performance Management Viewer
For more information about how to check performance measurements, refer to IPWorks Performance Measurements.
3 Troubleshooting Functions
This section describes the troubleshooting functions.
3.1 Alarm
ECLI is the product tool that shows all active alarms.
Example 11 Show Active Alarms
# /opt/com/bin/cliss
>ManagedElement=<Node Name>,SystemFunctions=1,Fm=1
(Fm=1)>show FmAlarm=397
FmAlarm=397
activeSeverity=MINOR
additionalText="Agent 169.254.43.15 reports node 192.168.10.201 down"
eventType=COMMUNICATIONSALARM
lastEventTime="2015-03-03T01:54:22+01:00"
majorType=193
minorType=851974
originalAdditionalText="Agent 169.254.43.15 reports node 192.168.10.201 down"
originalEventTime="2015-03-03T01:54:22+01:00"
originalSeverity=MINOR
probableCause=342
sequenceNumber=397
source="ManagedElement=<Node Name>,SystemFunctions=1,Fm=1,FmAlarmModel=ipworksDns,FmAlarmType=ipworksDnsServASDNSNodeDown,HostName=PL-3,Node=192.168.10.201"
specificProblem="DNS, ASDNS Node down"
Also, the operator can check the alarm status by referring to Check Alarm Status.
All alarms, including active and cleared alarms, are recorded in alarm logs in the folder /cluster/storage/no-backup/nbi_root/AlarmLogs on the SC nodes.
For more information about the IPWorks alarms, refer to IPWorks Alarm List.
3.2 Logging
This section describes the event logs for the product.
3.2.1 Error Log File Type
Not applicable.
3.2.2 Application-specific Logs
| Log Directory | Description |
|---|---|
| /storage/no-backup/ipworks/logs/ (1) | IPWorks Service and AMF wrapper logs. |
| /storage/no-backup/coremw/var/log (1) | Core MW logs. AMF logs. |
| /var/log/messages | Linux OS and kernel logs. OpenSaf, CLM, COM, CMW, SMF, IMM, AMF, FM, JavaOam, BRF, NTP, RPM, etc. logs. IPWorks scripts logs (for example, amf, brf, tools, installation, initial configuration). |
| /local/ipworks/mysql-cluster/ (2) | MySQL NDB Cluster logs |
(1) The /storage folder is a link to /cluster/storage, so you can view the log files on any node.
(2) The log files under /local/ipworks/mysql-cluster are stored only on the SC node.
3.2.3 Storage Server
The Storage Server writes logging information to the file /cluster/storage/no-backup/ipworks/logs/<hostname>/ipworks_ss_<hostname>.log.
The Storage Server appends logging information to the existing log file. When checking log files, it is recommended to start from the end of the file.
3.2.3.1 File I/O Error
A File I/O exception is thrown for the log files when a non-root user starts the Storage Server:
File "logfile" I/O Error: /storage/no-backup/ipworks/logs/<hostname>/ipworks_ss_<hostname>.log (Permission denied)
A File I/O exception is thrown for the audit log file when a non-root user logs on to the CLI:
File "auditlogfile" I/O Error: /var/ipworks/logs/security/ipworks_ss_security Oct 05.audit (Permission denied)
Log on with root privileges to avoid these exceptions.
3.2.4 Server Manager
The Server Manager can be configured to use debug logging. By default, the Server Manager log is disabled; it can be enabled by using ECLI. For details, see Section 3.6.
The Server Manager logs are stored in the file /storage/no-backup/ipworks/logs/<host-name>/<*>sm.log.
Where: <*> is dns, asdnsmon, or aaasm.
3.2.5 DNS Server
To help resolve problems with the DNS Server, inspect the server log files, either directly on the server system or through the IPWorks CLI.
The DNS Server logs events through the syslog utility and can also log events to log files. By default, major events are written through the syslog utility; other events can be added. The default path is /var/log/messages.
The following example shows how to enable debug logging for the DNS Server:
Example 12 Enable Debug Logging for DNS Server
#/opt/com/bin/cliss
#config
(config)>ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,DnsServer=1,BindService=1,debugLogLevel=<number>
(config)>ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,DnsServer=1,BindService=1,DnsLog=1,level=DNS_LOG_LEVEL_DEBUG
(config-DnsLog=1)>commit
Where: <number> represents the granularity of debug logging information. Refer to the attribute debugLogLevel in the MO BindService for details.
- Note:
- By default, DNS transaction log is enabled.
The DNS server opens a log file, ipworks_dns.log, in the configured log directory (/cluster/storage/no-backup/ipworks/logs/), if the debug level is DNS_LOG_LEVEL_DEBUG. The log directory is read-only.
There are also AMF wrapper and coremw related logs recorded in /cluster/storage/no-backup/coremw/var/log/. By default, these logs are enabled.
3.2.6 ActiveSelect DNS Server
Check the ActiveSelect DNS (ASDNS) Monitor log file, ipworks_asdnsmon.log for errors. The default path is /cluster/storage/no-backup/ipworks/logs/.
Check the status for a given address using the ipworks_asdnsmon.log file.
Check the ipworks_asdnsmon_trans.log that tracks the transaction events regarding ASDNS monitor.
The coremw related log is enabled by default. It is located in /cluster/storage/no-backup/coremw/var/log/.
Check the DNS Server log file, ipworks_dns.log, for the following message:
datagram from [ASDNS Monitor IP Address].port ns_req: TSIG verify failed - BADSIG (16)
If this message is displayed, there is a mismatch in the TSIG key being used and thus messages from the ASDNS Monitor are not being processed. Use the IPWorks CLI to correct the configuration.
3.2.6.1 ActiveSelect DNS Monitor Log Files
To help resolve problems with the ASDNS Monitor, trace the activity by inspecting the monitor log files.
The IPWorks ASDNS Monitor logs events to the syslog utility and log files. By default, major events are written to the syslog utility. For details about the syslog utility, see the syslog(3C) manual page.
By default, the ASDNS Monitor log is disabled because logging consumes CPU and disk resources.
The following example shows how to enable logging for the ASDNS Monitor:
Example 13 Enable Logging for ASDNS Monitor
#/opt/com/bin/cliss
#config
(config)>ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,AsdnsServer=1,asdnsMonitor=1,AsdnsMonLog=1,level=LOG_LEVEL_DEBUG
(config-AsdnsMonLog=1)>commit
- Note:
- The ASDNS transaction log is enabled by default.
The ASDNS Monitor opens a file, ipworks_asdnsmon.log, in the log directory (/cluster/storage/no-backup/ipworks/logs).
For more information about the ASDNS Monitor Log and ASDNS Monitor Transaction Log, refer to AsdnsMonLog and AsdnsMonTransLog in Managed Object Model (MOM).
3.2.6.2 ActiveSelect DNS Monitor System Logs
The system log is the primary location where operational problems with the ASDNS Monitor are logged. It is important to monitor the system log (on the host where the monitor is running) for errors or problems. The path of the system log file is /var/log/messages.
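To scan the system log for monitor entries, grep for the dagent identifier mentioned in the note below. The sketch runs against a temporary file seeded with sample lines modeled on the message formats in this section (the timestamps and script path are made up); on a live node, grep /var/log/messages directly.

```shell
# Sample system log lines in a temporary file, modeled on the message formats
# in this section (the timestamps and script path are made up):
MSG=$(mktemp)
printf '%s\n' \
  'Feb 28 09:46:35 dagent started - this version compiled 01:14:27 Apr 21 2003' \
  'Feb 28 09:47:02 dagent exec failed for script /opt/check.sh: No such file' > "$MSG"
# On a live node, run the same grep against /var/log/messages:
grep 'dagent' "$MSG"
rm -f "$MSG"
```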
Logs generated by CoreMW related to the ASDNS Monitor are recorded in /cluster/storage/no-backup/coremw/var/log/<PL hostname>/asdns_coremw.log. This log is enabled by default.
When errors are displayed, the messages generally describe the error, and most of them prevent the monitor from running. The monitor may run with a partially successful configuration file, so it is important to check the log messages and not simply assume that the configuration is correct just because the monitor is running.
- Note:
- In the log file, the ASDNS Monitor identifies itself as dagent. For example:
Feb 28 09:46:35 dagent started - this version compiled 01:14:27 Apr 21 2003
The following table lists the error messages generated by the asdnsmon daemon:
| Error | Description |
|---|---|
| exec failed for script error-message | The monitor failed to start the script; the error message should provide information as to why it failed. |
| can’t send to dns: error | An error was encountered while trying to send load information to a DNS Server. |
| exec failed for command: error | An error was encountered when trying to run the command configured for a monitor script. |
| unable to locate target for fd number, pid | A temporary error condition when processing the exit status of a monitor command. If this occurs often, review the scripts used. |
| target name failed to complete | A previous monitor load sample had not completed by the time the next sample was measured. The service may be down, or the interval specified may be too short. |
| too many processes for name | Too many monitoring processes have been created. This may be because they are not completing due to the short interval between checks, or because they cannot detect an error condition quickly enough and return it. |
| can’t fork in create_child: error / can’t dup errno: error | These are errors in creating monitoring processes. Contact product support. |
| unable to open pidfile file: error | The file where the monitor process ID is maintained cannot be created. Typically this is because the monitor process has not been started as root. |
| select error: error | A fatal runtime error that can be caused by problems with the network layer. |
| error setting priority: error | The monitor was unable to change its priority, typically because it was not run as root. |
| can’t malloc entity / can’t get mem in function | Fatal runtime error messages indicating that no more memory is available. Perhaps too many resources are being monitored by this monitor. |
3.2.7 ENUM Server
The error log files of the ENUM server (including ERH over LDAP), ERH over SS7, and ENUM FE Sync are stored in /cluster/storage/no-backup/ipworks/logs/<hostname>. The log files are named ipwenum.log.x and ipworks_fesync.log.x respectively.
The ENUM server, the ERH module, and ENUM FE Sync automatically start a new error log file after a configurable period or when the current file reaches a configurable size. Taking the ENUM error log file as an example, a configurable number of previous versions of the file is retained with names ipwenum.log.<n>, where n is the number of the log file. The user can configure the number of files retained and the size and time limits, but not the directory path, using the ECLI.
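The rotation scheme above can be illustrated as follows; a temporary directory stands in for the real log directory, and the file set is a made-up example of the ipwenum.log.<n> naming.

```shell
# A temporary directory stands in for the real log directory:
DIR=$(mktemp -d)
touch "$DIR/ipwenum.log" "$DIR/ipwenum.log.1" "$DIR/ipwenum.log.2"
# List the current and rotated ENUM error logs:
ls -1 "$DIR" | sort
rm -rf "$DIR"
```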
3.2.8 AAA Server
The AAA Server writes logging information to the file /cluster/storage/no-backup/ipworks/logs/<PL hostname>/aaa_diameter_server.log.
To help resolve problems with the AAA Server, inspect the server’s log files. For details, refer to the Section EPC AAA in Data Collection Guideline for IPWorks.
3.2.9 MySQL NDB Cluster
The MySQL NDB Cluster writes logging information under the directory /local/ipworks/mysql-cluster/.
3.2.10 Backup and Restore
The Backup and Restore handling writes logging information to the file /cluster/storage/no-backup/ipworks/logs/<hostname>/ipwbrf.log.
3.2.11 Scaling
IPWorks application scaling writes logging information to the SC-1/SC-2 log file /var/log/messages.
LDE scaling writes logging information to the SC-1/SC-2 log file /var/log/messages.
CoreMW scaling writes logging information to the clustermonitor.log* files in the SC-1/SC-2 folder /var/opt/coremw/clustermonitor.
SS7CAF scaling writes logging information to the ss7caf_scaling.log* files in the SC-1/SC-2 folder /opt/sign/log.
3.3 Core Dumps
This section describes how to troubleshoot with core dump.
A core dump is a file containing a process's address space (memory) at the moment the process terminates unexpectedly. Core dumps may be produced on demand (for example, by a debugger) or automatically upon termination. They are triggered by the kernel in response to program crashes and may be passed to a helper program (such as systemd-coredump) for further processing. Core dumps are mainly useful to developers for debugging program crashes.
3.3.1 Locating Core File
Normally the core dump files are stored in the directory /cluster/dumps/.
3.3.2 Core Dump Limitation
By default, there is no size limit on core dump files. The current limit can be checked with ulimit -c. To set a limit, use, for example, ulimit -c 1024k; to change it back to the default, use ulimit -c unlimited.
3.3.3 Defining Name of Core Dump File
To define the name of core dump files, do the following:
- In the configuration file /etc/sysctl.conf, navigate to the parameter kernel.core_pattern, and define a template that is used to name core dump files.
The template can contain % specifiers, which are substituted by the following values when a core file is created:
%%  a single % character
%p  PID of dumped process
%u  (numeric) real UID of dumped process
%g  (numeric) real GID of dumped process
%s  number of signal causing dump
%t  time of dump, expressed as seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC)
%h  hostname
%e  executable filename (without path prefix)
%c  core file size soft resource limit of crashing process (since Linux 2.6.24)
The default value is kernel.core_pattern = /cluster/dumps/%e.%p.%h.core.
- Execute the command sysctl -p for the change to take effect without rebooting.
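To illustrate how the kernel expands the template, the following sketch substitutes the %e, %p, and %h specifiers by hand using the default pattern; the process name, PID, and hostname are taken from the example in Section 3.3.4.

```shell
# Default template from /etc/sysctl.conf:
pattern='/cluster/dumps/%e.%p.%h.core'
# Substitute the specifiers for an example crash
# (executable "named", PID 12161, host PL-3):
echo "$pattern" | sed -e 's/%e/named/' -e 's/%p/12161/' -e 's/%h/PL-3/'
# prints: /cluster/dumps/named.12161.PL-3.core
```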
3.3.4 Analyzing Core Dump File
Analyze the core dump file to find the cause of the crash. Before performing the following steps, install the tool gdb.
For example, assume that a core dump file CoreDumpFile is found under /cluster/dumps.
- Find which service crashed and which specific binary file generates the core dump files.
- Go to the directory /cluster/dumps.
Example:
SC-1:~ # cd /cluster/dumps
- List the core dump files.
Example:
SC-1:~ # ls -lrt *.core*
-rw------- 1 root root 140431360 Mar 20 03:00 named.12161.PL-3.core
Where named.12161.PL-3.core is the core dump file.
- Based on the dump file, determine what process or service (such as DNS) crashed and what binary file generated the core dump file accordingly.
Example:
SC-1:~ # file named.12161.PL-3.core
named.12161.PL-3.core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/opt/ipworks/dns/usr/bin/named -f'
From the command output, the path segment dns indicates that the DNS server crashed, and the binary file named in the directory /opt/ipworks/dns/usr/bin generated the core dump file.
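The binary path can also be extracted from the file(1) output programmatically. This sketch parses the sample output line shown above; the sed expression is an illustration, not part of the product.

```shell
# The file(1) output shown above:
out="named.12161.PL-3.core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/opt/ipworks/dns/usr/bin/named -f'"
# Capture the path between "from '" and the next space or quote:
bin=$(echo "$out" | sed -n "s/.*from '\([^ ']*\).*/\1/p")
echo "$bin"
# prints: /opt/ipworks/dns/usr/bin/named
```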
- Go to the directory /cluster/dumps.
- Save or back up the following proof files:
- The core dump file like named.12161.PL-3.core in the directory /cluster/dumps.
- The binary file like named in the directory /opt/ipworks/dns/usr/bin.
- The log files in the directory /cluster/storage/no-backup/ipworks/logs.
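One way to keep the proof files together is to bundle them into a single archive for support. The sketch below uses a temporary directory with placeholder files mirroring the example names, so it is safe to run anywhere; the archive name proof_files.tar.gz is an arbitrary choice.

```shell
# A temporary tree stands in for /cluster/dumps and the log directory:
WORK=$(mktemp -d)
mkdir -p "$WORK/dumps" "$WORK/logs"
touch "$WORK/dumps/named.12161.PL-3.core" "$WORK/logs/ipworks_dns.log"
# Bundle the proof files into one archive for support:
tar -czf "$WORK/proof_files.tar.gz" -C "$WORK" dumps logs
tar -tzf "$WORK/proof_files.tar.gz" | sort
rm -rf "$WORK"
```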
- Use the tool gdb to analyze the reason why the process crashed.
Example:
PL-3:~ # gdb /opt/ipworks/dns/usr/bin/named named.12161.PL-3.core
- Use the command bt or where in GDB to view the call stack of the thread that caused the crash.
(gdb) bt
Or
(gdb) where
Example:
#12 0x00007fdd0bfa3563 in LmServerProxy::connectToLmServer() () from /usr/lib64/liblmcba64.so
#13 0x00007fdd0bfa3616 in LmServerProxy::handleConnectionLoss() () from /usr/lib64/liblmcba64.so
#14 0x00007fdd0bfa48f6 in LmServerProxy::connectionLossThreadFunction(void*) () from /usr/lib64/liblmcba64.so
#15 0x00007fdd0bd687f6 in start_thread () from /lib64/libpthread.so.0
#16 0x00007fdd0b84b09d in clone () from /lib64/libc.so.6
- Use the following command to view the status of all threads in the same process.
(gdb) thread apply all bt
Example:
Thread 17 (Thread 0x7fdd0e007720 (LWP 12161)):
#0  0x00007fdd0b7a2f6b in sigsuspend () from /lib64/libc.so.6
#1  0x0000000000640ad1 in isc__app_ctxrun ()
#2  0x0000000000640b89 in isc__app_run ()
#3  0x0000000000424770 in main ()
Thread 16 (Thread 0x7fdd0793c700 (LWP 12170)):
#0  0x00007fdd0bd6c65c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000066e55b in timer_thread_handler (arg=<optimized out>) at /vobs/ims/ipworks/src/common/c_common/c_common_scc/src/ipworks_timer.c:177
#2  0x00007fdd0bd687f6 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fdd0b84b09d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()
Thread 15 (Thread 0x7fdd0450f700 (LWP 12176)):
#0  0x00007fdd0bd6c65c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000672403 in PmUploaderThread::run (this=0xac2410) at /vobs/ims/ipworks/src/common/coremw_adaptor/pm_adaptor_scc/src/PmUploaderThread.cpp:71
#2  0x00007fdd0d3b6213 in ipworks::Thread::loop (this=0xac2410) at /vobs/ims/ipworks/src/common/cpp_common/cpp_common_scc/src/Thread.cpp:56
#3  0x00007fdd0c82e5e3 in thread_proxy () from /opt/ipworks/common/usr/lib/libboost_thread.so.1.54.0
#4  0x00007fdd0bd687f6 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fdd0b84b09d in clone () from /lib64/libc.so.6
#6  0x0000000000000000 in ?? ()
- Note:
- If GDB is not installed, install it first. Alternatively, ask for support to analyze the core dump files, binary files, and logs. Most importantly, these proof files must be preserved.
3.4 Performance Measurements
Generation of the performance measurements by the IPWorks is another way to get useful information when troubleshooting a problem.
The performance management report files are generated in 3GPP compliant XML format and can be transferred outside the system for post processing.
For more information about file format, refer to Performance Management Report File Format.
For more information about the performance measurements, refer to IPWorks Measurement List.
3.5 Software Version Checks
Check the software version on IPWorks. For details, refer to View Software Information.
3.6 Log Level Changes
| Server Name | Operation | Comments |
|---|---|---|
| Storage Server | #/opt/com/bin/cliss #config (config)>ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1,StorageServer=1,level=<Log level> | Where: <Log Level> specifies the log level for the Storage Server. For more information, refer to level in Managed Object Model (MOM). Note: Changing the log level of the Storage Server to a higher level of detail might result in a large log file that affects the performance of the server. Therefore, change it only when there is a problem, and change it back once the problem is resolved. |
| DNS Server | Change DNS Debug Log Level: #/opt/com/bin/cliss #config (config)>ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,DnsServer=1,BindService=1,debugLogLevel=90 (config)>commit | Where: debugLogLevel can be any value from 1 to 99. For more information, refer to the attribute debugLogLevel in Managed Object Model (MOM). |
| | Change DNS Log Level: #/opt/com/bin/cliss #config (config)>ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,DnsServer=1,BindService=1,DnsLog=1,level=<Log level> (config-DnsLog=1)>commit | Where: <Log Level> specifies the log level for the DNS server. It can be DNS_LOG_LEVEL_DEBUG or DNS_LOG_LEVEL_DISABLE. For more information, refer to the attribute level in the class DnsLog in Managed Object Model (MOM). |
| ASDNS Monitor | #/opt/com/bin/cliss #config (config)>ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,AsdnsServer=1,AsdnsMonitor=1,AsdnsMonLog=1,level=<Log Level> (config-AsdnsMonLog=1)>commit | Where: <Log Level> specifies the log level for the ASDNS Monitor. For more information, refer to level in the class AsdnsMonLog in Managed Object Model (MOM). |
| DNS/ASDNS SM | #/opt/com/bin/cliss >ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,**Server=1,**Sm=1,**SmLog=1 (**SmLog=1)>config (config-**SmLog=1)>level=<Log level> (config-**SmLog=1)>timelyRotate=<Timely rotation> (config-**SmLog=1)>commit | Where: |
| ENUM Server | #/opt/com/bin/cliss >ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,EnumServer=1,Log=1 (Log=1)>configure (config-Log=1)>level=<Log Level> (config-Log=1)>commit | Where: <Log Level> specifies the log level for the ENUM server. For more information about logging level, refer to IpworksLogLevel in Managed Object Model (MOM). The changes take effect dynamically. |
| | #/opt/com/bin/cliss >ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,EnumFE=1,EnumFELog=1 (EnumFELog=1)>configure (config-EnumFELog=1)>level=<Log Level> (config-EnumFELog=1)>commit | |
| | #/opt/com/bin/cliss >ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,EnumServer=1,Erh=1,ErhLdap=1,Log=1 (EnumFELog=1)>configure (config-EnumFELog=1)>level=<Log Level> (config-EnumFELog=1)>commit Note: The log configuration of ERH over LDAP is obsolete; it is merged into the EnumServer log configuration. | |
| | #/opt/com/bin/cliss >ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,EnumServer=1,Erh=1,ErhSs7=1,Log=1 (EnumFELog=1)>configure (config-EnumFELog=1)>level=<Log Level> (config-EnumFELog=1)>commit | |
| EPC AAA Server | #/opt/com/bin/cliss >ManagedElement=<Node Name>,IpworksFunction=1,IPWorksAAARoot=1,IPWorksAAACommonRoot=1,AAAServer=<PL hostname>,LogManagement=1,IPWorksLog=AAA_DIAMETER_SERVER (IPWorksLog=AAA_DIAMETER_SERVER)>configure (config-IPWorksLog=AAA_DIAMETER_SERVER)>level=<Log Level> (config-IPWorksLog=AAA_DIAMETER_SERVER)>commit | Where: <Log Level> specifies the log level for the EPC AAA Server. For more information about logging level, refer to IpworksLogLevel in Managed Object Model (MOM). The changes take effect dynamically. |
| AAA Server Manager | #/opt/com/bin/cliss >ManagedElement=<Node Name>,IpworksFunction=1,IPWorksAAARoot=1,IPWorksAAACommonRoot=1,AAAServerManager=1 (AAAServerManager=1)>configure (config-AAAServerManager=1)>level=<Log Level> (config-AAAServerManager=1)>commit | |
| MySQL NDB Cluster | Not applicable. | For changing the log level for MySQL NDB Cluster, refer to the MySQL online reference. |
3.7 Restart
Use the command ipw-ctr restart <component> to restart IPWorks components. For more information, refer to the section Service Life Cycle Management in IPWorks Configuration Management.
3.8 Server Status Checks
Table 8 lists which methods can be used to check the server status:
|
Server |
Methods | |||||
|---|---|---|---|---|---|---|
|
ipw-ctr(1) |
ipwcli(2) |
ps(3) |
rndc(4) |
dig(5) |
Script(6) | |
|
Storage Server |
√ |
√ |
||||
|
MySQL NDB Cluster |
√ |
√ | ||||
|
√ |
√ |
√ |
√ |
√ |
||
|
√ |
√ |
|||||
|
√ |
√ |
√ |
||||
|
√ |
√ |
|||||
|
√ |
√ |
√ |
||||
|
√ |
√ |
|||||
|
√ |
√ |
|||||
|
√ |
√ |
|||||
(1) Use ipw-ctr status <component> <hostname>. For details, see Section 2.1.2.
(2) Use show status in the IPWorks CLI. For more information, refer to Command Line Interface User Guide for IPWorks SS.
(3) Use ps -ef | grep <process name> to check if the server process is running. Check Section 2.1.1 for details.
(4) Use the rndc status command for more detailed status.
(5) Use dig or another query utility to send a query to the server to monitor that each configured zone is loaded. For more information, see Section 2.1.9.
(6) For details, refer to the section Showing Status of MySQL NDB Cluster in Configure MySQL NDB Cluster.
(7) If DNS SM is not running, the DNS server cannot be updated from IPWCLI. After IPWorks is installed, DNS SM is not started.
(8) If ASDNS SM is not running, the ASDNS monitor cannot be updated from IPWCLI. After IPWorks is installed, ASDNS SM is not started.
(9) If AAA SM is not running, the AAA server status cannot be received from IPWCLI. After IPWorks is installed, AAA SM is not started.
3.9 IPWorks Common Component
Table 9 lists links to troubleshooting information for the Common Components. These Common Components are used by the IPWorks software and are provided by the Ericsson middleware department. The detailed troubleshooting guides can be found in their own CPI documents.
|
IPWorks Common Component |
Troubleshooting Guide Link |
|---|---|
|
COM |
|
|
Core MW |
|
|
JavaOam |
|
|
LM (License Management) |
|
|
SS7 CAF |
- Note:
- The common components without troubleshooting guide are not listed here.
4 Troubleshooting Procedure
Troubleshooting a problem might require the use of one or more functions described in Section 3. To locate the fault efficiently, users can do the following:
- Check the alarms and notifications.
- Check licenses.
- Check the performance management measurements.
- Check the logs.
- Check the server status.
- Check the configuration files.
- Start tracing.
- Check the information available from capsule abortions/core dumps.
- Collect information.
- Check already reported troubles (CSRs).
- If writing a CSR, check software version and level.
- Consult the next level of maintenance support.
A troubleshooting workflow is shown in Figure 1.
5 Problem-Solving Procedure
5.1 IPWorks VNF Stack Deployment
This section provides information on resolving problems during IPWorks VNF stack deployment.
For more information about CEE related troubleshooting, refer to CEE Troubleshooting Guideline.
5.1.1 Server Groups Forbidden
5.1.1.1 Trouble Symptoms
When you try to launch IPWorks VNF HEAT stack, it fails with the "CREATE_FAILED" stack status, and the reason is "Quota exceeded, too many server groups."
$openstack stack show <Stack Name or ID>
For example:
$openstack stack show ipw6a
....
| parent                | None                                                   |
| stack_name            | ipw6a                                                  |
| stack_owner           | admin                                                  |
| stack_status          | CREATE_FAILED                                          |
| stack_status_reason   | Resource CREATE failed: Forbidden:                     |
|                       | resources.pl34_server_group: Quota exceeded, too many  |
|                       | server groups. (HTTP 403) (Request-ID: req-acd057df-   |
|                       | 83b1-44e1-84c8-a55e7021b1c8)                           |
| stack_user_project_id | 3f8143c8366e45e09083edf4e6845791                       |
| template_description  | IPWorks Stack for CEE HEAT (08-01-2016)                |
| timeout_mins          | None                                                   |
| updated_time          | None                                                   |
+-----------------------+--------------------------------------------------------+
5.1.1.2 Locating Fault
For the default Atlas configuration, the quota may not be sufficient to deploy IPWorks. In this case, increase the resource limits in the quota so that the IPWorks resources can be created successfully:
- Log on to the Atlas with the tenant user with admin role.
- Source tenant user environment.
$source openrc
- Note:
- If you use a newly created tenant user, create a new openrc (refer to the format in /home/atlasadm/openrc) for the new user, and then source it.
- Verify if the tenant user environment is correct.
$nova list
$nova quota-show
$neutron quota-show
- Get the tenant ID from tenant list output.
$openstack project list
- Update Server Groups limitation.
$nova quota-update --server-groups <Server groups Limitation> <tenant-id>
For example:
$nova quota-update --server-groups 20 5a49b043d9ea4666ac4adf6bc821942e
5.1.1.3 Confirming Solution
Check whether the IPWorks VNF stack can be deployed successfully. If the problem persists, contact next level of Ericsson support.
5.1.2 VLAN Conflicts
5.1.2.1 Trouble Symptoms
When you try to launch IPWorks VNF HEAT stack, it fails with the "CREATE_FAILED" stack status, and the reason is "Unable to create the network. The VLAN xxx on physical network default is in use.".
$openstack stack show <Stack Name or ID>
For example:
$openstack stack show ipw6a
...
| parent                | None                                                   |
| stack_name            | sub12-release-vnf                                      |
| stack_owner           | admin                                                  |
| stack_status          | CREATE_FAILED                                          |
| stack_status_reason   | Resource CREATE failed: Conflict: resources.ipw_sig_sp:|
|                       | Unable to create the network. The VLAN 213 on physical |
|                       | network default is in use.                             |
| stack_user_project_id | 3f8143c8366e45e09083edf4e6845791                       |
| template_description  | IPWorks Stack for CEE HEAT (08-01-2016)                |
| timeout_mins          | None                                                   |
| updated_time          | None                                                   |
+-----------------------+--------------------------------------------------------+
5.1.2.2 Locating Fault
To detect which network occupies the VLAN ID and why the VLAN is in use, execute the following commands on the Atlas server:
- Check which CEE neutron network uses the VLAN ID.
$vid=<VLAN_ID>
$for i in $(neutron net-list -F name -D -f value);do j=$(neutron net-show -F provider:segmentation_id -f value $i); [[ $j == "$vid" ]] && echo "Occupy vlan $vid by network $i" && break ; done
According to the above example, execute the following commands:
$vid=213
$for i in $(neutron net-list -F name -D -f value);do j=$(neutron net-show -F provider:segmentation_id -f value $i); [[ $j == "$vid" ]] && echo "Occupy vlan $vid by network $i" && break ; done
The command output resembles the following:
Occupy vlan 213 by network ipw6a_sig_sp
- Check whether the VLAN ID is duplicated with another network. If the network data is dirty or the VLAN ID is occupied by another VNF application, delete the network manually on the Atlas server:
$neutron net-delete <NET_NAME>
According to the above example, execute the following command:
$neutron net-delete ipw6a_sig_sp
5.1.2.3 Confirming Solution
Check whether the IPWorks VNF stack can be deployed successfully. If the problem persists, contact next level of Ericsson support.
5.1.3 Failed to Create Network
5.1.3.1 Trouble Symptoms
When you try to launch IPWorks VNF HEAT stack, it fails with the “CREATE_FAILED” stack status and the reason is “create_network_postcommit failed”.
$openstack stack show <Stack Name or ID>
For example:
$openstack stack show ipw6a
...
| parent                | None                                                    |
| stack_name            | ipw6a                                                   |
| stack_owner           | ipwvnf                                                  |
| stack_status          | CREATE_FAILED                                           |
| stack_status_reason   | Resource CREATE failed: InternalServerError:            |
|                       | resources.ipw_oam_sp: create_network_postcommit failed. |
| stack_user_project_id | 2326bf1070a94112bb4daf4a6a9e81cd                        |
| template_description  | IPWorks Stack for CEE HEAT (08-01-2016)                 |
| timeout_mins          | 60                                                      |
| updated_time          | None                                                    |
+-----------------------+---------------------------------------------------------+
5.1.3.2 Locating Fault
Detect which network occupies the VLAN ID in the BSP DMX. In the DMX COM CLI, check whether the VLAN ID exists. If it does, make sure that the VLAN ID is not used by another network, such as another application VNF. First, confirm this against the IP plan or with the CEE administrator.
- Navigate to the VirtualBridge MO.
>ManagedElement=1,DmxcFunction=1,Trm=1,VirtualBridge=CEE
>show Vlanid=<VLAN_ID>
- If the VLAN ID is already there, delete it in configuration mode as below:
>configure
>no Vlan=<VLAN_ID>
According to the above example, execute the following command:
>no Vlan=<ipw_oam_sp VLAN_ID>
5.1.3.3 Confirming Solution
Check if the IPWorks VNF stack can be deployed successfully. If the problem persists, contact next level of Ericsson support.
5.1.4 Policy Problem
5.1.4.1 Trouble Symptoms
When you try to launch IPWorks VNF HEAT stack, it fails with the "CREATE_FAILED" stack status and the reason shows that policy does not allow several actions to be performed.
$openstack stack show <Stack Name or ID>
For example:
$openstack stack show ipw6a
...
| parent                | None                                                       |
| stack_name            | ipw6a                                                      |
| stack_owner           | ipwdemo                                                    |
| stack_status          | CREATE_FAILED                                              |
| stack_status_reason   | Resource CREATE failed: Forbidden:                         |
|                       | resources.ipw_sig_sp: Policy doesn't allow                 |
|                       | ((((rule:create_network and                                |
|                       | rule:create_network:provider:physical_network) and         |
|                       | rule:create_network:shared) and                            |
|                       | rule:create_network:provider:network_type) and             |
|                       | rule:create_network:provider:segmentation_id) to be        |
|                       | performed.                                                 |
| stack_user_project_id | 42322a142af24b9a821475b434ea8152                           |
| template_description  | IPWorks Stack for CEE HEAT (08-01-2016)                    |
| timeout_mins          | 60                                                         |
| updated_time          | None                                                       |
+-----------------------+------------------------------------------------------------+
5.1.4.2 Locating Fault
This issue is caused by launching the IPWorks VNF stack with a user that does not have the "admin" role.
Show the user information on the Atlas server:
$ openstack role list --user <USER_NAME> --project <TENANT_NAME>
For example, the following user "ipwvnf" has the admin role.
$ openstack role list --user ipwvnf --project ipwvnf
+----------------------------------+----------+----------------------------------+----------------------------------+
| id                               | name     | user_id                          | tenant_id                        |
+----------------------------------+----------+----------------------------------+----------------------------------+
| 9fe2ff9ee4384b1894a90878d3e92bab | _member_ | 50f8c42336d347dbbd1a506428b1fdc6 | 6e4c612850914c7f86041085bf00a2a2 |
| 3e86a80ffab44fd6b489c2d9d2ccaf13 | admin    | 50f8c42336d347dbbd1a506428b1fdc6 | 6e4c612850914c7f86041085bf00a2a2 |
+----------------------------------+----------+----------------------------------+----------------------------------+
If the IPWorks VNF tenant user does not have the "admin" role, contact the CEE administrator to add the "admin" role to the user first.
5.1.4.3 Confirming Solution
After adding “admin” role to the IPWorks tenant user, check if the IPWorks VNF stack can be deployed successfully. If the problem persists, contact next level of Ericsson support.
5.1.5 Failed to Delete HEAT Stack
5.1.5.1 Trouble Symptoms
When you try to delete a HEAT stack for IPWorks VNF, it fails with the "DELETE_FAILED" stack status.
Execute the following command on the Atlas server:
$openstack stack show <Stack Name or ID>
For example:
$openstack stack show ipw6a
...
| parent                | None                                                   |
| stack_name            | ipw6a                                                  |
| stack_owner           | admin                                                  |
| stack_status          | DELETE_FAILED                                          |
| stack_status_reason   | Resource DELETE failed: Error: resources.ipw_SC-1:     |
|                       | Server ipw6a_SC-1 delete failed: (400) Cannot pin/unpin|
|                       | cpus [8, 16, 18, 6] from the following pinned set [9,  |
|                       | 3, 4, 5, 17]                                           |
| stack_user_project_id | d7920b81148944ba9a8a6400a0d3b593                       |
| template_description  | IPWorks Stack for CEE HEAT (08-01-2016)                |
| timeout_mins          | None                                                   |
| updated_time          | None                                                   |
+-----------------------+--------------------------------------------------------+
5.1.5.2 Locating Fault
To delete the stack, first stop the VM (SC-1 in this example) by using the nova command, and then delete the HEAT stack on the Atlas server:
$nova stop <VM_NAME>
$heat stack-delete <STACK_NAME>
According to the above example, execute the following commands:
$nova stop ipw6a_SC-1
$heat stack-delete ipw6a
5.1.5.3 Confirming Solution
Execute the following command to check whether the IPWorks VNF stack has been deleted successfully.
$openstack stack show <Stack Name or ID>
If the problem still remains, contact next level of Ericsson support.
5.2 IPWorks Upgrade
This section provides information on resolving problems during IPWorks Upgrade.
5.2.1 Error: Could Not Find Local Upgrade Package
5.2.1.1 Trouble Symptoms
When the user tries to create an IPWorks Upgrade Package (UP) by executing the command createUpgradePackage in ECLI, it fails with "Could not find local upgrade package".
SC-X:~ #ls /cluster/UP/
5.2.1.2 Locating Fault
Check the folder /cluster/UP to see whether files other than ERIC-IPW_UP.tar.gz exist in this folder. If yes, remove all files except ERIC-IPW_UP.tar.gz and then try the action again.
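A cautious way to perform this cleanup is with find, which deletes every regular file except ERIC-IPW_UP.tar.gz. The sketch below rehearses the command in a temporary directory with made-up extra files; on the node, point it at /cluster/UP instead.

```shell
# Rehearse the cleanup in a temporary directory with made-up extra files;
# on the node, replace "$UP" with /cluster/UP:
UP=$(mktemp -d)
touch "$UP/ERIC-IPW_UP.tar.gz" "$UP/stale-package.tar.gz" "$UP/README.txt"
find "$UP" -maxdepth 1 -type f ! -name 'ERIC-IPW_UP.tar.gz' -delete
ls -1 "$UP"    # only ERIC-IPW_UP.tar.gz remains
rm -rf "$UP"
```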
5.2.1.3 Confirming Solution
Check whether the IPWorks UP can be created successfully. If the problem remains, contact the next level of Ericsson support with the ECIM logs and /var/log/messages.
To generate the ECIM logs, do the following:
- Find which SC is active for ECIM process.
#cmw-status -v csiass | grep -i ecimswm -A 2
- Enable ECIM trace log (assume that ECIM is active in SC-1).
For example:
SC-1:~ # ps -ef | grep ecim
cmw-swm 7788 1 0 Dec07 ? 00:00:01 /opt/coremw/lib/ecimswm instantiate
SC-1:~ # kill -SIGUSR2 7788
- View the log under the following folder:
/var/opt/coremw/ecimswm
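The PID lookup and signal from the steps above can be combined by extracting the PID from the ps output and building the kill command. The sketch below parses the sample ps line shown above and echoes the command instead of executing it.

```shell
# Sample row from "ps -ef | grep ecim" (taken from the example above):
ps_line='cmw-swm   7788     1  0 Dec07 ?        00:00:01 /opt/coremw/lib/ecimswm instantiate'
# Field 2 of ps -ef output is the PID:
pid=$(echo "$ps_line" | awk '{print $2}')
# Echo the signal command instead of executing it, so it can be reviewed first:
echo "kill -SIGUSR2 $pid"
# prints: kill -SIGUSR2 7788
```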
5.2.2 Error: Failed to Remove Upgrade Package
5.2.2.1 Trouble Symptoms
When the user tries to remove an IPWorks Upgrade Package (UP) by executing the command removePackageUpgrade UpgradePackage=<UP Name> in ECLI, it fails with "Failed to remove upgrade package".
5.2.2.2 Locating Fault
Check the folder /cluster/UP to see whether any file (for example, ERIC-IPW_UP.tar.gz) exists in this folder. If yes, do the following:
- Remove all files under the folder.
SC-X:~ #rm /cluster/UP/*
- Try to remove the IPWorks UP again.
For details, refer to Delete Upgrade Package.
5.2.2.3 Confirming Solution
Check whether the IPWorks UP can be removed successfully. If the problem remains, contact the next level of Ericsson support with the ECIM log and /var/log/messages.
To generate the ECIM logs, do the following:
- Find which SC is active for ECIM process.
#cmw-status -v csiass | grep -i ecimswm -A 2 | grep ACTIVE -B 1
- Enable ECIM trace log (assume that ECIM is active in SC-1).
For example:
SC-1:~ # ps -ef | grep ecim
cmw-swm 7788 1 0 Dec07 ? 00:00:01 /opt/coremw/lib/ecimswm instantiate
SC-1:~ # kill -SIGUSR2 7788
- View the log under the following folder:
/var/opt/coremw/ecimswm
5.2.3 Failed to Restore System Data after Upgrade Failure
5.2.3.1 Trouble Symptoms
After an upgrade failure, the user cannot restore System Data by using ECLI. When this issue occurs, information resembling the following is received:
actionName="RESTORE" <read-only>
additionalInfo <read-only>
"Restore Backup for SystemData_BKP_preUGLSV16_2017-03-09: Initialized"
"No active result is reported for one or more groups. BRFC is cancelling Current Request"
"Restore Backup for SystemData_BKP_preUGLSV16_2017-03-09: Failed"
5.2.3.2 Locating Fault
This issue occurs when the upgrade fails and DRBD is running on SC-2.
To resolve this issue:
- Confirm that DRBD is running on SC-2. Execute the following command on SC-1:
SC-1:~ # drbd-overview
For example:
0:drbd0/0 Connected Secondary/Primary UpToDate/UpToDate C r-----
The output shows that SC-1 is secondary, which means that DRBD is running on SC-2.
- Reboot SC-2 to switch drbd to SC-1.
SC-2:~ # reboot
- After rebooting SC-2, execute the command again to check whether DRBD has switched to SC-1 successfully.
SC-1:~ # drbd-overview
For example:
0:drbd0/0 Connected Primary/Secondary UpToDate/UpToDate C r----- lvm-pv: lde-cluster-vg 100.00g 50.06g
The output shows that SC-1 is primary; DRBD is now running on SC-1.
- Perform the System Data backup restore again.
For more information, refer to the section Restore System Data Backup in Restore Backup.
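The role check in the steps above can be scripted by parsing the drbd-overview output. The sketch below works on the sample line shown earlier in this section; on a live SC, pipe the real drbd-overview output in instead.

```shell
# Sample drbd-overview line (the output shown above):
line='0:drbd0/0 Connected Secondary/Primary UpToDate/UpToDate C r-----'
# Field 3 holds the local/peer roles, separated by "/":
role=$(echo "$line" | awk '{print $3}' | cut -d/ -f1)
if [ "$role" = "Primary" ]; then
    echo "DRBD is running on this SC"
else
    echo "DRBD is running on the peer SC; reboot the peer to switch it over"
fi
```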
5.2.3.3 Confirming Solution
Check whether the operation is successful. If not, contact the next level of Ericsson support.
5.2.4 Error: Campaign Failed Verification
5.2.4.1 Trouble Symptoms
When the user verifies the result of the preparation of the IPWorks Upgrade Package in ECLI, the result shows that the verification failed.
For example:
(UpgradePackage=IPWORKS.base-AVA90133-3.0.0-2)>show -v UpgradePackage=IPWORKS.base-AVA90133-3.0.0-2
activationFallbackTimer=0 <read-only>
created="2018-03-04T11:56:44" <read-only>
creatorActionId=4 <read-only>
execMethod=ONE_STEP
ignoreBreakPoints=true <default>
password=[] <empty>
state=PREPARE_COMPLETED <read-only>
upgradePackageId="IPWORKS.base-AVA90133-3.0.0-2"
uri="sftp://root@10.170.57.148:/cluster/UP"
userLabel=[] <empty>
activationStep[@1] <read-only>
description="not yet supported" <read-only>
name="not yet supported" <read-only>
serialNumber=1 <read-only>
administrativeData[@1] <read-only>
description="" <read-only>
productionDate="2018-03-04" <read-only>
productName="IPWORKS.base" <read-only>
productNumber="AVA90133" <read-only>
productRevision="3.0.0-2" <read-only>
type="OTHER" <read-only>
reportProgress
actionId=4
actionName="Verify"
additionalInfo
""
progressInfo="Prepare UpgradePackage"
progressPercentage=100
result=FAILURE
resultInfo="Campaign failed verification"
state=FINISHED
step=1
stepProgressPercentage=0
5.2.4.2 Locating Fault
On SC nodes, check whether the following error log exists in /var/log/messages:
For example:
Mar 4 12:05:31 SC-1 CMW: ERROR (cmw-campaign-verify): ERROR: Verify timeout
Mar 4 12:05:31 SC-1 ecimswm: Campaigned failed verification for ERIC-CSM-Merged-2018_03_04-120152
Mar 4 12:05:31 SC-1 ecimswm: Calling immutil_saImmOmAdminOwnerInitialize with owner CoreMwEcimSwM_140656747996928 and releaseOnFinalize TRUE
Mar 4 12:05:31 SC-1 osafimmnd[7760]: NO Ccb 11464 COMMITTED (CoreMwEcimSwM_140656747996928)
If a similar error log exists, check on all nodes (SC and PL) whether the following error log exists in /var/log/messages:
For example:
On PL-3:
Mar 4 12:03:51 PL-3 osafsmfnd[6743]: NO Failed to send mds message, rc = 2, SMFD DEST 0
On PL-4:
Mar 4 12:03:51 PL-4 osafsmfnd[6750]: NO Failed to send mds message, rc = 2, SMFD DEST 0
This example shows that PL-3 and PL-4 have the problem. Execute the following commands to fix the problem on PL-3 and PL-4:
amf-adm restart safComp=SMFND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF
amf-adm restart safComp=SMFND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF
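When several PL nodes are affected, the restart commands above can be generated in a loop. A minimal sketch, assuming the node list PL-3 and PL-4 from the log example (adjust the list to the nodes where the MDS error actually appears); the commands are echoed so they can be reviewed before being run:

```shell
# Node list taken from the log example above; edit as needed.
for node in PL-3 PL-4; do
    # Echo the restart command so it can be reviewed before execution.
    echo amf-adm restart "safComp=SMFND,safSu=${node},safSg=NoRed,safApp=OpenSAF"
done
```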
5.2.4.3 Confirming Solution
Try again to verify the preparation of the IPWorks Upgrade Package in ECLI. If the issue remains, contact the next level of Ericsson support.
5.2.5 Login Fails during Rebooting SC
5.2.5.1 Trouble Symptoms
Login to the IPWCLI provisioning system fails when you shut down the SC on which the SS and SqlmgmNode are running by using the shutdown command. After about 100 seconds, you can log on to the IPWCLI system successfully.
5.2.5.2 Locating Fault
It is not recommended to shut down the OS, because doing so prolongs the time needed to switch resources from one SC to the other.
If shutting down the system is required, use the command shutdown -h now. This reduces the time from about 100 s to 35 s.
5.2.5.3 Confirming Solution
Not Applicable.
5.2.6 Health Check Hang
5.2.6.1 Trouble Symptoms
The health check operation on the IPWorks system fails during the upgrade procedure: the operation stops at some point and does not proceed.
5.2.6.2 Locating Fault
Part of the health check operation is to check whether any errors exist in the IPWorks application logs. If there are too many errors, the health check script cannot handle them and hangs.
Execute the following steps to resolve the problem:
- Back up all the logs in /storage/no-backup/ipworks/logs/SC-X/* and /storage/no-backup/ipworks/logs/PL-X/* and delete them.
- Stop the health check process on SC-1 and SC-2.
- Record the process ID of hcfd.
# ps -ef | grep hcfd | grep -v grep
- Kill the process.
# kill -9 <process id>
- Execute the following command to clear the related logs.
# for log in $(find /storage/no-backup/ipworks/logs -mtime -1 -name '*.log*');do > $log;done
- Perform the health check operation again.
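The steps above can be sketched as two small shell helpers. This is an illustration only, not an official tool: the log path and the hcfd process name come from the procedure above, and the backup location /tmp/ipworks-logs-backup.tar.gz is an assumption.

```shell
# Back up and truncate recent IPWorks logs (directory passed as argument).
cleanup_hc_logs() {
    logdir="$1"
    # Back up the logs before truncating (backup path is an assumption).
    tar czf /tmp/ipworks-logs-backup.tar.gz "$logdir" 2>/dev/null || true
    # Truncate logs modified within the last day, as in the find command above.
    for log in $(find "$logdir" -mtime -1 -name '*.log*' 2>/dev/null); do
        : > "$log"
    done
}

# Stop the hanging health check daemon, if present.
kill_hcfd() {
    pid="$(ps -ef | grep hcfd | grep -v grep | awk '{print $2}')"
    if [ -n "$pid" ]; then
        kill -9 $pid
    fi
}

# On the SC where the health check hangs:
#   cleanup_hc_logs /storage/no-backup/ipworks/logs
#   kill_hcfd
```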
5.2.6.3 Confirming Solution
Check whether the health check operates successfully. If the problem persists, contact the next level of Ericsson support.
5.3 IPWCLI
5.3.1 Network Issues
5.3.1.1 Trouble Symptoms
The Storage Server cannot be started.
Also, when the Storage Server is not running and the IPWorks CLI is started, the CLI reports the error "Network I/O Error: Opening socket: reason: Connection refused: connect" as the CLI tries to send a logon request to the SS on the server port.
5.3.1.2 Locating Fault
Check the SS status by using the ipw-ctr status ss command.
Ensure that no other process on the system uses the TCP/IP port used by the Storage Server. The default TCP/IP port for the Storage Server is 17071.
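To see whether another process already holds the port, the listener list can be checked directly. A minimal sketch, assuming the default port 17071 and that netstat is available (the -p column, which shows the owning process, requires root):

```shell
# Default Storage Server port, per the text above.
SS_PORT=17071

# List any listener on the SS port; the -p column (owning PID/program)
# requires root. No output before the message means nothing is bound.
netstat -tlnp 2>/dev/null | grep ":${SS_PORT} " || echo "Port ${SS_PORT} is free."
```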
5.3.1.3 Confirming Solution
After the conflicting process is stopped or the Storage Server port is changed, check whether the Storage Server can be started successfully.
5.3.2 Provisioning Issues
5.3.2.1 Trouble Symptoms
When a user selects a range of resource records to delete (for example, with the commands select naptrrecord and delete), and the range contains one or more resource records already marked for deletion, the delete command fails.
5.3.2.2 Locating Fault
This issue occurs because records in the range are already marked for deletion.
To avoid this issue, execute the command update dnsserver to remove the resource records that are marked for deletion from the MySQL database, then execute the commands select naptrrecord and delete to delete the range of resource records.
When an object (resource) is in a processing state (for example, in a transaction), the object is locked by the Storage Server to prevent other users from modifying or deleting it. If a user sends any request related to the locked object, the SS sends a "Locked By Admin" exception to the IPWorks CLI.
5.3.2.3 Confirming Solution
Check whether the delete operation can be performed successfully after executing the command update dnsserver.
5.3.3 Provisioning Rate Too Low
5.3.3.1 Trouble Symptoms
The provisioning through the IPWorks CLI is too slow.
5.3.3.2 Locating Fault
Use the MySQL Benchmark Tool to test the provisioning rate.
For example, test 10,000 queries; the average number of seconds should fall in the range of 20-30 seconds.
# /usr/local/mysql/bin/mysqlslap --engine=ndbcluster --socket=/local/ipworks/mysql-cluster/sqlnode/sqlnode.sock -a --auto-generate-sql-load-type=write --number-char-cols=4 --number-of-queries=10000
Benchmark
Average number of seconds to run all queries: 24.565 seconds
Minimum number of seconds to run all queries: 24.565 seconds
Maximum number of seconds to run all queries: 24.565 seconds
Number of clients running queries: 1
Average number of queries per client: 10000
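The benchmark figures translate directly into a provisioning rate. A small sketch using the example numbers above:

```shell
# Values taken from the example mysqlslap output above.
queries=10000
avg_seconds=24.565

# Rate = queries / average runtime.
rate="$(awk -v q="$queries" -v s="$avg_seconds" 'BEGIN { printf "%.0f", q / s }')"
echo "Provisioning rate: ${rate} queries/second"
```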
5.3.3.3 Confirming Solution
Not applicable.
5.4 ECLI
This section provides information on resolving problems with ECLI (COM CLI).
5.4.1 ERROR: Transaction validation failed with error code: ComFailure
5.4.1.1 Trouble Symptoms
When a user tries to commit configurations in ECLI, the commit fails with "ERROR: Transaction validation failed with error code: ComFailure".
5.4.1.2 Locating Fault
Check the DN error in the log file /var/log/messages. The fault can be caused by the DNS service or the DHCPv4 service.
- Fault caused by the DNS service
For example:
Oct 12 10:04:18 SC-1 com: COM_SA Error string number 0: IMM: ERR_NOT_EXIST: object 'dnsLogId=1,bindServiceId=1,dnsServerId=1,ipworksDnsRootId=1' exist but no implementer (which is required)
Oct 12 10:04:18 SC-1 com: COM_SA OamSAImmBridge::OamSAPrepare() ModifiedObjects: failed to Modify DN
This output shows the DN error is caused by DNS configurations in COM CLI. Follow the procedure in Restart DNS Service to resolve this problem.
- Fault caused by the DHCPv4 service
For example:
Jan 29 15:28:49 SC-1 com: COM_SA Error string number 0: IMM: ERR_NOT_EXIST: object 'dhcpv4LogId=1,dhcpServerId=PL-4,ipworksDHCPRootId=1' exist but no implementer (which is required)
Jan 29 15:28:49 SC-1 com: COM_SA OamSAImmBridge::OamSAPrepare() ModifiedObjects: failed to Modify DN(dhcpv4LogId=1,dhcpServerId=PL-4,ipworksDHCPRootId=1),error(IMM: SA_AIS_ERR_NOT_EXIST)
This output shows the DN error is caused by DHCPv4 configurations in COM CLI. Follow the procedure in Clean DHCPv4 Implementer to resolve this problem.
5.4.1.3 Confirming Solution
- Restart DNS service
Restart relevant service with the command:
ipw-ctr restart <service name> <PL Name>
According to the output shown in Fault caused by the DNS service, restart the DNS service:
ipw-ctr restart dns <PL Name>
If the issue occurs on PL-3, restart the DNS service on PL-3:
ipw-ctr restart dns pl-3
After the service restarts, try to commit the configurations again. If the issue remains, contact the next level of Ericsson support.
- Clean DHCPv4 implementer
Clean DHCPv4 implementer with the command:
/opt/ipworks/dhcp/scripts/ipworks.dhcpv4 cleanup
According to the output shown in Fault caused by the DHCPv4 service, execute the command on PL-4:
# ssh PL-4
# /opt/ipworks/dhcp/scripts/ipworks.dhcpv4 cleanup
If the issue occurs on PL-3, execute this command on PL-3 instead.
Then, try to commit the configurations again. If the issue remains, contact the next level of Ericsson support.
5.5 IPWorks DNS Management
This section provides information on resolving problems with the IPWorks DNS Management in Web GUI.
5.5.1 Trouble Symptoms
This section describes the following common IPWorks DNS Management problems, as shown in Table 10.
| Symptoms | Locating Fault |
|---|---|
| Session time out | See Section 5.5.2.1 |
| Log in failed | See Section 5.5.2.2 |
5.5.2 Locating Fault
This section describes how to locate common IPWorks DNS Management problems described in Section 5.5.1.
If the problems persist, users need to log in to their sessions again. Alternatively, restart the IPWorks DNS Management.
5.5.2.1 Session Time Out
By default, sessions time out after 30 minutes of inactivity. If this happens, the user must log in again.
5.5.2.2 Log in failed
5.5.2.2.1 Tunnel Not Working or IPWorks SS Down
Normally, the following two cases can cause the error shown in Figure 3.
- The tunnel cannot be created or does not work properly.
Check the status of the port used to create the tunnel, using 17071 as an example.
- On a Windows client, for example, check the port status with the following command; the result should look like the following:
>netstat -ano | findstr 17071
TCP 127.0.0.1:17071 0.0.0.0:0 LISTENING 11960
TCP [::1]:17071 [::]:0 LISTENING 11960
- Check the port status on your jump server (SUSE is assumed to be the jump server operating system). Use the following command; the result should look like the following:
# netstat -ano | grep 17071
tcp 0 0 127.0.0.1:17071 0.0.0.0:* LISTEN off (0.00/0/0)
tcp6 0 0 ::1:17071 :::* LISTEN off (0.00/0/0)
- The IPWorks Storage Server is down.
Check the status of the IPWorks Storage Server with the following command and ensure that the SS is running:
# ipw-ctr status all
on SC-1:
ss is running as active role.
csvengine is down.
sqlnodemgr is running as standby role.
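The two checks above can be combined into one quick pre-login test on the client or jump server. A minimal sketch, assuming the tunnel forwards local port 17071 as in the examples; the ssh command in the comment uses placeholder user and host names:

```shell
# The tunnel itself would be created with something like (placeholders):
#   ssh -N -L 17071:localhost:17071 <user>@<jump-server> &

TUNNEL_PORT=17071

# Verify that something listens locally on the tunnel port.
if netstat -an 2>/dev/null | grep ":${TUNNEL_PORT}" | grep -q LISTEN; then
    echo "Port ${TUNNEL_PORT} is listening; the tunnel looks up."
else
    echo "Nothing listens on ${TUNNEL_PORT}; re-create the tunnel or check the SS."
fi
```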
5.5.2.2.2 IPWorks DNS Management Engine down
If the error message in Figure 4 is displayed, an error occurred when the DNS Management Engine started.
- Close the DNS Management.
- Check port 8080 with the following command and ensure that the port is not occupied by another application.
>netstat -ano | findstr 8080
TCP 0.0.0.0:8080 0.0.0.0:0 LISTENING 6288
TCP [::]:8080 [::]:0 LISTENING 6288
5.5.3 Confirming Solution
Redo the login operation and check whether the login is successful. If not, contact the next level of Ericsson support.
5.6 Storage Server
This section describes Storage Server troubleshooting cases.
5.6.1 Failed to Stop/Start/Restart Storage Server by ipw-ctr
5.6.1.1 Trouble Symptoms
A failure to start the Storage Server causes the SC to reboot.
For example, you might see the following output:
SC-1:~ # ipw-ctr start ss
Start ss ==> failed!
After several seconds, the following output might be displayed:
Broadcast message from root@SC-1 (somewhere) (Wed Mar 15 09:42:36 2017):
The system is going down for reboot NOW!
5.6.1.2 Locating Fault
Perform the following steps to troubleshoot the root cause.
- Stop the SS on both SC-1 and SC-2 immediately.
# ipw-ctr stop ss SC-1
# ipw-ctr stop ss SC-2
- Check the Storage Server status.
# ipw-ctr status ss <SC-ID>
<SC-ID> can be SC-1 or SC-2, whichever the Storage Server is running on.
If the output shows that saAmfSUPresenceState is failed, go to Step 3. Otherwise, go to Step 4.
- Repair Storage Server.
# ipw-ctr repaired ss <SC-ID>
After executing this command, execute Step 2 again to check the status.
If it still fails, continue with Step 4. Otherwise, start the Storage Server on both SCs.
- Enable the trace log in ECLI for the Storage Server.
>dn ManagedElement=1,IpworksFunction=1,IpworksCommonRoot=1,StorageServer=1
(StorageServer=1)>configure
(config-StorageServer=1)>level=LOG_LEVEL_TRACE
(config-StorageServer=1)>commit
(StorageServer=1)>exit
- Start the Storage Server by executing the script ipworks.ss directly on the unhealthy SC.
- Start Storage Server by script.
# cd /opt/ipworks/ss/scripts
# bash +x ipworks.ss start_debug
Check the output for any failure information.
- Check Storage Server log.
#cd /storage/no-backup/ipworks/logs/<SC-ID>
Check the log files ipworks_ss_SC-1.log and ss_amf_wrapper.log for any failure information.
Check /var/log/<SC-ID>/messages and search for ipworks.ss to find Storage Server related log entries.
5.6.1.3 Confirming Solution
From the logs above, you can identify the failure that prevents the Storage Server from starting. If the problem remains, collect all the related information and contact the next level of Ericsson support.
5.6.2 Storage Server Not Listening on the Port
5.6.2.1 Trouble Symptoms
The SS started successfully, but the SS function is abnormal.
The following output might be displayed:
SC-1:~ # ipwcli
IPWorks> Login:admin
IPWorks> Password:********
Unexpected error detected: Could not create connection to database server. Attempted reconnect 3 times. Giving up.
5.6.2.2 Locating Fault
- Check Storage Server status.
# ipw-ctr status ss <SC-ID>
<SC-ID> can be SC-1 or SC-2, whichever the Storage Server is running on.
For example:
SC-1:~ # ipw-ctr status ss sc-1
ss on SC-1 is running, working as an active node.
- Check Storage Server process status on the active SC.
SC-1:~ # ps -ef | grep StorageServer | grep -v grep
root 13664 1 0 08:35 ? 00:00:08 java -DTCPSTARTPORT=9701 -DTCPENDPORT=9708 -DMULTICASTADDRESS=224.0.0.1 -DMULTICASTPORT=15663 -DBIND_INTERFACE_ADDRESS=169.254.100.23 -Djboss.server.name=ipwss_SC-1 -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/ipworks/jre/java/lib/logging.properties -server -DApp=ipwss -DSysLogin=root -Xmx512m -Xms512m -cp /opt/ipworks/ss/scripts:/opt/ipworks/common/java/AdventNetLogging.jar:/opt/ipworks/common/java/log4j-1.2.15.jar:/opt/ipworks/common/java/ipwcommon.jar:/opt/ipworks/common/java/AdventNetAgentRuntimeUtilities.jar:/opt/ipworks/common/java/dom4j-1.6.1.jar:/opt/ipworks/common/java/ipwse.jar:/opt/ipworks/common/java/AdventNetSnmp.jar:/opt/ipworks/common/java/AdventNetSnmpAgent.jar:/opt/ipworks/ss/java/mysql-connector-java-commercial-5.1.16-bin.jar:/opt/ipworks/ss/java/ipwss.jar:/home/javaoam/lib/jna-4.0.0.jar:/home/javaoam/lib/cglib-2.2.jar:/home/javaoam/lib/javaoam-core-2.2.0-186.jar:/home/javaoam/lib/javaoam-coremw-spi-2.2.0-186.jar ericsson.ipworks.storage.server.StorageServer
From the output, you can see the process is running.
- Check whether the SS port is listening with the following command.
# netstat -anp |grep <ss_port>
The default value of <ss_port> is 17071. For more information, refer to the section Storage Server Initial Configuration in IPWorks Initial Configuration.
For example:
SC-1:~ # netstat -anp |grep 17071
tcp 0 0 0.0.0.0:17071 0.0.0.0:* LISTEN 13664/java
From the output, you can see that port 17071 is listening.
If the port is not displayed in the output, check whether there is an alarm related to the SS in FM. The specific problems might be:
- Storage Server, MySQL Cluster Node Unreachable, for example:
FmAlarm=40
   activeSeverity=MAJOR
   additionalText="This alarm is issued when the MySQL Cluster [ SC-1:SQL Node ] is down or unreachable from [ SC-1 ] ManageNode;uuid:E02973B0-23DD-418B-9F2C-377734F0B523"
   eventType=COMMUNICATIONSALARM
   lastEventTime="2017-03-17T02:43:57.429+01:00"
   majorType=193
   minorType=860161
   originalAdditionalText="This alarm is issued when the MySQL Cluster [ SC-1:SQL Node ] is down or unreachable from [ SC-1 ] ManageNode;uuid:E02973B0-23DD-418B-9F2C-377734F0B523"
   originalEventTime="2017-03-17T02:43:57.429+01:00"
   originalSeverity=MAJOR
   probableCause=306
   sequenceNumber=90
   source="ManagedElement=ipworks_cba,SystemFunctions=1,Fm=1,FmAlarmModel=ipworksEM,FmAlarmType=ipworksEmMysqlClusterNodeUnreachable,Source=SC-1:ManageNode:SC-1:SQL Node"
   specificProblem="Storage Server, MySQL Cluster Node Unreachable"
To clear the alarm, refer to Storage Server, MySQL Cluster Node Unreachable.
- Storage Server, MySQL Database Unreachable, for example:
FmAlarm=42
   activeSeverity=CRITICAL
   additionalText="This alarm is issued when Storage Server losts communication with Database;uuid:E02973B0-23DD-418B-9F2C-377734F0B523"
   eventType=COMMUNICATIONSALARM
   lastEventTime="2017-03-17T02:44:01.690+01:00"
   majorType=193
   minorType=860162
   originalAdditionalText="This alarm is issued when Storage Server losts communication with Database;uuid:E02973B0-23DD-418B-9F2C-377734F0B523"
   originalEventTime="2017-03-17T02:44:01.690+01:00"
   originalSeverity=CRITICAL
   probableCause=306
   sequenceNumber=92
   source="ManagedElement=ipworks_cba,SystemFunctions=1,Fm=1,FmAlarmModel=ipworksEM,FmAlarmType=ipworksEmSsDbUnreachable,Source=Storage Server"
   specificProblem="Storage Server, MySQL Database Unreachable"
To clear the alarm, refer to Storage Server, MySQL Database Unreachable.
After the MySQL alarm has ceased, wait 10 seconds and check again.
5.6.2.3 Confirming Solution
If the problem remains, enable the trace log in ECLI for the Storage Server as described in Step 4 of Section 5.6.1.2 and contact the next level of Ericsson support.
5.7 Server Manager
This section provides information on resolving problems with the IPWorks Server Manager (SM).
For DNS and ASDNS, each has an associated Server Manager component residing on the same machine. The Server Manager serves as the link between the DNS or ASDNS server and the rest of the IPWorks system. All communication between the Storage Server and the DNS or ASDNS server is through the Server Manager.
When the Server Manager starts up, it connects to the SS, logs on, registers as a remote agent for the PS, and reports the status of the PS to the SS. Use the ECLI on the node where the Server Manager is running to configure the data that the Server Manager uses to contact the SS.
5.7.1 Server Manager Failed to Start
5.7.1.1 Trouble Symptoms
The Server Manager failed to start.
5.7.1.2 Locating Fault
Use the ECLI to configure a higher logging level for the Server Manager (see Section 3.6). Then restart the Server Manager (see Section 2.1.2) and check the log file to find the specific problem.
- Note:
- Use debug logging only to diagnose problems. Turn it off during normal operation. This is because the log file grows rapidly when debugging is enabled. This will degrade server performance, especially at higher levels of debug logging.
5.7.1.3 Confirming Solution
Check whether the Server Manager starts successfully. If the problem persists, contact the next level of Ericsson support.
5.7.2 Problem in Deleting Server Instance
5.7.2.1 Trouble Symptoms
When an instance is in running status, it cannot be deleted.
5.7.2.2 Locating Fault
To delete a server instance from the IPWorks CLI, ensure that the Server Manager for that server is not running. For instance, to delete a DNS Server from machine 10.0.0.1, stop the DNS Server Manager on 10.0.0.1.
5.7.2.3 Confirming Solution
After stopping the Server Manager, test the behavior again. If the problem persists, contact Ericsson support.
5.7.3 Network Unreachable Exception
5.7.3.1 Trouble Symptoms
If the log of Server Manager reports an exception "Network unreachable" when the Server Manager starts up, the machine is not configured to route packets to the machine on which the Storage Server is running.
5.7.3.2 Locating Fault
Check the interfaces configured for the machine using ifconfig -a and check the routing table using netstat -r.
5.7.3.3 Confirming Solution
When correctly configured, the Storage Server machine can be pinged from the Server Manager machine.
5.7.4 Access Denied Exception
5.7.4.1 Trouble Symptoms
When the IPWorks username and password configured for the Server Manager are not a valid combination in the Storage Server, the logon attempt fails. Also, the alarm DNS, Storage Server Unreachable from Server is raised.
5.7.4.2 Locating Fault
Use the ECLI to check the configuration parameters of Server Manager. Ensure that the Storage Server address is pointing to a Storage Server that is running.
Example 14 Verify DNS SM Configuration
>show -v ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,DnsServer=1,DnsSm=1
DnsSm=1
   dnsSmId="1"
   ssAddress="ipw_ss" <default>
   ssPassword="<Encrypted Password>"
   ssUserName="admin" <default>
   DnsSmLog=1
Then, check the SS status through ipw-ctr. If it is running, try to log on to the ipwcli to verify whether the Storage Server is reachable.
Example 15 Check SS Status
SC-2:~ # ipw-ctr status ss
ss on SC-2 is running, working as an active node
5.7.4.3 Confirming Solution
After correcting the configuration of Server Manager, try the logon attempt again.
Example 16 Verify whether SS is Reachable
# ipwcli
IPWorks> Login: <SS Username>
IPWorks> Password:
Login to server successful.
IPWorks>
5.7.5 Connection Time-out Exception
5.7.5.1 Trouble Symptoms
When the IP address configured for the Primary Storage Server is pointing at a machine that is down or not reachable on the network, the Server Manager tries to contact the Secondary Storage Server. If the Secondary Storage Server is unreachable, then the Server Manager reports a "connect time out" exception. This exception is reported after a delay of 60 seconds by default.
5.7.5.2 Locating Fault
Use the ECLI to verify (and correct) the address, the password, or both, of the Storage Server.
5.7.5.3 Confirming Solution
After correcting the configuration of Storage Server, check whether the Storage Server machine is up and reachable using ping.
5.7.6 Failed Attempting to Get Machine Information
5.7.6.1 Trouble Symptoms
If the machine is not properly configured with a DNS name, the Server Manager reports the message "Failed attempting to get machine information".
5.7.6.2 Locating Fault
Check that the domain name parameter is properly configured in the /etc/resolv.conf file.
Ensure that the hostname has a corresponding entry in the file /etc/hosts; otherwise, the Server Manager does not start.
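A quick way to verify both conditions is to check that the local hostname resolves. A minimal sketch using getent, which consults /etc/hosts and then DNS as configured in /etc/resolv.conf:

```shell
h="$(hostname)"

# getent uses the normal resolver order (/etc/hosts, then DNS).
if getent hosts "$h" >/dev/null; then
    echo "$h resolves; the Server Manager should find its machine information."
else
    echo "$h does not resolve; add an entry for it to /etc/hosts."
fi
```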
5.7.6.3 Confirming Solution
After configuring the parameter properly in the configuration file, check whether the exception is raised again.
5.7.7 New or Renamed Object Already Exists Exception
5.7.7.1 Trouble Symptoms
When there is a DNS server object in the Storage Server with the same hostname but different IP address, the Server Manager reports an exception "New or renamed object already exists".
5.7.7.2 Locating Fault
- Change the IP address of the machine to that used by the server object in the Storage Server, or
- Change the IP address of the object in the Storage Server to that of the machine.
5.7.7.3 Confirming Solution
After changing the IP address, check whether the exception is raised again.
5.7.8 Permission Denied Exception
5.7.8.1 Trouble Symptoms
After the Server Manager has connected and registered, it attempts to write the status of the DNS server to the Storage Server. When the user under which the Server Manager is running has no write privileges, the Server Manager reports an exception "Permission to create/change/delete object denied".
5.7.8.2 Locating Fault
Configure the Server Manager using the DNS Server Manager configuration file (see Table 4 or Section 2.3.3) or change the permissions of the user to allow writing.
5.7.8.3 Confirming Solution
After changing the user permissions or correcting the Server Manager configuration, perform some write operations to check whether the exception is raised again.
5.7.9 Cannot Stop the Server Manager
5.7.9.1 Trouble Symptoms
The Server Manager does not stop when the user tries to stop it using ipw-ctr.
5.7.9.2 Locating Fault
Make sure that the root user (or other user under which the Server Manager is running) has a path to /opt/ipworks/common/scripts/. Manually run from a terminal window:
# /opt/ipworks/common/scripts/ipw-ctr stop <type-of-server>sm <hostname>
Where <type-of-server> stands for dns or asdns.
If this fails to stop the Server Manager, make sure that the file /var/run/*sm.port exists and has not been modified. If it is necessary to stop the Server Manager using the kill command, use kill without the -9 parameter. This allows the Server Manager to clean up the file /var/run/*sm.port.
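If a manual kill is unavoidable, the PID can be found with ps and sent a plain SIGTERM so that the Server Manager can remove its /var/run/*sm.port file, as described above. A sketch; the process name pattern ipwsm is an assumption (verify it with ps on your system):

```shell
# Process name pattern "ipwsm" is an assumption; check "ps -ef" output first.
pid="$(ps -ef | grep ipwsm | grep -v grep | awk '{print $2}')"

if [ -n "$pid" ]; then
    kill $pid    # no -9: SIGTERM lets the SM clean up /var/run/*sm.port
else
    echo "No Server Manager process found."
fi
```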
5.7.9.3 Confirming Solution
Not applicable.
5.7.10 Failed Sending Command to the DNS Server
5.7.10.1 Trouble Symptoms
The IPWorks CLI reports "Failed sending the RNDC <cmd> command to <servername> server", where <cmd> is "stop", "reload", and so on. The server name is the name of the machine on which the DNS server and DNS Server Manager are running. This message is also displayed in the Server Manager log file at LOG_LEVEL_INFO, LOG_LEVEL_DEBUG, or LOG_LEVEL_TRACE.
5.7.10.2 Locating Fault
Make sure that the root user (or other user under which the Server Manager is running) has a path to /opt/ipworks/dns/usr/bin/ and that the file /etc/rndc.key exists and contains a valid TSIG key.
5.7.10.3 Confirming Solution
Run rndc from the command line to verify that the server responds correctly.
Example:
rndc status
version: 2.6.32.12-0.7-default
CPUs found: 1
worker threads: 1
UDP listeners per interface: 1
number of zones: 99
debug level: 90
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/0/1000
tcp clients: 0/100
server is up and running
5.7.11 Cannot Find Script
5.7.11.1 Trouble Symptoms
The runscript operation and possibly the update operation cause the Server Manager to execute a script on the DNS server machine. These scripts must be placed in the appropriate scripts directory. If the Server Manager cannot find the script, it reports the message "script not found", where script is the absolute path of the script.
5.7.11.2 Locating Fault
Move the script to the correct scripts directory on the DNS server machine.
5.7.11.3 Confirming Solution
After moving the script file to the correct directory, check whether update operation will raise the message again.
5.7.12 Cannot Execute Message When Running a Script
5.7.12.1 Trouble Symptoms
If the script is in the scripts directory but does not have execute permission for the Server Manager process user (usually root), the Server Manager reports the message "cannot execute".
5.7.12.2 Locating Fault
Change the permissions on the script to allow the Server Manager to execute the script.
Example:
>chmod 555 script_file
5.7.12.3 Confirming Solution
After changing the permissions of the script, run the script through the Server Manager again to check whether the message is still reported.
5.7.13 IPWorks CLI Displays DNS Records Slowly
5.7.13.1 Trouble Symptoms
Dynamic resource records are retrieved from the DNS server to be presented to the IPWorks CLI. If the query requires excessive data, it takes a long time to transfer it from the DNS server to the Server Manager, to the Storage Server, then to the user interface.
5.7.13.2 Locating Fault
Formulate queries for dynamic data using filters that minimize the amount of data that is retrieved.
5.7.13.3 Confirming Solution
Check whether the new query still takes a long time.
5.7.14 Large Data Queries Cause Memory Problems
5.7.14.1 Trouble Symptoms
The machine on which the DNS server and Server Manager are running must have enough physical memory to avoid excessive paging. If there is not enough physical memory, the query takes a long time. If necessary, increase the physical memory of the DNS server machine.
Sufficient memory must also be made available to the Java Virtual Machine for the Server Manager to create enough resource record or lease objects. The Server Manager log may record an " Out of memory" exception in response to a query for a large amount of data.
5.7.14.2 Locating Fault
Configure the Java Virtual Machine to use more of the machine memory. To do this, edit the file /opt/ipworks/IPWsm/scripts/ipwsm. In the last line of this script, the Java Virtual Machine is started with the parameter -mx128m, indicating a maximum memory use of 128 MB. Increasing this value allows the Server Manager to use more of the system memory.
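The edit can be scripted. A minimal sketch, assuming the flag appears literally as -mx128m in the startup script (the function keeps a backup copy before editing):

```shell
# Raise the Server Manager JVM heap limit in the given startup script.
raise_sm_heap() {
    script="$1"
    cp "$script" "$script.bak" || return 1     # keep a backup copy
    sed -i 's/-mx128m/-mx512m/' "$script"      # 128 MB -> 512 MB
}

# On the Server Manager machine (path from the text above):
#   raise_sm_heap /opt/ipworks/IPWsm/scripts/ipwsm
```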
5.7.14.3 Confirming Solution
After configuring the machine memory, check whether the queries still take a long time.
5.7.15 DNS Server Performance Drops during Queries
5.7.15.1 Trouble Symptoms
Querying a DNS server for a large amount of data can affect the performance of the machine and thus the performance of the DNS server.
5.7.15.2 Locating Fault
Limit the queries that the Server Manager performs in ipworks_*sm.conf to prevent degradation of the DNS services. For more information, see Table 4 and Section 2.3.3.
5.7.15.3 Confirming Solution
Check whether the revised query still affects the performance.
5.7.16 Status of Server in Interface Disagrees with Current Status
5.7.16.1 Trouble Symptoms
The status shown on a DNS server object displayed in the IPWorks CLI can be down when in fact the DNS server is running, or vice versa. The DNS server does not automatically inform the Server Manager (SM) of a change in status. The status field on a DNS server also contains a time stamp, for example, "On 04/30/03 at 09:51:30 server is 'down'". This does not indicate the current status of the server; it only indicates that at a particular time the server had this status.
5.7.16.2 Locating Fault
There is possibly a communication problem between the DNS server and the DNS SM.
For example, another service used the same port as the DNS server. If so, follow this procedure to solve the problem:
- Check the alarm for more information.
- Stop the DNS server, then the DNS SM.
- Start the DNS SM, then the DNS server.
- If the problem remains, consult the next level of maintenance support.
- Note:
- When the DNS SM is restarted, a new port is assigned. For how to start or stop the DNS server, see Section 2.1.2.
5.7.16.3 Confirming Solution
Use ipw-ctr to get server status. For more information, see Section 2.1.2.
Example 17 Check Status of DNS Server on PL-3
ipw-ctr status dns pl-3
dns on PL-3 is running.
5.7.17 RNDC Statistics History Is Lost
5.7.17.1 Trouble Symptoms
The rndcstats and clearrndcstats operations use the RNDC command to display BIND server statistics. This works as a history, appending the results for every rndcstats operation until the clearrndcstats operation is called to delete the previous results. The DNS Server stores all RNDC statistics in a single file. A clearrndcstats operation by one user clears the history for all users.
5.7.17.2 Locating Fault
Multiple users of rndcstats must coordinate their use of this operation for any particular DNS Server.
5.7.17.3 Confirming Solution
After the coordination of the users is performed, check the history by using ipwcli.
- Note:
- The rndcstats command is issued from the CLI.
Example 18 Show RNDC Statistics History
# ipwcli IPWorks> select dnsserver <dns-server> IPWorks> show rndcstats
5.8 DNS Server
This section provides information on resolving problems with the IPWorks DNS Server.
The DNS Server manages DNS data and responds to queries from DNS clients. For more information on DNS management, refer to the section DNS Management in IPWorks Configuration Management.
5.8.1 Master Server Errors
This section describes some common mistakes in configuring master servers.
5.8.1.1 Forgetting to Reload
After making a change to a zone, administrators sometimes forget to reload the master server. Thus, although the change was made to the zone configuration, the server is not using the updated information.
5.8.1.2 Forgetting to Update PTR Records
Some applications require that a reverse mapping exists for each name-to-address mapping. This is done using PTR records.
Also, when removing forward entries (A and AAAA records), do not forget to delete the corresponding PTR records.
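The reverse mapping has a fixed shape: for an IPv4 address a.b.c.d, the PTR record's owner name is d.c.b.a.in-addr.arpa. A small sketch that derives this name, useful when checking that the PTR record matching a removed A record is also removed:

```shell
# Print the in-addr.arpa owner name for an IPv4 address.
ptr_name() {
    echo "$1" | awk -F. '{printf "%s.%s.%s.%s.in-addr.arpa\n", $4, $3, $2, $1}'
}

ptr_name 192.0.2.7    # prints: 7.2.0.192.in-addr.arpa
```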
5.8.1.3 Forgetting to Set up Delegations
It is important to have the proper delegations set up in both the parent and child zones. While IPWorks normally takes care of the zones it manages, there may be other DNS servers that need to have delegations to the IPWorks servers and zones and these must be properly configured.
5.8.2 Slave Server Errors
This section describes some common mistakes in configuring slave servers.
5.8.2.1 Forgetting Slave Files
A filename for a DNS slave zone should generally be configured in the filename field of the slavezone object, so that a backup copy of the zone is kept if network connectivity with the master servers is lost. This ensures that a backup copy is available for loading if the slave server reboots. Without this, a disconnected slave server that cannot connect to a master server has no DNS data to serve.
5.8.2.2 Caching Server Errors
DNS Servers not only serve authoritative data, but they can also be used to answer queries where the answer is not in their authoritative zone. The answers to these queries are then cached for future use.
For this to work, the DNS Servers that are not authoritative for the root "." zone should have a hint zone configured. The hint zone lists the servers authoritative for root. The root zone delegates authority for all top-level domains such as .com, .net, .uk, or .se.
By knowing where the top of the DNS namespace is, the server has a starting point to look for and find the DNS Servers that are authoritative for the name being queried.
Without the hint zone, queries are likely to go unanswered and the DNS Server returns the error code SERVFAIL (server failure).
5.8.2.3 Forwarding Server Errors
It is often an error to configure a forwarding server to forward all requests even when the server is authoritative for one or more zones. If the forwarding server is authoritative for a zone, then the administrator should override the default forwarders setting in the DNS Server object by configuring a null forwarders option for the appropriate zones.
For example, a DNS Server configuration that forwards all requests except those for example.com (and any subzones) would have a null forwarders option in the example.com master or slave zone object.
5.8.2.4 Connectivity Errors
If a client cannot connect to one or more DNS Servers, perform the following:
- If the application cannot connect or is not getting answers, check the /etc/resolv.conf file on the client machine and verify the addresses listed in the nameserver entries.
- Use the ping utility to see if there is connectivity to the addresses.
- Use the command netstat -r to check the routing settings for reaching the DNS ports (TCP and UDP port 53).
- Use dig (see Section 2.1.9) or another query utility to query the server.
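The first check above can be sketched as follows; the resolv.conf content is a sample, and on a real client the file to read is /etc/resolv.conf:

```shell
# Sample resolver configuration (on the client, read /etc/resolv.conf instead).
cat > /tmp/resolv.conf <<'EOF'
search example.com
nameserver 192.0.2.53
nameserver 198.51.100.53
EOF
# Extract the name server addresses the client queries; each address is then
# a target for the ping and dig connectivity checks.
servers=$(awk '$1 == "nameserver" { print $2 }' /tmp/resolv.conf)
echo "$servers"
```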
5.8.2.5 Delegation Errors
A DNS delegation is the relationship between a parent and a child zone. It consists of NS and A or AAAA records that allow the parent to tell DNS clients or servers where to send queries that belong to the child. Common delegation errors include the following:
- Not putting the same set of NS records in both parent and child zones. At least one NS record must be the same in both parent and child zones. It is most desirable for all NS records to be the same in both parent and child. Normally IPWorks manages NS records, so this should not be an issue unless they have been manually adjusted.
- No A or AAAA records to direct queries to the address of the child server. NS records report the names of the servers for a zone to DNS. Do NOT forget to provide the name-to-address mapping in the parent zone if the name of the server belongs to the child zone. Otherwise, it is impossible to reach the servers of the child zone. Again, IPWorks normally manages these.
- Putting domain names that belong to the child zone in the parent zone. The only exception to this is the glue NS records. The IPWorks DNS Server defers to the child zone and the parent server does not answer queries that belong to the child domain. This is an exception to the NS records mentioned previously.
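A simple way to spot the first two errors is to compare the NS RRsets returned by the parent and child servers. The record lists below are illustrative; on a live system they would come from dig queries against each server:

```shell
# Illustrative NS RRsets; on a live system obtain them with, for example:
#   dig @<parent-server> child.example.com NS +short
#   dig @<child-server>  child.example.com NS +short
cat > /tmp/ns_parent.txt <<'EOF'
ns2.child.example.com
ns1.child.example.com
EOF
cat > /tmp/ns_child.txt <<'EOF'
ns1.child.example.com
ns3.child.example.com
EOF
sort /tmp/ns_parent.txt > /tmp/ns_parent.sorted
sort /tmp/ns_child.txt  > /tmp/ns_child.sorted
# Lines unique to either side indicate a parent/child delegation mismatch.
mismatch=$(comm -3 /tmp/ns_parent.sorted /tmp/ns_child.sorted)
echo "$mismatch"
```

In this sample, ns2 (parent only) and ns3 (child only) are flagged as mismatched.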
5.8.3 DNS Server Fails to Start after System Boot
5.8.3.1 Trouble Symptoms
After system boot, the DNS server fails to run, does not respond to start requests, and cannot reload its configuration.
5.8.3.2 Locating Fault
- Check the syslog utility for errors. The corresponding path is /var/log/messages. Search for named.
- Use the ECLI to enable debug logging for the DNS server (see Example 12), and check the file ipworks_dns.log. Other errors may include:
- Running the server from the wrong account.
- Improperly configured network interfaces – use the command ifconfig -a to check.
- Use the IPWorks CLI to check the server configuration.
5.8.3.3 Confirming Solution
Use ipw-ctr to check the DNS server status.
5.8.4 Slave Server Fails to Transfer Zone Data from the Master
5.8.4.1 Trouble Symptoms
Slave server fails to transfer zone data from the master server.
5.8.4.2 Locating Fault
- Check for errors in the slave zone configuration. See Section 5.8.2.
- Check for errors in the master zone configuration. See Section 5.8.1.
- Check connectivity between the master and slave servers. For information on Connectivity Errors, see Section 5.8.2.4.
- Compare the SOA serial numbers of the zone on the slave and master servers. The slave does not initiate a transfer if its serial number is higher than the master's.
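The serial comparison can be sketched as below; the serial values are illustrative, and in practice they would come from dig <zone> SOA queries against each server:

```shell
# SOA serial numbers as reported by the master and slave (illustrative values).
master_serial=2024051501
slave_serial=2024051707
# The slave only initiates a zone transfer when the master serial is ahead.
if [ "$slave_serial" -gt "$master_serial" ]; then
    status="slave ahead: delete the slave zone file and reload"
else
    status="serials consistent"
fi
echo "$status"
```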
5.8.4.3 Confirming Solution
Not applicable.
5.8.5 Server Query Problems
If a server does not respond to queries issued with the query utility dig, fails to provide an answer for data it should have, or returns an error for queries, then perform the following steps. If only specific clients are having problems, run a query utility on one of those systems.
If the utility reports a time-out, check for connectivity problems. Connectivity problems can include the following:
- Low-level network connectivity may be broken. Use ping to see if communication is possible between client and server or between servers. If not, use the command netstat -r to check the client's configuration and routing settings.
- The DNS Server may be denying access with the allow-query or blackhole option in the DNS Server object.
If the query utility reports a status of NXDOMAIN, then the server is indicating that no resource record exists for the domain name, resource record type, and class. Perform the following to solve a query problem:
- Check the master zone for the record in question.
- If the records are in the master zone, it is possible that the zone is loaded but the records are not, because an update might not have been issued.
- Check if the domain name requested belongs to a child zone.
If the status is SERVFAIL, the server does not have the answer to the query and may have configuration problems that are preventing it from getting the answer. Check if the server being queried contains a hint file. As DNS is a distributed system, servers need a common connection point. The hint file contains the location of the authoritative servers for the root zone. They provide delegation information for all servers in the namespace.
If the status is REFUSED, the server is configured not to allow queries to proceed. If possible check the settings of the allow-query option or check the match-client and match-destinations.
Check the internetDNS attribute through (BindService=1)>show -v.
- If it is true, check whether the keyId="FAT1023219/1" (Internet DNS) exists. If it exists, check whether the file /etc/ipworks/root_cert.cfg exists or is corrupted.
- If it is false, check whether the keyId="FAT1023219/4" (DNS TPS) exists. If it exists, check whether the file /etc/ipworks/root_cert.cfg exists or is corrupted.
For information about how to check KeyId, refer to View License Information, Reference [16].
If dig returns ANSWER: 0, it means that the requested domain name exists but there is no resource record of the requested type. For example, if the user wants PTR records but types dig example.com without specifying the record type, no PTR records are returned; use dig example.com PTR instead.
If the query utility does not help, look in the ipworks_dns.log server log file (see Section 3.2.5).
If IncludeRecord is used, consider any content error in the related IncludeFile. This is because IncludeFile can be expanded into the masterzone that is affected by the content in IncludeFile.
For more information about the IncludeRecord and IncludeFile objects, refer to the IncludeRecord and IncludeFile sections of IPWorks DNS, ASDNS, ENUM Parameter Description.
5.8.6 Operations Protected by TSIG Fail
If TSIG is used to restrict access to a server, the following is required:
- The same key must be configured at both ends of the TSIG signed transaction – the key must contain the same secret and key name.
- The system clocks for both ends of the TSIG signed transaction must be reasonably synchronized (the times must be within 5 minutes of each other). Use the date -u command on both machines to get the time settings in GMT. It is best to use a time synchronizing application such as NTP (Network Time Protocol).
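The clock check can be sketched as follows; the two epoch timestamps are illustrative stand-ins for date -u +%s run on each machine:

```shell
# Timestamps taken with `date -u +%s` on each end (illustrative values).
server_time=1700000000
client_time=1700000420
# TSIG tolerates at most 300 seconds (5 minutes) of clock skew.
skew=$((server_time - client_time))
if [ "$skew" -lt 0 ]; then
    skew=$((-skew))
fi
if [ "$skew" -le 300 ]; then
    verdict="clocks OK for TSIG"
else
    verdict="clock skew ${skew}s exceeds TSIG tolerance"
fi
echo "$verdict"
```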
5.8.7 Incorrect Data Returned for Queries
5.8.7.1 Trouble Symptoms
A DNS answer to a query differs from the expected answer.
5.8.7.2 Locating Fault
- Make sure that after a change to the zone, the DNS Server is updated and the master server loads the zone correctly.
- If querying a slave server, check the SOA serial number in the slave against the master. If the slave server has a lower serial number, the slave server may have not loaded the most recent data from the master. If the slave has a SOA serial number higher than the master for a zone, it does not perform a zone transfer. Delete the zone file in the slave and reload.
- Check slave server logs in the file ipworks_dns.log (see Section 3.2.5). They may report errors transferring data from the master.
For more information on Master Server errors, see Section 5.8.1.
- Note:
- If ActiveSelect DNS is enabled on the domain name under question, it may also alter the data returned based on the state of the monitored systems, the source of the query and the ActiveSelect DNS configuration.
5.8.7.3 Confirming Solution
Not applicable.
5.8.8 Bad Data from a Malicious External DNS Server
5.8.8.1 Trouble Symptoms
A Cache DNS server sends a request to a malicious external DNS server, and the external DNS server returns a negative answer.
5.8.8.2 Locating Fault
- Restart the Cache DNS Server.
- Stop the communication between the Cache DNS server and the external DNS server if the problem persists.
- Note:
- How to stop the communication is out of the scope of this document.
5.8.8.3 Confirming Solution
Not applicable.
5.8.9 Bad Data from a Roaming Partner
5.8.9.1 Trouble Symptoms
When a Cache DNS server sends a request to a roaming partner while the roaming partner is updating its NS records without also updating the related Glue records, the Cache DNS server probably receives a negative answer.
5.8.9.2 Locating Fault
- Control the negative cache TTL locally using the parameter max-ncache-ttl (default value: 10,800 s; recommended value: 60 s).
IPWorks> modify dnsserver dns1 \
-add option="max-ncache-ttl 60"
Working on 1 object(s).
1 object(s) were updated.
IPWorks> update dnsserver
- Flush the stored local cache.
#rndc flush
- Send the query to the roaming partner again.
- If the problem persists, contact the roaming partner to update the related Glue Records. IPWorks DNS resumes the query automatically when the roaming partner updates the Glue Records.
5.8.9.3 Confirming Solution
Not applicable.
5.8.10 External Clients Are Unable to Query the Server
5.8.10.1 Trouble Symptoms
The external clients cannot query the server.
5.8.10.2 Locating Fault
- First, verify general connectivity by using ping and other tools.
- Check for delegation errors or missing delegations between any child zone and its parent zone. If appropriate address (A and AAAA) and server (NS) records are not set properly in the parent zone, there is no way for external DNS clients to find the authoritative servers.
- Check for allow-query or blackhole options that may be preventing access by external clients.
5.8.10.3 Confirming Solution
Not applicable.
5.8.11 Dynamic DNS Update Failed
5.8.11.1 Trouble Symptoms
When users try to perform dynamic DNS update, the update fails.
5.8.11.2 Locating Fault
Check that the master zone configuration allows updates. Verify that each dynamic zone (both forward and reverse) includes an allow-update option with an IP address value that includes the IP address of the DHCP server or other DDNS update clients.
It is recommended that users use TSIG for dynamic updates. In this case, make sure that the TSIG keys are the same and that the server security allows updates through the desired TSIG keys.
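For reference, a DDNS client update typically looks like the sketch below. The server address, zone, record, and key name are illustrative, not taken from the product configuration; the block only prepares the nsupdate input without sending it:

```shell
# Build the input that a DDNS client would feed to nsupdate (illustrative data).
cat > /tmp/ddns_update.txt <<'EOF'
server 192.0.2.53
zone dynamic.example.com
update add host9.dynamic.example.com 3600 A 192.0.2.99
send
EOF
# With TSIG, the update would then be sent as (not executed here):
#   nsupdate -y hmac-sha256:ddns-key:<base64 secret> /tmp/ddns_update.txt
# The same key name and secret must be allowed by the zone's allow-update policy.
wc -l < /tmp/ddns_update.txt
```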
5.8.11.3 Confirming Solution
Not applicable.
5.8.12 Authoritative Server for Dynamic Zone Crashes
5.8.12.1 Trouble Symptoms
The DNS Server may crash if a zone file is modified for a dynamic zone while the DNS Server is running.
5.8.12.2 Locating Fault
To change a dynamic zone manually, the user must use the following procedure:
- Use the ipw-ctr to stop the DNS Server.
- Wait for the server to exit.
- Delete the zone .jnl file. The path of the file is /etc/ipworks/dns. Removing the .jnl file is critical because the manual edits are not present in the journal, rendering it inconsistent with the contents of the zone file.
- Edit the zone file.
- Use the ipw-ctr to start the DNS Server.
- Note:
- If the journal file is deleted, all the dynamic data will be lost next time the server is restarted.
5.8.12.3 Confirming Solution
Not applicable.
5.8.13 Rename the DNS Server
5.8.13.1 Trouble Symptoms
If the user tries to rename the DNS Server, an informational message is displayed. For example:
IPWorks> modify dnsserver dns1 -set name=dns2
Working on 1 object(s).
DnsServers
name cannot be renamed.
No object(s) were updated.
5.8.13.2 Locating Fault
The dnsserver contains many dependent objects, such as view, key, acl, and masterzone, and the dnsserver name is a key piece of information maintaining the relationships between these objects. If thousands of data records exist in the DB, a rename takes a long time to complete. For all related masterzone objects the zoneid would change, and it is ambiguous whether the records in a zone should also change when its zoneid changes. The rename is therefore prevented and an informational error message is given.
- Note:
- If the user must change the name of the DNS Server, the user has to delete the dnsserver and all the related objects, then create everything again.
5.8.13.3 Confirming Solution
Not applicable.
5.9 ActiveSelect DNS Server
This section provides information on resolving problems with the IPWorks ActiveSelect DNS (also called ASDNS) and ActiveSelect DNS Monitor.
ActiveSelect DNS is an IPWorks-specific feature that allows redundancy to be defined in the network and enables more complex load balancing than is normally possible within the DNS protocol. ActiveSelect DNS is an extension to the IPWorks DNS Server. This extension makes DNS more dynamic when responding to queries. The IPWorks DNS Server with ActiveSelect DNS uses information sent to it from ActiveSelect DNS Monitors so that it can make more intelligent decisions about what information to include in a response.
5.9.1 Order of Returned Addresses Changes
5.9.1.1 Trouble Symptoms
The order of returned addresses might change with each query.
5.9.1.2 Locating Fault
ActiveSelect DNS results are dynamic and depend on the reported status and load of the resources and on statistics that are used to balance the load across the available resources.
By default, round robin is used to balance the load between resources.
5.9.1.3 Confirming Solution
Not applicable.
5.9.2 Address Is Displayed in Responses When the Resource Is Down
5.9.2.1 Trouble Symptoms
An address can appear in responses when the resource is down.
5.9.2.2 Locating Fault
To locate this issue, check for the following conditions:
- The resource is monitored periodically and this period may not have elapsed. Retry the query after the monitor interval has elapsed.
- All addresses for the domain name are down. In this case, ActiveSelect DNS can be configured either to return all addresses, only site addresses, or no addresses.
- The query is not directed to an authoritative server. The answer may be from cached data. Reduce the TTL of the ActiveSelect DNS enabled domain name or direct queries to an authoritative server.
- The monitoring script (provided by customer) is reporting the incorrect status information to the monitor. This can be caused by a coding error in the script.
- The query was sent from a client that did not have access to the DNS View where ActiveSelect DNS was enabled for the domain name. Verify the ACL list on each DNS View and which views have ActiveSelect DNS enabled for the name.
5.9.2.3 Confirming Solution
Not applicable.
5.9.3 Address Does Not Appear in Responses When Resource Is Up
5.9.3.1 Trouble Symptoms
An address may not appear in responses when the resource is up.
5.9.3.2 Locating Fault
To locate this issue, check for the following conditions:
- The resource is monitored periodically and this period may not have elapsed. Retry the query after the monitor interval has elapsed.
- The configuration on the ActiveSelect Sites may be limiting the number of addresses returned.
- Prefer Statements may be filtering the addresses that are returned based on the source of the query.
- The query is not directed to an authoritative server. The answer may be from cached data. Reduce the TTL of the ActiveSelect DNS enabled domain name or direct queries to an authoritative server.
- The DNS Server is overloaded and unable to process ActiveSelect DNS Monitor updates.
- The ActiveSelect DNS monitors may not be configured to report status on all the addresses for the domain name.
- The ActiveSelect DNS Monitor and DNS Server are using different TSIG keys, or the time drift between them is too large.
- The ActiveSelect DNS Monitor may be considering dependencies when determining if an address is up, and a dependency may be down.
- The monitoring script is reporting the incorrect status information to the monitor. This can be caused by a coding error in the script.
- The query was sent from a client that did not have access to the DNS View where ActiveSelect DNS was enabled for the domain name. Verify the ACL list on each DNS View and which views have ActiveSelect DNS enabled for the name.
5.10 ENUM Server
This section provides information on resolving problems with the IPWorks ENUM Server.
The IPWorks ENUM Server provides mapping from telephone numbers to domain names or SIP URIs that can be used to route a call.
For more information on concepts of ENUM management, refer to the Section ENUM Management of IPWorks Configuration Management. For more information on ENUM configuration, refer to the Section Configuring ENUM of Configure DNS and ENUM.
5.10.1 ENUM Server Connectivity Errors
5.10.1.1 Trouble Symptoms
A client failed to connect to one or more ENUM servers.
The ENUM server does not respond to requests.
5.10.1.2 Locating Fault
Use the following methods to locate the fault:
- Use the ping command to see whether the communication between the client and the server exists.
- Use the ping command to see whether the communication between iENUM and eENUM exists.
- Check the configuration of the client.
- Use the netstat -r command to check the router settings.
- Use the ECLI interface to check the ENUM server configuration. For example, check the port number.
- Use the dig command (see Section 2.1.9) or another query utility to query the server.
5.10.1.3 Confirming Solution
The ENUM server returns a successful reply.
5.10.2 Failed to Stop/Start/Restart ENUM Server by ipw-ctr
5.10.2.1 Trouble Symptoms
The ENUM Server works normally, but it cannot be started, stopped, or restarted by using ipw-ctr.
The problem can be demonstrated by the following example procedure:
- Check the ENUM Server status.
# ps -ef|grep enum
root 7234 1 0 Oct27 ? 01:10:58 /opt/ipworks/enum/bin/ipwenum
root 29996 16881 0 16:52 pts/1 00:00:00 grep enum
- Check the ENUM Server status by using ipw-ctr.
# ipw-ctr status enum pl-3
safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVENUM
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
The output shows that the ENUM Server is stopped or out of service.
- Under this condition, execute the command to restart ENUM by using ipw-ctr.
# ipw-ctr restart enum pl-3
Stop enum ==> success.
Start enum ==> failed!
The output shows that the restart failed.
- Check the status of ENUM Server again.
# ps -elf | grep enum
root 7234 1 0 Oct27 ? 01:10:58 /opt/ipworks/enum/bin/ipwenum
root 29996 16881 0 16:52 pts/1 00:00:00 grep enum
5.10.2.2 Locating Fault
Use AMF native commands to repair the fault, following this example procedure on PL-3:
- Repair ENUM AMF status.
# ipw-ctr repaired enum PL-3
- Execute the amf commands.
# amf-adm lock-in safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVENUM
- Check the ENUM Server status.
# ps -elf | grep enum
0 S root 3424 16881 0 80 0 - 1433 pipe_w 17:14 pts/1 00:00:00 grep enum
4 S root 30293 1 0 80 0 - 154894 futex_ 16:52 ? 00:00:05 /opt/ipworks/enum/bin/ipwenum
The ENUM process is running.
- Check the ENUM Server status again.
# ps -elf | grep enum
0 R root 3560 16881 0 80 0 - 1433 - 17:15 pts/1 00:00:00 grep enum
The ENUM process is stopped.
5.10.2.3 Confirming Solution
- Start ENUM Server by using ipw-ctr.
# ipw-ctr start enum PL-3
Start enum ==> success
- Check the ENUM Server status.
# ps -elf | grep enum
4 S root 4402 1 1 80 0 - 180389 futex_ 17:17 ? 00:00:00 /opt/ipworks/enum/bin/ipwenum
0 S root 4562 16881 0 80 0 - 1433 pipe_w 17:17 pts/1 00:00:00 grep enum
The ENUM process is running.
- Check the status of ENUM Server by using ipw-ctr.
# ipw-ctr status enum pl-3
enum on PL-3 is running
5.10.3 Error Responses to ENUM Requests
The operator can check the Rcode field to determine the problem reason when ENUM server returns an error response.
| Rcode Field | Possible Cause | Solution |
|---|---|---|
| 1 (Format Error) | The ENUM request is incorrect or contains a syntax error. | Try sending the request again in case it was corrupted during the transmission. |
| 2 (Server Failure) | | Use the show command in the ndb_mgm tool to check the status of the NDB Cluster. |
| 3 (Name Error) | The query is below an equipped ENUM zone but the specific domain name is not provisioned in the database. | Make sure the specific domain name is provisioned in the database. |
| 4 (Not Implemented) | The ENUM server does not support the Opcode value in the request. | |
| 16 (Bad Version) | The request contains an OPT resource record with a non-zero version. | |
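When scripting log analysis, the Rcode values in the table above can be mapped to their names with a small helper (a sketch; the names follow the standard DNS Rcode assignments):

```shell
# Map a numeric DNS Rcode to its name (sketch, standard assignments).
rcode_name() {
    case "$1" in
        0)  echo "No Error" ;;
        1)  echo "Format Error" ;;
        2)  echo "Server Failure" ;;
        3)  echo "Name Error" ;;
        4)  echo "Not Implemented" ;;
        5)  echo "Refused" ;;
        16) echo "Bad Version" ;;
        *)  echo "Other ($1)" ;;
    esac
}
rcode_name 2
```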
5.10.4 Errors Related to ERH
The following table contains the error messages related to the ERH and provides the possible cause:
| Error Message in Log | Possible Cause | Solution |
|---|---|---|
| 2008/11/27 10:07:46\|AIN\|stat\|SSN 100 UserId 40 Instance 1 Bind confirmed Failure | Wrong SSN or SPC has been configured, or the SS7 stack has the wrong status. | See Section 5.10.4.1 |
| 2008/11/27 10:34:19\|AIN\|warning\|Received T_NOTICE with SSN 200 userId 40 DID 1, report casue No trans for Addr of such Nature | No translation type, or a wrong translation type, has been specified. | See Section 5.10.4.2 |
| 2014/07/29 21:18:27\|ENUM+\|Debug\|not found dn 2014/07/29 21:18:27\|ENUM+\|Debug\|not found dnrange 2014/07/29 21:18:27\|ENUM+\|Debug\|sendto in. | | See Section 5.10.4.3 |
| 2014/07/29 22:13:02\|ENUM+\|Warning\|Invalid NPHandler has been used. | | See Section 5.10.4.4 |
5.10.4.1 Check SSN and SPC Configuration
To resolve this problem, do the following:
- Check the configuration of AINNode, MAPNode, and INAPNode in IPWCLI. The configuration of LocalSPC and LocalSSN must be the same as that of the SS7 stack installed on the local machine. If the configuration is inconsistent, correct the configuration of the objects either in IPWCLI or in Signaling Manager.
- Set the NPSwitch field of AINNode, MAPNode, and INAPNode to 0, and wait for ENUM to unload the ERH module.
- Set the NPSwitch field of AINNode, MAPNode, and INAPNode to 1, then try again.
- Check the status of SS7.
For information on how to check SS7 stack, refer to the section Verifying Stack Configuration in Configure SS7 for ENUM Number Portability.
5.10.4.2 Check Translation Type and GT
To resolve this problem, do the following:
- Check whether the value of translation type in IPWCLI is the same as the configuration of SS7 stack.
- Check whether GT has been configured in the SS7 stack. For details, refer to Reconfiguring SS7 Network, Creating and Defining GT Routing.
5.10.4.3 Check EnumDnRange Configuration
To resolve this problem, do the following:
- Check the configuration of EnumDnRange.
# ipwcli
IPWorks> list enumdnrange
For example:
[EnumDNRange 50 8652]
enumZoneId: 50
viewId: 0
enumDnRange: 8652
scope:
destNode: ldap
updateLevel: 0
Working on 1 object(s).
IPWorks>
The destNode must be ldap when this EnumDnRange is configured for the NP by LDAP.
For example:
IPWorks> modify enumdnrange <2.5.6.8...> -set destnode=ldap
Working on 1 object(s).
1 object(s) were updated
5.10.4.4 Open ENUM LDAP Switch
To resolve this problem, do the following:
- Enter the ECLI.
# /opt/com/bin/cliss
> ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,EnumServer=1,Erh=1
- Set the parameter ldap to true to open the ENUM LDAP switch.
(Erh=1)> configure
(config-Erh=1)> ldap=true
(config-Erh=1)> commit
5.10.5 NP Traffic Loss
5.10.5.1 Trouble Symptoms
When NP traffic starts, the system drops packets before sending them to the ENUM server. The number of dropped packets equals the number of packets reported lost on the client side.
5.10.5.2 Locating Fault
This may be caused by a small value of net.core.rmem_default, the default receive socket buffer size in bytes. The problem can be solved by resetting the net.core.rmem_default value to its maximum.
The maximum value can be retrieved by issuing the following command:
# sysctl -a|grep net.core
... net.core.rmem_max = <max value> ...
The default value can be increased by issuing the following command:
# sysctl -w net.core.rmem_default=<max value>
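The two steps can be combined into a small sketch. The sysctl output below is a sample; on the node, the values come from sysctl -a, and applying the setting requires root privileges:

```shell
# Sample `sysctl -a` output (illustrative values).
cat > /tmp/sysctl.out <<'EOF'
net.core.rmem_default = 212992
net.core.rmem_max = 16777216
EOF
# Extract the maximum receive buffer size and build the command that would
# raise the default to that maximum (printed here, not executed).
max=$(awk -F' = ' '$1 == "net.core.rmem_max" { print $2 }' /tmp/sysctl.out)
echo "sysctl -w net.core.rmem_default=$max"
```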
5.10.5.3 Confirming Solution
After configuring the value, check if any packet lost on the client side.
5.11 ENUM Front End
IPWorks ENUM Front End (FE) is a component of the data layered architecture (DLA), where application and user data are separated into different layers implemented in different network functional entities. The role of ENUM FE is to provide the application logic and enable the ENUM server to access CUDB instead of the local NDB. CUDB is an extensible, high-performance, subscriber-centric database system that communicates with IPWorks using the LDAP and SOAP protocols.
Figure 5 illustrates the architecture of ENUM FE:
ENUM Server implements a business logic layer. The data of IPWorks ENUM FE is on the CUDB. For traffic handling, ENUM FE queries the user data from the CUDB by LDAP protocol.
ENUM FE Sync implements the cache mechanism for ENUMDnRange and ENUMDnSched. The following list describes how the cache mechanism functions:
- ENUM FE Sync acts as a SOAP server to handle SOAP notifications from CUDB when ENUMDnRange and ENUMDnSched are provisioned.
- ENUM FE Sync caches ENUMDnRange and ENUMDnSched locally and re-caches them when they expire.
- ENUM FE Sync provides a method to manually refresh the cached ENUMDnRange and ENUMDnSched.
ENUM FE Configuration Pre-Check
Enable ENUM FE Function in ECLI:
To make ENUM FE function, it must first be enabled:
- Log on to the ECLI interface on the SC.
# ssh <username>@<SC MIP Address> -t -s cli
- Configure the MO EnumFE.
>configure
(config)>dn ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,EnumFE=1
(config-EnumFE=1)>enableEnumFE=true
(config-EnumFE=1)>commit
(config-EnumFE=1)>exit
- Restart the ENUM server and ENUM FE Sync to make the change take effect.
# ipw-ctr restart enum <PL hostname>
# ipw-ctr restart fesync <PL hostname>
Make sure that all connections are available with databases (CUDB and local MySQL DB cluster).
If any DB connection related alarms are raised, follow the procedures described in the following alarm OPIs:
- ENUM, CUDB Node Failure
- ENUM, CUDB Site Failure
- ENUM, Server Lost Connections of DB
- ENUM FE Sync, All CUDB Connections Failure
- ENUM FE Sync, Server Lost Connections of DB
License is Valid:
Make sure that license for ENUM FE function is valid.
For details, see Section 5.14.1.
5.11.1 No LDAP Connection
5.11.1.1 Trouble Symptoms
The following error message is logged in the log file ipworks_enum.log:
"LDAPProvider::find ldapConnection is null!"
However, no related alarms are raised.
5.11.1.2 Locating Fault
This issue occurs when the connection configuration for the ENUM server has not been set. For example:
SC-X:~ # /opt/com/bin/cliss
>ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1,DataBaseInfo=1,CudbManager=1,CudbServiceSite=ENUM,CudbSiteManager=1,CudbSite=<CudbSite Name>, CudbNode=<CudbNode Name>
(CudbNode=<CudbNode Name>)>show -v
CudbNode=<CudbNode Name>
address="192.168.20.14"
cudbNodeId="<CudbNode Name>"
distinguishedName=""
password=""
poolSize=16
port=389 <default>
5.11.1.3 Confirming Solution
Check whether the same issue occurs after the configuration.
5.11.2 Server Fail in ENUM Response
5.11.2.1 Trouble Symptoms
The Rcode field of ENUM response is Server Fail.
5.11.2.2 Locating Fault
- The ENUM Zone in IPWorks does not match the ENUM record in CUDB. Make sure that there is an ENUM Zone matching this query. Each ENUM record in CUDB must match an ENUM Zone; otherwise, it is unavailable to the ENUM Server. For NAPTR queries, only those that match a valid zone in the ENUM Server continue with the subsequent ENUM processing. For example, for two NAPTRs in CUDB:
fqdn=1.2.3.4.5.6.7.8.9.0.3.3.1.e164.iptelco.com
fqdn=1.2.3.4.5.6.7.8.9.0.3.3.2.e164.iptelco.com
An EnumZone object e164.iptelco.com must be created:
IPWorks>create enumzone 1 -set enumzonename="e164.iptelco.com"
IPWorks>exit
- Make sure that DB connection is available, including CUDB and NDB. For details, see Make Connection Available.
- Make sure that the ENUM server process is running by executing the command ps -ef|grep ipwenum.
5.11.2.3 Confirming Solution
Check whether the Rcode is still Server Fail after the configuration.
5.11.3 Failed to Cache ENUMDnSched to Local MySQL Cluster (for ENUM)
5.11.3.1 Trouble Symptoms
The following warning message is logged in ipwenum.log:
"Tuple already existed when attempting to insert"
5.11.3.2 Locating Fault
The same ENUMDnSched is cached into the MySQL Cluster at the same time on both ENUM servers. Both ENUM servers receive the same ENUM query simultaneously, search CUDB, and cache the fetched record to the IPWorks MySQL Cluster. One insert succeeds and the other fails because the record already exists.
5.11.3.3 Confirming Solution
This is a warning message, and there is no side effect on any ENUM FE function.
5.11.4 Failed to Cache ENUMDnSched to Local MySQL Cluster (for ENUM FE Sync)
5.11.4.1 Trouble Symptoms
The following error message is logged in the ipworks_fesync.log:
"enumDnSchedCache is disable"
5.11.4.2 Locating Fault
ENUM FE Sync receives an EnumDnSched SOAP message, but the switch enableEnumDnSchedCache is disabled.
- Log on to the ECLI.
# ssh <username>@<OAM IP Address> -t -s cli
- Enable EnumDnSched cache by configuring MO EnumFE.
>configure
(config)>dn ManagedElement=<Node Name>,IpworksFunction=1,IpworksDnsRoot=1,IpworksEnumRoot=1,EnumFE=1
(config-EnumFE=1)>enableEnumDnSchedCache=true
(config-EnumFE=1)>commit
(config-EnumFE=1)>exit
- Note:
- When the value of enableEnumDnSchedCache is set to false, all the locally cached EnumDnSched entries are removed.
- Restart the ENUM server and ENUM FE Sync to make the changes take effect.
# ipw-ctr restart enum <PL hostname>
# ipw-ctr restart fesync <PL hostname>
5.11.4.3 Confirming Solution
Check whether the same issue occurs after the configuration.
5.11.5 Failed to Refresh EnumDnRange
This section describes four typical cases.
5.11.5.1 Case 1
5.11.5.1.1 Trouble Symptoms
When you perform a manual refresh on EnumDnRange by using /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange, you receive the following error message:
"EnumDnRange is initialing."
5.11.5.1.2 Locating Fault
When ENUM FE Sync starts and there is no EnumDnRange in the local MySQL Cluster, ENUM FE Sync fetches the EnumDnRange from CUDB and stores it in the local MySQL Cluster. If ENUM FE Sync receives a "Manual refresh EnumDnRange" command during this initialization, it reports the message "EnumDnRange is initialing.".
5.11.5.1.3 Confirming Solution
It is recommended to perform the manual refresh after the initialization is complete.
5.11.5.2 Case 2
5.11.5.2.1 Trouble Symptoms
When you perform a manual refresh on EnumDnRange by using /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange, you receive an error message. Execute the following steps:
- Execute the following command:
PL-3:~ # /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 891 0 295 100 596 3908 7897 --:--:-- --:--:-- --:--:-- 7946
keep old enum dnrange in cache,enum dnrange refresh fail, detial reason refer to output.log
- Execute the following command:
PL-3:~ # vi output.log
The error message is displayed as follows:
<?xml version='1.0' encoding='UTF-8'?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Body><soapenv:Fault><faultcode>soapenv:Server</faultcode><faultstring>Can't call rollback when autocommit=true</faultstring><detail /></soapenv:Fault></soapenv:Body></soapenv:Envelope>
5.11.5.2.2 Locating Fault
The record in the DNRANGEEVENTHANDLE table must be deleted because it was not rolled back. The steps are as follows:
- Log in to the database:
SC-1:~ # /usr/local/mysql/bin/mysql -P 3307 --protocol=tcp
- Choose the ipworks database:
mysql> use ipworks;
- Query all the record(s) of the DNRANGEEVENTHANDLE table:
mysql> select * from DNRANGEEVENTHANDLE;
+----+----------------+
| id | eventhandletag |
+----+----------------+
| 1  | 0              |
+----+----------------+
1 row in set (0.00 sec)
- Delete the record:
mysql> delete from DNRANGEEVENTHANDLE;
Query OK, 1 row affected (0.00 sec)
- Check if there is any record left in the DNRANGEEVENTHANDLE table:
mysql> select * from DNRANGEEVENTHANDLE;
Empty set (0.00 sec)
- Exit:
mysql> quit
5.11.5.2.3 Confirming Solution
It is recommended to perform the manual refresh again after the above actions.
5.11.5.3 Case 3
5.11.5.3.1 Trouble Symptoms
When you perform a manual refresh on EnumDnRange by using /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange, executing the following command produces the error message:
PL-3:~ # /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange
The error message is displayed as follows:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl:
(7) Failed to connect to 127.0.0.1 port 8080: Connection refused
keep old enum dnrange in cache,enum dnrange refresh fail, detial reason refer to output.log
5.11.5.3.2 Locating Fault
Execute the following command:
PL-3:~ # ipw-ctr status fesync
If the output matches either of the following, the issue occurs because fesync on PL-3 is stopped, out of service, or working as a standby node.
- fesync on PL-3 is stopped or out of service. Detailed info is as below:
safSu=PL-3,safSg=2N,safApp=ERIC-sv.SVENUMFE
saAmfSUAdminState=LOCKED-INSTANTIATION(3)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
- fesync on PL-3 is running but working as a standby node.
5.11.5.3.3 Confirming Solution
The command manual_refresh must be executed only on the PL with active fesync.
5.11.5.4 Case 4
5.11.5.4.1 Trouble Symptoms
When you perform a manual refresh on EnumDnRange by using /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange, you receive an error message. Execute the following steps:
- Execute the following command:
PL-3:~ # /opt/ipworks/enumfe/scripts/manual_refresh ENUMDnRange
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 891 0 295 100 596 1067 2157 --:--:-- --:--:-- --:--:-- 2167
keep old enum dnrange in cache,enum dnrange refresh fail, detial reason refer to output.log
- Execute the following command:
PL-3:~ # vi output.log
The error message is displayed as follows:
<?xml version='1.0' encoding='UTF-8'?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Body><soapenv:Fault><faultcode>soapenv:Server</faultcode><faultstring>no available ldap connection</faultstring><detail /></soapenv:Fault></soapenv:Body></soapenv:Envelope>
5.11.5.4.2 Locating Fault
Refer to Section 5.11.1.2.
5.11.5.4.3 Confirming Solution
The command manual_refresh must be executed after the problem is solved.
5.11.6 Cannot Find ENUM Zone
5.11.6.1 Trouble Symptoms
The following error message is logged in ipworks_fesync.log:
"can't find enum zone!"
5.11.6.2 Locating Fault
ENUM FE Sync receives an ENUMDnSched SOAP message, but there is no matching ENUM zone for the record.
Refer to the section Configuring EnumZone according to CUDB ENUM records in Configure DNS and ENUM.
5.11.6.3 Confirming Solution
Check whether the same issue occurs after the configuration.
5.12 Radius AAA Server
This section provides information on resolving problems with the IPWorks Radius AAA Server.
5.12.1 Radius AAA Server Process Not Running
5.12.1.1 Trouble Symptoms
Radius AAA server processes cannot be started.
5.12.1.2 Locating Fault
Use any of the following methods to locate the fault:
- Use ipw-ctr to check the Radius AAA server status on SC.
# ipw-ctr status aaa_radius_stack <PL hostname>
# ipw-ctr status aaa_radius_backend <PL hostname>
# ipw-ctr status csvengine <SC hostname>
- Use the ps command on PL or SC to check if Radius AAA process is running.
- Check the Radius AAA logs for error traces. For example:
# cat /cluster/storage/no-backup/ipworks/logs/<PL hostname>/aaa_radius_stack.log* | grep -i error
# cat /cluster/storage/no-backup/ipworks/logs/<PL hostname>/aaa_radius_backend.log* | grep -i error
# cat /cluster/storage/no-backup/ipworks/logs/<SC hostname>/aaa_radius_csvengine.log* | grep -i error
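The three log checks above can be combined into one pass. The following helper is a sketch and is not part of IPWorks; the log directory is passed as an argument (on the node it would be /cluster/storage/no-backup/ipworks/logs/<hostname>):

```shell
# Collect error traces from all Radius AAA log files under one directory.
# A sketch only; the aaa_radius_* file-name pattern is taken from the
# commands above.
scan_aaa_logs() {
    logdir=$1
    grep -i error "$logdir"/aaa_radius_*.log* 2>/dev/null
}

# Example (on an SC):
#   scan_aaa_logs /cluster/storage/no-backup/ipworks/logs/PL-3
```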
5.12.1.3 Confirming Solution
Contact Ericsson support to fix the issue reported in the Radius AAA error log.
5.12.2 Unreachable Radius Traffic
5.12.2.1 Trouble Symptoms
The Radius AAA server cannot receive traffic from the client while in operation.
5.12.2.2 Locating Fault
Use any of the following methods to locate the fault:
- Use ping to check the connection between client and server.
- Check the eVIP flow policy for Radius AAA.
- Check the instance of MO RadiusInterface to find out port numbers.
For example:
SC-X:~ # cliss
>ManagedElement=<Node Name>,IpworksFunction=1,IPWorksAAARoot=1,IPWorksRadiusAAARoot=1,RadiusStack=1,RadiusInterface=1
(RadiusInterface=1)>show -v RadiusInterface=1
acctAddress="any" <default>
acctPort=1813 <default>
authAuthzAddress="any" <default>
authAuthzPort=1812 <default>
dmCoaPort=3799 <default>
localhostBindIPType=IPV4 <default>
proxyAddress="any" <default>
proxyBindIPType=IPV4 <default>
proxyPortsNumEachPL=200
proxyStartPort=10000 <default>
radiusInterfaceId="1"
- Check the instances of MO EvipFlowPolicy to see whether the Radius listening ports (1812, 1813, and 3799) are already configured.
For example:
>ManagedElement=1,Transport=1,Evip=1,EvipAlbs=1,EvipAlb=ipw_sig_sp,EvipFlowPolicies=1,EvipFlowPolicy=radius_port_1812
(EvipFlowPolicy=radius_port_1812)>show -v EvipFlowPolicy=radius_port_1812
addressFamily="ipv4" <obsolete>
dest=<VIP_TRF_IP1>
destPort="1812"
evipFlowPolicyId="radius_port_1812"
protocol="udp"
soGrp=[] <empty>
src=[] <empty>
srcPort=[] <empty>
targetPool="SIG_pools"
usageState=IDLE <read-only>
Where: <VIP_TRF_IP1> represents the Radius AAA traffic eVIP address.
5.12.2.3 Confirming Solution
Correct the configuration of the eVIP flow policy in ECLI.
5.12.3 AAA Rejects Authentication or Authorization Request
5.12.3.1 Trouble Symptoms
An authentication or authorization request from the Radius client is rejected by the IPWorks AAA server.
Capture the packets with tcpdump; the following message is received:
Reply-Message : fail to verify user password
For how to capture the packet, refer to Section 7 Appendix A: Example of PM, FM, LM, and AMF Logs.
5.12.3.2 Locating Fault
This issue occurs when the shared secret configuration is not synchronized between the ECLI and the client.
- Check the value of ShareSecret in a Radius client.
The actual procedure depends on the customer's environment. Details are out of the scope of IPWorks documents.
- Check the value of ClientSharedSecret in ECLI.
# ssh <Username>@<MIP_OAM_IP>
Password: <Password>
dn ManagedElement=1,IpworksFunction=1,IPWorksAAARoot=1,IPWorksRadiusAAARoot=1,RadiusStack=1,SharedSecretMgr=1,ClientSharedSecretMgr=1,ClientSharedSecret=1
(ClientSharedSecret=1)>show -v ClientSharedSecret=1
clientIPAddr=<Client IP>
clientSharedSecretId="1" <default>
sharedSecretValue=<Shared Secret Value>
type=ALL <default>
- Ensure that the value of the shared secret fetched in Step 1 is the same as the value of sharedSecretValue fetched in Step 2.
5.12.3.3 Confirming Solution
Check whether requests from the Radius client are now accepted successfully.
5.12.4 AAA Does Not Proxy Radius Message
5.12.4.1 Trouble Symptoms
An authentication or authorization request from the Radius client is not forwarded to the target server by the IPWorks AAA server.
Capture the packets with tcpdump; the Radius message received by the IPWorks AAA server is not forwarded to other servers.
For how to capture the packet, refer to Section 7 Appendix A: Example of PM, FM, LM, and AMF Logs.
5.12.4.2 Locating Fault
This issue occurs when the proxy rule configuration is not updated to PL-X from IPWCLI.
- Check that the configuration file /etc/ipworks/<AAA Server host, PL-x>/aaa_radius/aaa_realm.conf exists on each blade where AAA is running, for example:
cat /etc/ipworks/PL-3/aaa_radius/aaa_realm.conf
Exampled output:
[REALM]
name=Ericsson.com
striprealm=false
access {
destination={192.168.10.1}
requestchecklist={( Service-Type = 1 || Service-Type = 2 ) && User-Password ? 1}
replychecklist={( Service-Type = 1 || Service-Type = 2 )}
requestchangelist={add:Framed-Protocol="1",delete:Service-Type="2"}
replychangelist={add:User-Name="AAA-Test@Ericsson.com",add:Framed-Protocol="2",delete:Service-Type="2",replace:Reply-Message="'PAP authenticate successfully.':'Hello,user!'"}
}
accounting {
destination={192.168.10.1}
}
The actual content depends on the customer's environment. Details are out of the scope of IPWorks documents.
- Check the AAA server configuration in IPWCLI and make sure that an AAA server is created for each PL on which AAA will run.
#ipwcli
#list aaaserver
[AAAServer aaasrv1]
Name: aaasrv1
Address: 169.254.100.3
[AAAServer aaasrv2]
Name: aaasrv2
Address: 169.254.100.4
- Update the configured proxy and realm information to each blade on which the AAA server will run.
#ipwcli
#update aaaserver
Result of performing an export is:
Exported aaa realm Ericsson.com
Updated the configuration
Reload proxy realm configuration successfully
Reload proxy realm configuration container successfully
5.12.4.3 Confirming Solution
Check whether AAA server can proxy the requests to the target server.
5.12.5 AAA Rejects EAP-AKA/SIM Authentication Request
5.12.5.1 Trouble Symptoms
The IPWorks AAA server rejects the authentication request from the Radius client.
Capture the packets with tcpdump; the following flow can be observed:
| ----- Access Request --> |
| <--- Access Challenge --- |
| ----- Access Request --> |
| <----- Access Reject ---- |
For how to capture the packet, refer to Section 7 Appendix A: Example of PM, FM, LM, and AMF Logs.
5.12.5.2 Locating Fault
This issue occurs when the AAA server cannot connect to the HLR. Do the following:
- Check the SS7 Stack in IPWorks AAA Server.
The actual output depends on the customer's environment. Details are out of the scope of IPWorks documents.
- Check the SS7 stack configuration by signal manager.
#/opt/sign/EABss7050/bin/signmgui -own.conf /opt/sign/etc/signmgr.cnf &
For more details, refer to section Configuring SS7 for Wi-Fi AAA in Configure SS7 for AAA.
- Check the SS7 configuration in Radius AAA by COMCLI.
>ManagedElement=ipworks_cba,IpworksFunction=1,IPWorksAAARoot=1,IPWorksRadiusAAARoot=1,RadiusAAAService=1,IWLANService=1,RadiusSS7Stack=1
(RadiusSS7Stack=1)>show -v RadiusSS7Stack=1
cpmAddress="ss7cafcpmaddress:6669"
isdnNumber="1234567"
isdnNumberNature=NOA_NATIONAL_SIGNIFICANT <default>
nodeType=1 <default>
numberOfAAAProcess=10 <default>
numberOfBEInstance=10
originalSignalingPointCode=100 <default>
radiusSs7StackId="1"
sgsnAddress="192.168.10.13"
useGT4CallingPartyAddress=false <default>
- Ensure that the Radius AAA Server is connected to the SS7 Stack successfully.
5.12.5.3 Confirming Solution
Check whether you can receive Access Accept from AAA Server.
5.13 EPC AAA Server
This section provides information on resolving problems with the IPWorks EPC AAA Server.
5.13.1 EPC AAA Server Process Not Running
5.13.1.1 Trouble Symptoms
EPC AAA server processes cannot be started.
5.13.1.2 Locating Fault
Use any of the following methods to locate the fault:
- Use ipw-ctr to check the EPC AAA server status on SC.
# ipw-ctr status aaa_diameter <PL hostname>
- Use the command ps -ef | grep ipwa3d on PL to see if the EPC AAA server is running.
- Check the EPC AAA log for error traces. For example:
# /cluster/storage/no-backup/ipworks/logs/<PL hostname>/aaa_diameter_server.log
5.13.1.3 Confirming Solution
Contact Ericsson support to fix the issue reported in the EPC AAA error log.
5.13.2 C-diameter Stack Not Running
For details, see Section 5.17 C-Diameter.
5.13.3 Ineffective Diameter over SCTP
5.13.3.1 Trouble Symptoms
SCTP traffic is down.
5.13.3.2 Locating Fault
Use the following methods to locate the fault:
- Use the netstat command to see whether the connection between the Diameter EPC AAA server and the SS7 stack is established.
PL-X:~ # netstat -nap | grep 6669 | grep beam
tcp 0 0 169.254.100.3:48576 169.254.100.3:6669 ESTABLISHED 4838/beam.smp
- Use the SS7 signaling manager to see whether the SS7 stack works normally. The procedure is the same as for the SS7 configuration; refer to the section Configuring SS7 for Diameter over SCTP in Configure SS7 for AAA.
5.13.3.3 Confirming Solution
Correct the SS7 Stack configuration. Refer to the Section Configuring SS7 for Diameter over SCTP in Configure SS7 for AAA.
- Restart the C-Diameter Stack:
- List installed CDIA Service Unit (SU).
SC-X # cmw-status -v su|grep CDIA
safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter
safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter
- Restart CDIA SU one by one.
SC-X # amf-adm restart safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter
SC-X # amf-state su all safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter
SC-X # amf-adm restart safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter
SC-X # amf-state su all safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter
- Restart the EPC AAA Server:
SC-X:~ #ipw-ctr restart aaa_diameter PL-3
5.13.4 High Failure Ratio Caused by Discarding DERs
5.13.4.1 Trouble Symptoms
A high failure ratio is caused by discarded DERs.
5.13.4.2 Locating Fault
The AAA server handles DERs from the UE, gets the response from the HSS, and then replies with DEAs to the UE correctly.
The AAA server can handle DERs with different session IDs from the UE.
When the UE sends DERs with the same session ID to IPWorks, the AAA server handles only the first DER, discards the remaining requests, and answers them with DEAs carrying "DIAMETER_UNABLE_TO_COMPLY". This is normal behavior for the AAA server.
In this case, if the AAA server cannot handle the first DER from the UE, report the problem to maintenance support through a CSR.
5.14 License Problems
This section describes license-related troubleshooting cases.
5.14.1 License Control Problem
5.14.1.1 Trouble Symptoms
When the user creates an ENUMDNSCHED object, the operation is rejected.
5.14.1.2 Locating Fault
The ENUMDNSCHED object is controlled by the ENUMDNSCHED Capacity license. When a problem in license control happens, the output "License exception detected: <Fault Reason>" is shown in the ipwcli. See Table 11 for details.
When the problem happens, the specific server might receive license-related alarms, such as License Management, License Key Not Available or License Management, Capacity Usage Threshold Reached.
| # | Fault Reason | Locating Fault |
|---|---|---|
| 1 | The license key file used by LM is not available. For details, refer to License Management, Key File Fault. | >ManagedElement=<Node Name>,SystemFunctions=1,Lm=1 >show lmState Check whether the output is LOCKED. If so, refer to License Management, Key File Fault. |
| 2 | The operation mode in the current version of License Manager is not supported by JavaOaM. | Collect the cmw-repository-list and contact Ericsson support. |
| 3 | License is expired. Please update license. For details, refer to License Management, License Key Not Available. | >ManagedElement=<Node Name>,SystemFunctions=1,Lm=1 >show all CapacityKey=<Id> expiration="<yyyy-mm-dd>" keyId="FAT1023219/2" Check whether the license has expired through the expiration attribute. If the license has expired, refer to License Management, License Key Not Available. |
| 4 | The requested license is not yet available for use. It will become valid in the future. | >ManagedElement=<Node Name>,SystemFunctions=1,Lm=1 >show all CapacityKey=<Id> keyId="FAT1023219/2" validFrom="<yyyy-mm-dd>" Check whether the date in validFrom is reached. If the day and time is not reached, refer to License Management, License Key Not Available, or wait for the license to become available. |
| 5 | The requested licensed capacity cannot be used because the corresponding license keys are unavailable. | >ManagedElement=<Node Name>,SystemFunctions=1,Lm=1 >show all Check whether there is a CapacityKey=<Id> with keyId="FAT1023219/2" in the output. If not, refer to License Management, License Key Not Available. |
| 6 | | >ManagedElement=<Node Name>,SystemFunctions=1,Lm=1 >show all CapacityKey=<Id> licensedCapacityLimit value=<LmCapacityValue> Check whether the number of provisioned ENUMDNSCHED is equal to or larger than the value. If so, refer to License Management, Capacity Usage Threshold Reached. |
| 7 | | Step 1: >ManagedElement=<Node Name>,SystemFunctions=1,Lm=1 >show all Check whether there is a CapacityKey=<Id> with keyId="FAT1023219/2" in the output. Step 2: Check whether the file /etc/ipworks/root_cert.cfg exists or is corrupted. If so, contact Ericsson support. Step 3: >ManagedElement=<Node Name>,SystemFunctions=1,Lm=1 >publishLicenseInventory If the result is "ERROR: Call command failed, error code: ComNotExist", use the following commands: amf-adm unlock safSu=SC-1,safSg=2N,safApp=ERIC-lm.server.aggregation amf-adm unlock safSu=SC-2,safSg=2N,safApp=ERIC-lm.server.aggregation Step 4: If the fault cannot be located by the above methods, collect the logs /cluster/storage/no-backup/ipworks/logs/SC-1/ipworks_ss_SC-1.log and /cluster/storage/no-backup/ipworks/logs/SC-2/ipworks_ss_SC-2.log. |
| 8 | Software issue; restart the Storage Server to fix the issue. | Software fault, restart SS: ipw-ctr stop ss <active SC> |
5.14.1.3 Confirming Solution
After applying corresponding methods to resolve the issues, check whether the license is available. For details, refer to View License Information.
5.14.2 Clear the Emergency Unlock Alarm
5.14.2.1 Trouble Symptoms
"Emergency Unlock Reset Key Required" alarm is raised by IPWorks.
5.14.2.2 Locating Fault
Emergency Unlock mode is NOT supported by IPWorks LM component. If Emergency Unlock mode is activated by mistake, the "Emergency Unlock Reset Key Required" alarm will be raised by LM.
5.14.2.3 Confirming Solution
Make sure that the license key exists.
To clear the alarm "Emergency Unlock Reset Key Required", run the following command:
SC-1:~ # ntfsend -s 0 -c 193,6,0 -n "lmId=1" -N "lmId=1" -a "" -p 74 -e 16384
5.15 MySQL NDB Cluster
This section describes NDB Cluster troubleshooting cases.
5.15.1 SQL Node Not Started
5.15.1.1 Trouble Symptoms
The following example shows an error message after executing the command /etc/init.d/ipworks.mysql show-status. The output indicates that one of the SQL Nodes is not started.
[...]
[mysqld(API)] 24 node(s)
id=3 (not connected, accepting connect from any host)
[...]
5.15.1.2 Locating Fault
This issue occurs because the Data Node has not started completely. Check the Data Node status by using /etc/init.d/ipworks.mysql show-status. The following is a sample output:
[ndbd(NDB)] 2 node(s)
id=27 @169.254.100.1 (mysql-5.6.27 ndb-7.4.8, starting, Nodegroup: 0, *)
id=28 @169.254.100.2 (mysql-5.6.27 ndb-7.4.8, starting, Nodegroup: 0)
[...]
In the ndbd(NDB) section, id=27 and id=28 show that the Data Nodes are in the starting state. When no Data Node is in the starting state any more, the Data Nodes have started completely and the SQL Node can be started successfully.
5.15.1.3 Confirming Solution
After the Data Nodes are started completely, start the SQL Node and check whether SQL Node is started successfully.
The following output indicates that both the SQL Nodes are started.
[...]
[mysqld(API)] 24 node(s)
id=3 @169.254.101.1 (mysql-5.6.27 ndb-7.4.8)
id=4 (not connected, accepting connect from SC-2)
[...]
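The wait-then-start logic can be scripted. The following helper is a sketch and not part of IPWorks; it polls a status command (passed as arguments) until no node reports the "starting" state any more:

```shell
# Poll a status command until its output no longer contains "starting".
# A sketch; the status output format is taken from the samples above.
wait_for_datanodes() {
    while "$@" | grep -q 'starting'; do
        sleep 10
    done
}

# Example (on an SC):
#   wait_for_datanodes /etc/init.d/ipworks.mysql show-status
#   /etc/init.d/ipworks.mysql start-sqlnode
```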
5.15.2 Management Node Down
5.15.2.1 Trouble Symptoms
The alarm Storage Server, MySQL Cluster Node Unreachable might be raised when the Management Node is down.
5.15.2.2 Locating Fault
To check if the Management Node is down, use either of the following ways:
- Method 1: Checking the Management Node status:
# ps -ef | grep ndb_mgmd
root 29963 1 0 09:13 ? 00:00:00 /opt/ipworks/mysql/mysql/sbin/ndb_mgmd -f /home/ipworks/mysql/confs/ipworks_mgm.conf --initial
If no PID is displayed in the output for the Management Node, the Management Node is down.
- Method 2: Checking the Management Node status:
# /etc/init.d/ipworks.mysql show-status
If a message like the following is displayed, the Management Node is down:
Unable to connect with connect string: nodeid=0,localhost:1186
Retrying every 5 seconds. Attempts left: 2 1, failed.
To fix the issue, perform the following steps based on the status of the Data Node and SQL Node:
- If the Data Node and SQL Node are running, start the Management Node by script.
# /etc/init.d/ipworks.mysql start-mgmd
- If the Data Node and SQL Node are down, start the Management Node, Data Node, and SQL Node in sequence.
# /etc/init.d/ipworks.mysql start-mgmd
# /etc/init.d/ipworks.mysql start-ndbd
# /etc/init.d/ipworks.mysql start-sqlnode
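The ps check from Method 1 can be wrapped in a small helper for use in scripts. This is a sketch, not part of IPWorks; any command-line pattern (for example, ndb_mgmd) can be passed:

```shell
# Return success when a process whose command line matches PATTERN is
# running; a helper sketch around the same ps check used above.
is_running() {
    ps -ef | grep -v grep | grep -q "$1"
}

# Example (on an SC): is_running ndb_mgmd && echo "Management Node is up"
```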
5.15.2.3 Confirming Solution
After performing the solution, check whether the Management Nodes are started through ps -ef | grep ndb_mgmd or /etc/init.d/ipworks.mysql show-status.
5.15.3 Data Node Down
5.15.3.1 Trouble Symptoms
The alarm Storage Server, MySQL Cluster Node Unreachable might be raised when the Data Node is down.
5.15.3.2 Locating Fault
In some situations, the Data Node goes down. The Data Node must then be started manually using the ipworks.mysql script, or the configuration must be adjusted to prevent the Data Node from going down.
Check whether there is any error log in /local/ipworks/mysql-cluster/datanode/ndb_<id>_out.log.
To start the Data Node manually using the ipworks.mysql script, see Section 5.15.3.2.1.
5.15.3.2.1 Starting Data Node
To troubleshoot the issues caused by the Data Node down, perform one or all the following steps:
- Check whether the Data Node is down and start the Data Node by using the ipworks.mysql script.
# /etc/init.d/ipworks.mysql show-status
If the status of the Data Node is displayed as follows, the Data Node (id=27) is down:
[ndbd(NDB)] 2 node(s)
id=27 (not connected, accepting connect from SC-1)
If the Data Node (id=27) is down, use the following command to start it.
# /etc/init.d/ipworks.mysql start-ndbd
- Check if the issue is caused by the data node memory size problem and fix the specific issues according to Section 5.15.3.2.2.
5.15.3.2.2 Data Node Memory Size Problem
The required Data Node memory size depends on the needs of the IPWorks application: large amounts of data require a large Data Node memory size.
Too small a memory size also causes several problems, for example, the ENUM server or the Data Node cannot start successfully, or the machine responds slowly.
Users can adjust the data node memory in /home/ipworks/mysql/confs/ipworks_mgm.conf.
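For reference, Data Node memory is typically set in the [ndbd default] section of that file. The following is a hypothetical excerpt; DataMemory and IndexMemory are standard MySQL Cluster parameters, but the values shown are illustrative only, and actual sizing depends on the deployment:

```ini
[ndbd default]
# Illustrative values only; size according to the amount of ENUM/AAA data.
DataMemory=2G
IndexMemory=256M
```

A change to these parameters takes effect only after the cluster nodes are restarted.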
5.15.4 SQL Node Down
5.15.4.1 Trouble Symptoms
The alarm Storage Server, MySQL Cluster Node Unreachable might be raised when the SQL Node is down.
5.15.4.2 Locating Fault
To troubleshoot the issues caused by the SQL Node down, do the following:
- Check whether the SQL Nodes on SC-1 and SC-2 are started. If not, fix the issue as described in Section 5.15.1.
- Log on to SC-1.
# ssh <Username>@<SC-1 or SC-2 IP Address>
- Check whether the accessing privilege is granted to the NDB.
# /usr/local/mysql/bin/mysql -P 3307 -h localhost --protocol=tcp
mysql> select user, host from mysql.user;
Check the output to see if <SS OAM IP Address> is assigned to the user. For example,
+------+-----------+
| user | host      |
+------+-----------+
| root | 127.0.0.1 |
| root | ::1       |
|      | SC-1      |
| root | SC-1      |
|      | ipw_ss    |
|      | localhost |
| root | localhost |
+------+-----------+
7 rows in set (0.01 sec)
The example shows that the privilege is assigned.
If the output shows that the privilege is not assigned, use the following commands to grant the privilege on the NDB side:
# /usr/local/mysql/bin/mysql -P 3307 -h localhost --protocol=tcp
mysql> grant all privileges on *.* to ''@'ipw_ss';
- Repeat Step 2 to Step 3 to check the accessing privilege on SC-2.
5.15.4.3 Confirming Solution
After performing the solution, check whether the SQL Nodes are started through /etc/init.d/ipworks.mysql show-status.
5.15.5 MySQL NDB Cluster Status Abnormal
This section describes how to troubleshoot the issues caused by the abnormal status of MySQL NDB Cluster.
5.15.5.1 Trouble Symptoms
Table 12 lists the figures that show the abnormal status of the MySQL NDB Cluster.
- Abnormal Status of NDB Cluster (1)
- Abnormal Status of NDB Cluster (2)
- Abnormal Status of NDB Cluster (3)
- Abnormal Status of NDB Cluster (4)
5.15.5.2 Locating Fault
Table 13 lists the situations causing the abnormal status of MySQL NDB Cluster and provides the solutions to the issues.
| Situation | Solution | Command |
|---|---|---|
| The Management Node is stopped, and both of the Data Nodes are running (as shown in Figure 1). | Start the Management Node. | # /etc/init.d/ipworks.mysql start-mgmd |
| The Management Node is stopped, and only one of the Data Nodes is stopped or in the starting state (as shown in Figure 2). | Start the Data Node. | # /etc/init.d/ipworks.mysql start-ndbd |
| The Management Node and one of the Data Nodes are stopped (as shown in Figure 3). | Start the Management Node and the Data Node. | # /etc/init.d/ipworks.mysql start-mgmd # /etc/init.d/ipworks.mysql start-ndbd |
| Both of the Data Nodes are stopped or in the starting state (as shown in Figure 4). | Start the MySQL NDB cluster. | # /etc/init.d/ipworks.mysql start-ndbcluster |
5.15.5.3 Confirming Solution
Users can check the status of the MySQL NDB Cluster nodes by running /etc/init.d/ipworks.mysql show-status. Figure 6 shows the normal status of the NDB Cluster: the Management Node and both of the Data Nodes are running.
5.15.6 MySQL NDB Cluster Cannot Work Normally
This section describes how to recover the NDB Cluster by performing the initial start of the cluster.
5.15.6.1 Trouble Symptoms
The MySQL Cluster cannot work normally, and serious errors might occur. For example, a MySQL table is missing, or a Data Node cannot start.
5.15.6.2 Locating Fault
To recover the NDB Cluster, do the following:
- Stop Storage Server.
SC-1:~# ipw-ctr stop ss SC-1
SC-1:~# ipw-ctr stop ss SC-2
- Stop all running ENUM servers and AAA servers.
SC-1:~# ipw-ctr stop enum <PL of running enum>
SC-1:~# ipw-ctr stop aaa_diameter <PL of running aaa diameter>
- Perform an initial start of the NDB Cluster.
SC-1:~# /opt/ipworks/ss/scripts/init_ndb.sh
- Initialize the Storage Server.
SC-1:~# /opt/ipworks/ss/scripts/init_ss.sh
5.15.6.3 Confirming Solution
This issue is fixed when the operator can log on to the IPWCLI successfully.
5.15.7 SQL Node Start Failure with Wrong Folder Permission
5.15.7.1 Trouble Symptoms
When you try to start the SQL Node, it fails. Additionally, you receive the following error message in the error log file /local/ipworks/mysql-cluster/sqlnode/sqlnode.err:
"Fatal error: Can't open and lock privilege tables: Table 'host' is read only"
This issue occurs because of the wrong permission for the SQL Node data related folders.
- Note:
- In the normal condition, the folder permission must not be changed. However, if the folder permission is changed, and this change causes the issue, the operator should follow this method to recover the startup of SQL Node.
5.15.7.2 Locating Fault
Check the permissions of the following folder and make sure that each level of the path has permission 755:
/local/ipworks/mysql-cluster/sqlnode
Use the following command to change the folder permission:
# chmod 755 <folder name>
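To verify the whole path rather than one folder at a time, the check can be scripted. This helper is a sketch, not part of IPWorks; pass the SQL Node data directory as the argument:

```shell
# List directory levels under a path whose permission is not 755.
# A sketch; on the node the argument would be
# /local/ipworks/mysql-cluster/sqlnode.
find_bad_perms() {
    find "$1" -type d ! -perm 755 -print
}

# Example: find_bad_perms /local/ipworks/mysql-cluster/sqlnode
```

An empty output means every directory level already has the required 755 mode.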
5.15.7.3 Confirming Solution
After the folder permission is changed to 755, the SQL Node can start successfully.
For more information on how to start SQL Node, refer to Configure MySQL NDB Cluster.
5.15.8 MySQL Data Lost on an SC
5.15.8.1 Trouble Symptoms
MySQL data is lost on an SC. This issue results in the abnormal work of the MySQL nodes.
5.15.8.2 Locating Fault
Log on to SC-1 and SC-2 respectively, and check whether MySQL data (located in /local/ipworks) is lost.
If the MySQL data on one of the SCs is lost (for example, SC-1), use the following way to recover the data:
- Stop all the MySQL Nodes on the SC that have the problem.
SC-1:~ # /etc/init.d/ipworks.mysql stop
- Recover the lost data on the SC.
SC-1:~ # /etc/init.d/ipworks.mysql recover
5.15.8.3 Confirming Solution
After the recovery operation is performed successfully, you can log in to MySQL successfully, and all the data in /local/ipworks is restored.
5.16 Backup and Restore
This section provides information on resolving problems in backing up or restoring IPWorks data.
Backup handling enables the operator to schedule backups at periodic intervals, at a fixed time, or at a single point in time. It provides a complete backup of all configured and provisioned data, or a partial backup of only the configured data for IPWorks. The system can be restored fully to the point in time when the backup was taken, or partially restored without the provisioned data.
Several problems can cause backup and restore handling to fail. If the process does not finish successfully, first check the log file. Detailed error information is recorded in /cluster/storage/no-backup/ipworks/logs/<hostname>/ipwbrf.log.
5.16.1 Not Enough Space on the Disk
5.16.1.1 Trouble Symptoms
If there is not enough disk space, backup or restore handling fails.
5.16.1.2 Locating Fault
Before backup or restore handling starts, ensure that there is enough disk space, especially for the directory /cluster/ipwbrf and the directory that stores the backup archive file. The df tool can be used to check the available disk space. See Section 2.1.11 for details.
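The df check can be scripted with an explicit threshold. This helper and the 1 GB threshold in the example are illustrative only, not part of IPWorks:

```shell
# Warn when DIR has less than MIN_KB kilobytes of free space; a sketch
# of the df check described above.
check_free_space() {
    dir=$1
    min_kb=$2
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    if [ "$avail_kb" -lt "$min_kb" ]; then
        echo "WARNING: only ${avail_kb} kB free in $dir"
        return 1
    fi
    echo "OK: ${avail_kb} kB free in $dir"
}

# Example (on an SC): check_free_space /cluster/ipwbrf 1048576
```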
5.16.1.3 Confirming Solution
Try the backup or restore operation again and check whether it succeeds. For details, refer to Create Backup and Restore Backup.
5.16.2 Complete Backup or Restore Failed due to MySQL NDB Process Not Started
5.16.2.1 Trouble Symptoms
The backup or restore operation fails if the MySQL NDB process is not started.
5.16.2.2 Locating Fault
When the complete backup or restore operation is started, MySQL NDB is used to dump the database or restore the database. This issue occurs if the MySQL NDB process is not started or the MySQL NDB process is killed during backup or restore phase.
To fix this issue, do the following:
- Stop the MySQL NDB cluster.
# /etc/init.d/ipworks.mysql stop-ndbcluster
- Start the MySQL NDB cluster.
# /etc/init.d/ipworks.mysql start-ndbcluster
- Check the MySQL NDB cluster status.
# /etc/init.d/ipworks.mysql show-status
- Try to perform the complete backup or restore again.
Refer to Create Backup and Restore Backup.
5.16.2.3 Confirming Solution
The complete backup or restore operation is successful.
5.16.3 Restart Server Failed
5.16.3.1 Trouble Symptoms
Even though the restore operation completes successfully, certain processes do not start automatically.
5.16.3.2 Locating Fault
The restore operation stops all running IPWorks processes except the MySQL process. The stopped processes start automatically after the restore operation is completed. If a process does not restart successfully, try to start it manually. If the process still cannot start, refer to the service-specific troubleshooting (for example, Section 5.8).
5.16.3.3 Confirming Solution
The process starts.
5.16.4 Slow Backup or Restore Operation
5.16.4.1 Trouble Symptoms
The backup or restore operation takes about 10 minutes longer than expected.
5.16.4.2 Locating Fault
This issue occurs when one System Controller (SC) goes down in an abnormal way (such as a power outage).
This is a limitation of the CBA common component BRF-C.
To resolve the issue, start the SC that is down, and make sure that the startup completes successfully and the SC works normally.
- Note:
- If the SC goes down in a normal way (such as with the poweroff command), the backup or restore is not affected.
5.16.4.3 Confirming Solution
Check whether the operation completes within the normal time.
5.17 C-Diameter
This section provides information on resolving problems with C-Diameter.
5.17.1 C-Diameter OperState is DISABLED
5.17.1.1 Trouble Symptoms
C-Diameter OperState is DISABLED and C-Diameter processes cannot be started.
5.17.1.2 Locating Fault
Use the following methods to locate the fault:
- Use cmw-status to check the C-Diameter status on any SC or PL.
# cmw-status -v su | grep -i CDIA -A 4
- Check the result for any abnormal state.
If any error occurs, the command output is shown as below:
safSu=ERIC-CDIA-Runtime-1,safSg=ERIC-CDIA-SG,safApp=ERIC-CDIA-Runtime AdminState=UNLOCKED(1) OperState=DISABLED(2) PresenceState=TERMINATION-FAILED(7) ReadinessState=IN-SERVICE(2)
5.17.1.3 Confirming Solution
If the output shows OperState=DISABLED, the C-Diameter status is abnormal.
Repair the C-Diameter stack on any SC or PL:
# amf-adm repaired safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter
# amf-adm repaired safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter
If the C-Diameter stack is repaired successfully, the output of cmw-status is as follows:
safSu=ERIC-CDIA-Runtime-1,safSg=ERIC-CDIA-SG,safApp=ERIC-CDIA-Runtime AdminState=UNLOCKED(1) OperState=ENABLED(1) PresenceState=UNINSTANTIATED(1) ReadinessState=IN-SERVICE(2)
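When many SUs are listed, the state fields can be extracted programmatically. As an illustration only, a small Python sketch that parses the key=VALUE(n) state fields from a cmw-status SU line (the parsing function is an assumption, not a product tool):

```python
import re

def parse_amf_states(line):
    """Parse the key=VALUE(n) state fields from a cmw-status SU line
    into a dict, for example {'OperState': 'DISABLED', ...}."""
    return dict(re.findall(r"(\w+State)=([A-Z-]+)\(\d+\)", line))

line = ("safSu=ERIC-CDIA-Runtime-1,safSg=ERIC-CDIA-SG,safApp=ERIC-CDIA-Runtime "
        "AdminState=UNLOCKED(1) OperState=DISABLED(2) "
        "PresenceState=TERMINATION-FAILED(7) ReadinessState=IN-SERVICE(2)")
states = parse_amf_states(line)
print(states["OperState"])  # DISABLED -> repair is needed
```

Any SU whose OperState is not ENABLED is a candidate for the amf-adm repaired command shown above.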
5.17.2 C-Diameter Stack Cannot Listen on the Listening Port (3868)
5.17.2.1 Trouble Symptoms
The C-Diameter stack cannot listen on the listening port (3868).
5.17.2.2 Locating Fault
Use the following methods to locate the fault:
- Use the ps command to check the process status of C-Diameter on all PLs.
# ps -ef | grep DiaServer
root 8943 8769 0 18:19 pts/0 00:00:00 grep DiaServer
root 9435 1 0 Jan22 ? 00:15:51 /opt/diacc/bin//DiaServer
root 9490 9435 0 Jan22 ? 00:00:31 DSDTrace[local{1}](9435): /opt/diacc/bin//DiaServer
- Use the ps command to check the process status of IPWorks AAA on any PL.
# ps -ef | grep ipwa3d
root 7687 1 4 Jan22 ? 01:04:02 /opt/ipworks/aaa_diameter/bin/ipwa3d
root 8019 7687 0 Jan22 ? 00:00:00 [ipwa3d] <defunct>
root 8028 7687 0 Jan22 ? 00:00:00 [ipwa3d] <defunct>
root 15940 8769 0 18:22 pts/0 00:00:00 grep ipwa3d
- Use the DiaDictManager command to see whether the Diameter dictionaries exist on the PLs.
# /opt/diacc/bin/DiaDictManager list
dictionary_sta dictionary_swm dictionary_s13 dictionary_sh dictionary_swx dictionary_s6b dictionary_ts29273
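As an illustration only, the completeness check can be expressed as a set comparison in Python; the expected set below is taken from the list output above, and treating it as the full required set is an assumption:

```python
# Dictionaries reported by `DiaDictManager list` on a healthy node
# (assumed here to be the complete required set).
EXPECTED = {"dictionary_sta", "dictionary_swm", "dictionary_s13",
            "dictionary_sh", "dictionary_swx", "dictionary_s6b",
            "dictionary_ts29273"}

def missing_dictionaries(list_output):
    """Return the expected dictionaries absent from the command output."""
    installed = set(list_output.split())
    return sorted(EXPECTED - installed)

output = "dictionary_sta dictionary_swm dictionary_s13 dictionary_sh"
print(missing_dictionaries(output))
```

A non-empty result indicates which dictionaries still need to be installed with DiaDictManager add.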
5.17.2.3 Confirming Solution
If any process or dictionary is not found, use the following methods to repair the environment.
- If the Diameter dictionaries are not installed, use the DiaDictManager command to install them on all PLs.
PL-X:~ # /opt/diacc/bin/DiaDictManager add /etc/ipworks/aaa_diameter/dict/ dictionary_ts29273
PL-X:~ # /opt/diacc/bin/DiaDictManager add /etc/ipworks/aaa_diameter/dict/*
After the dictionaries are installed successfully, the command output is shown as below:
PL-X:~ # /opt/diacc/bin/DiaDictManager list
dictionary_sh dictionary_s13 dictionary_s6b dictionary_sta dictionary_swm dictionary_swx dictionary_ts29273
- Restart the C-Diameter Stack.
- List installed CDIA Service Unit (SU).
SC-X # cmw-status -v su|grep -i CDIA
safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter
safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter
- Restart CDIA SU one by one.
SC-X # amf-adm restart safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter
SC-X # amf-state su all safSu=PL-4,safSg=NWA,safApp=ERIC-sv.SVCDiameter
SC-X # amf-adm restart safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter
SC-X # amf-state su all safSu=PL-3,safSg=NWA,safApp=ERIC-sv.SVCDiameter
- Restart the EPC AAA Server.
PL-X:~ # ipw-ctr restart aaa_diameter
5.18 Geographic Redundancy
This section provides information on resolving problems with Geographic Redundancy.
5.18.1 MySQL Replication for Geographic Redundancy Failed on One Site
5.18.1.1 Trouble Symptoms
When the alarm MySQL Replication for Geographic Redundancy Failed appears on only one site, the MySQL replication has a problem on that site.
5.18.1.2 Locating Fault
5.18.1.2.1 Checking the AAANSDUser Data (For Non-SIM service)
The replicated AAA user data consists of the aaansduser, aaapolicy, aaauser, aaauser_policy, aaauser_groupname, and aaausergroup_policy tables. Check whether any of them differ between the two sites.
- Note:
- All other AAA user data is not replicated automatically; it must also be the same on both sites.
Take AAANSDUser as an example. Check whether the AAANSDUser data on the two sites differs:
- Perform checksum on AAANSDUser on SC-1 or SC-2 of Site A:
# mysql -P3307 -h ipw_sql --protocol=tcp -e "select sum(crc32(concat_ws(',', name, password, imsi, msisdn, apn, userstatus, certificateissuername, certificateid))) from ipw_prov_aaa.aaansduser;"
Record the output integer value as [CHECKSUM_A].
This command takes about 30 seconds to produce the output.
- Perform checksum on SC-1 or SC-2 of Site B:
# mysql -P3307 -h ipw_sql --protocol=tcp -e "select sum(crc32(concat_ws(',', name, password, imsi, msisdn, apn, userstatus, certificateissuername, certificateid))) from ipw_prov_aaa.aaansduser;"
Record the output integer value as [CHECKSUM_B].
If [CHECKSUM_A] equals [CHECKSUM_B], it is almost certain that the tables are the same, and there is no need to recover data synchronization. Refer to Storage Server, The MySQL Replication for Geographic Redundancy Failed.
If [CHECKSUM_A] does not equal [CHECKSUM_B], perform the steps in Section 5.18.1.2.3.
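As an illustration only, the query above sums CRC32 values over the comma-joined (concat_ws) columns of each row. A minimal Python sketch of the same idea, with invented sample rows (not real subscriber data):

```python
import zlib

def row_checksum(rows):
    """Mimic sum(crc32(concat_ws(',', ...))): sum the CRC32 of the
    comma-joined columns of each row. concat_ws skips NULLs, which
    is approximated here by filtering out None values."""
    total = 0
    for row in rows:
        joined = ",".join(str(col) for col in row if col is not None)
        total += zlib.crc32(joined.encode())
    return total

# Invented example rows: (name, password, imsi); one password differs.
site_a = [("alice", "pw1", "262010000000001"), ("bob", "pw2", "262010000000002")]
site_b = [("alice", "pw1", "262010000000001"), ("bob", "pwX", "262010000000002")]
print(row_checksum(site_a) == row_checksum(site_b))  # False -> resync needed
```

Because the sum is order-independent, two sites can be compared without fetching rows in the same order.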
5.18.1.2.2 Checking ENUM User Data (For ENUM Service)
The replicated ENUM user data consists of the enumzone, enumview, enumzvrel, enumacl, destnode, enumdnrange, and enumdnsched tables. Check whether any of them differ between the two sites.
- Note:
- All other ENUM user data is not replicated automatically; it should also be the same on both sites.
Take ENUMZONE as an example:
- Perform checksum on ENUMZONE on SC-1 or SC-2 of Site A:
# mysql -P3307 -h ipw_sql --protocol=tcp -e "select sum(crc32(concat_ws(',', id, enumzoneid, enumzonename, indefaultview, defaultttl))) from ipw_enum.ENUMZONE;"
Record the output integer value as [CHECKSUM_A].
This command takes about 30 seconds to produce the output.
- Perform checksum on SC-1 or SC-2 of Site B:
# mysql -P3307 -h ipw_sql --protocol=tcp -e "select sum(crc32(concat_ws(',', id, enumzoneid, enumzonename, indefaultview, defaultttl))) from ipw_enum.ENUMZONE;"
Record the output integer value as [CHECKSUM_B].
If [CHECKSUM_A] equals [CHECKSUM_B], it is almost certain that the tables are the same, and there is no need to recover data synchronization. Refer to Storage Server, The MySQL Replication for Geographic Redundancy Failed.
If [CHECKSUM_A] does not equal [CHECKSUM_B], perform the steps in Section 5.18.1.2.3.
5.18.1.2.3 Recovering Data Synchronization
All the following steps are performed on either SC-1 or SC-2. Take AAANSDUser as an example:
- Note:
- All AAANSDUser data on Site B will be erased and resynchronized from Site A.
- The AAA user data mentioned above is stored in the database ipw_prov_aaa, and the ENUM user data is stored in the database ipw_enum. Apply the mysql commands with the corresponding database name and table name for each scenario.
- Stop AAANSDUser provision on both Site A and Site B.
- On both Site A and Site B, stop MySQL slave:
# mysql -P3307 -h ipw_sql --protocol=tcp -e "stop slave;"
- On Site A, dump AAANSDUser data:
# mysqldump -P3307 -h ipw_sql --protocol=tcp --no-create-info --opt ipw_prov_aaa.aaansduser > ~/aaansduser_dump.sql
- On Site A, transfer the SQL dump file to Site B.
# scp ~/aaansduser_dump.sql root@[OAM IP of SiteB]:~
- On Site A, reset MySQL slave:
# mysql -P3307 -h ipw_sql --protocol=tcp -e "reset slave;"
- On Site B, delete aaansduser:
# mysql -P3307 -h ipw_sql --protocol=tcp -e "delete from ipw_prov_aaa.aaansduser;"
- On Site B, restore AAANSDuser data:
# mysql -P 3307 -h ipw_sql --protocol=tcp -f ipw_prov_aaa < ~/aaansduser_dump.sql
- On Site B, record File and Position in the output of the following command as [BINLOG_NAME_SITEB] and [BINLOG_POS_SITEB]:
# mysql -P 3307 -h ipw_sql --protocol=tcp -e "show master status;"
- On Site A, configure and start MySQL slave:
mysql> change master to master_host='<MIP of MySQL Cluster SQL Node in Site B>', master_log_file='<BINLOG_NAME_SITEB>', master_log_pos=<BINLOG_POS_SITEB>, master_user='ipworks', master_password='ipworks', master_port=3307, master_retry_count=86400, master_connect_retry=5;
mysql> start slave;
mysql> exit;
- On Site B, start MySQL slave:
# mysql -P 3307 -h ipw_sql --protocol=tcp -e "start slave;"
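As an illustration only, building the CHANGE MASTER statement from the recorded binlog name and position can be sketched in Python; the host and binlog values below are placeholders, and the fixed account, port, and retry settings are the ones used in the procedure above:

```python
def change_master_stmt(master_host, binlog_name, binlog_pos):
    """Build the CHANGE MASTER statement used in the recovery step,
    filling in the recorded [BINLOG_NAME_SITEB] and [BINLOG_POS_SITEB]."""
    return ("change master to "
            f"master_host='{master_host}', "
            f"master_log_file='{binlog_name}', "
            f"master_log_pos={binlog_pos}, "
            "master_user='ipworks', master_password='ipworks', "
            "master_port=3307, master_retry_count=86400, "
            "master_connect_retry=5;")

# Placeholder values for illustration.
stmt = change_master_stmt("10.0.0.2", "mysql-bin.000042", 154)
print(stmt)
```

Generating the statement this way avoids typing errors when transcribing the File and Position values from show master status.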
5.18.1.3 Confirming Solution
After the MySQL Replication for Geographic Redundancy Failed alarm is cleared, use the checking steps in Section 5.18.1.2 to verify that [CHECKSUM_A] equals [CHECKSUM_B].
5.18.2 MySQL Replication for Geographic Redundancy Failed On All Sites
5.18.2.1 Trouble Symptoms
When the alarm MySQL Replication for Geographic Redundancy Failed appears on all sites, the MySQL replication has a problem on all sites.
5.18.2.2 Locating Fault
5.18.2.2.1 Checking the AAANSDUser Data (For Non-SIM service)
Before the recovery steps, verify that the AAANSDUser data on the two sites is actually different:
- Perform checksum on AAANSDUser on SC-1 or SC-2 of Site A:
# mysql -P3307 -h ipw_sql --protocol=tcp -e "select sum(crc32(concat_ws(',', name, password, imsi, msisdn, apn, userstatus, certificateissuername, certificateid))) from ipw_prov_aaa.aaansduser;"
Record the output integer value as [CHECKSUM_A].
This command takes about 30 seconds to produce the output.
- Perform checksum on AAANSDUser on SC-1 or SC-2 of Site B:
# mysql -P3307 -h ipw_sql --protocol=tcp -e "select sum(crc32(concat_ws(',', name, password, imsi, msisdn, apn, userstatus, certificateissuername, certificateid))) from ipw_prov_aaa.aaansduser;"
Record the output integer value as [CHECKSUM_B].
If [CHECKSUM_A] equals [CHECKSUM_B], the AAANSDUser data on the two sites is the same and it is almost certain that the tables match, so there is no need to recover data synchronization. For more detail, refer to Storage Server, The MySQL Replication for Geographic Redundancy Failed.
- Make sure perl DBI is installed.
# perl -e "use DBI;"
The output must contain no error messages. If there are error messages, install perl-DBI and perl-DBD-mysql.
# cd /opt/ipworks/sqlnodemgr/scripts/
# rpm -i libmysqlclient18-10.0.11-6.4.x86_64.rpm perl-DBI-1.628-3.214.x86_64.rpm perl-DBD-mysql-4.021-7.178.x86_64.rpm
5.18.2.2.2 Recovering Data Synchronization for AAA User Data
If replication in both directions is down, recover data synchronization with the following steps.
- Check AAANSDUser consistency on SC-1 or SC-2 of Site A.
# cd /opt/ipworks/sqlnodemgr/scripts/
# ./ipw-db-checker --mysqld1 h=ipw_sql:P=3307:u=root --mysqld2 h=[MIP prv of Site B]:P=3307:u=ipworks:p=ipworks --database ipw_prov_aaa --tables aaansduser
- Check the output.
If the output reports Consistent for each table, the data is consistent and no further operation is needed.
If any table is reported as inconsistent, continue with the next step.
- If the result is inconsistent, two cli scripts are generated under /tmp:
/tmp/sync_commands_for_sqlnode1_aaansduser.cli
This script contains commands that make the data on Site A the same as on Site B.
/tmp/sync_commands_for_sqlnode2_aaansduser.cli
This script contains commands that make the data on Site B the same as on Site A.
- Review and modify the scripts mentioned above as needed.
- Execute the modified script on SC-1 or SC-2 of Site A.
# ipwcli -user=[ipwcli User Name] -password=[ipwcli Password] /tmp/sync_commands_for_sqlnode1_aaansduser.cli
- Transfer the second script to Site B.
# scp /tmp/sync_commands_for_sqlnode2_aaansduser.cli root@[OAM IP of Site B]:/tmp/
- Execute the modified script on SC-1 or SC-2 of Site B.
# ipwcli -user=[ipwcli User Name] -password=[ipwcli Password] /tmp/sync_commands_for_sqlnode2_aaansduser.cli
- Go back to step 2.
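As an illustration only, the core of what a consistency checker such as ipw-db-checker does can be sketched as a keyed table diff in Python; the rows, the key column, and the function itself are invented for the example:

```python
def diff_tables(rows_a, rows_b, key="name"):
    """Toy table diff: index both row sets by a key column and report
    rows present only on one site and rows that differ between sites."""
    a = {row[key]: row for row in rows_a}
    b = {row[key]: row for row in rows_b}
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed

# Invented sample rows: u2 exists only on Site A, u3 only on Site B.
site_a = [{"name": "u1", "apn": "internet"}, {"name": "u2", "apn": "ims"}]
site_b = [{"name": "u1", "apn": "internet"}, {"name": "u3", "apn": "ims"}]
print(diff_tables(site_a, site_b))  # (['u2'], ['u3'], [])
```

Each category of difference maps to the generated sync commands: rows missing on one site become inserts, and changed rows become updates.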
5.18.2.2.3 Checking the ENUM User Data and Radius User Data
- Stop the provisioning of user data.
- Stop the MySQL Slave on both sites.
# mysql -P 3307 --protocol=tcp -h ipw_sql
mysql> stop slave;
mysql> exit;
- Check the data consistency.
# mkdir /tmp/db_checker
# cp /opt/ipworks/common/bin/ipw-db-checker /tmp/db_checker
# cp /opt/ipworks/common/etc/DbChecker.conf /tmp/db_checker
# cd /tmp/db_checker
# ./ipw-db-checker <MIP_PROV_IP of the other Site> <Database name needed to be checked>
- Note:
- ENUM user data is stored in database ipw_enum while Radius user data is stored in database ipw_prov_aaa.
For example:
./ipw-db-checker "10.175.171.76" "ipw_enum"
Tables in ipw_enum is: DESTNODE;ENUMACL;ENUMDNRANGE;ENUMDNSCHED;ENUMVIEW;ENUMZONE;ENUMZVREL;
Checking table DESTNODE start.
connect to sqlnode1 ipw_sql:::ipw_enum:DESTNODE
connect to sqlnode2 10.175.171.76:ipworks:ipworks:ipw_enum:DESTNODE
reading data...please wait...finished
comparing data...please wait...finished
Checking table DESTNODE ------------------------------------------------ Consistent
Checking table ENUMACL start.
connect to sqlnode1 ipw_sql:::ipw_enum:ENUMACL
connect to sqlnode2 10.175.171.76:ipworks:ipworks:ipw_enum:ENUMACL
reading data...please wait...finished
comparing data...please wait...finished
Checking table ENUMACL ------------------------------------------------- Consistent
...
5.18.2.2.4 Recovering Data Synchronization for ENUM User Data and Radius User Data
If the checking result is inconsistent, SQL files are generated in /tmp/db_checker.
For example:
SC-1:/ # ls -l /tmp/db_checker
total 4935052
-rw-r--r-- 1 root root 1504 Jul 17 11:50 DbChecker.conf
-rw-r--r-- 1 root root 14900 Jul 18 15:28 dbchecker.log
-rwxr-xr-x 1 root root 7898108 Jul 17 12:56 ipw-db-checker
-rwxr-xr-x 1 root root 7893923 Jul 17 11:48 ipw-db-checker_back
-rw-r--r-- 1 root root 5032718815 Jul 18 15:28 sync_commands_for_sqlnode1.sql
-rw-r--r-- 1 root root 5032718815 Jul 18 15:28 sync_commands_for_sqlnode2.sql
To synchronize the data between two sites, load the sync_commands_for_sqlnode1.sql in Site A and sync_commands_for_sqlnode2.sql in Site B.
- Synchronize the data in Site A.
If sync_commands_for_sqlnode1.sql is not generated, skip this step.
- Log in to SC-1 or SC-2 in Site A.
- Log in to the SQL node.
# mysql -P 3307 --protocol=tcp -h ipw_sql
mysql> use ipw_enum;
mysql> source /tmp/db_checker/sync_commands_for_sqlnode1.sql;
mysql> exit;
- Note:
- If Radius user data is to be synchronized, execute use ipw_prov_aaa; instead.
- Synchronize the data in Site B.
If sync_commands_for_sqlnode2.sql is not generated, skip this step.
- Log in to SC-1 or SC-2 in Site B.
- Log in to the SQL node.
# mysql -P 3307 --protocol=tcp -h ipw_sql
mysql> use ipw_enum;
mysql> source /tmp/db_checker/sync_commands_for_sqlnode2.sql;
mysql> exit;
- Note:
- If Radius user data is to be synchronized, execute use ipw_prov_aaa; instead.
- Change the Master-Host and set the binlog.
Refer to section Change Master-Host and Setting Binlog in IPWorks Geographic Redundancy.
5.18.2.3 Confirming Solution
After the alarm MySQL Replication for Geographic Redundancy Failed is cleared on all sites, use the checking steps in Section 5.18.2.2.1 to verify that [CHECKSUM_A] equals [CHECKSUM_B].
5.19 Data Migration
This section is a quick troubleshooting guide for the data migration from HP to IPWorks 1.
5.19.1 Backup Failed
5.19.1.1 Trouble Symptoms
"Error Copying Configuration files:…" is displayed.
5.19.1.2 Locating Fault
This issue occurs when the configuration file or folder specified in the rule file does not exist in the current environment.
- Check whether the backup is running on the right node, SS or PS.
For example:
If the backup runs on the SS, the DNS, ENUM, or other service configuration files are not backed up.
- Otherwise, if the file or folder does not actually exist in the current environment for the service, remove it from the rule file.
Redo the backup.
5.19.2 Required Configuration Files Did Not Migrate from HP to IPWorks 1
5.19.2.1 Trouble Symptoms
- Configuration files are missing for backup.
- "Src file … does not exists." is displayed.
5.19.2.2 Locating Fault
If configuration files are not backed up, add them into ipw_service_backup_rule.csv.
Redo the backup.
5.19.3 Files Missing in the Migration Process
5.19.3.1 Trouble Symptoms
"Dest file … does not exists." is displayed.
5.19.3.2 Locating Fault
This issue occurs when the destination file is not configured correctly in the corresponding rule file.
Check the file name in the rule file, correct it, and redo the migration steps.
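As an illustration only, the rule-file checks behind the backup errors above can be sketched in Python; the one-path-per-row CSV layout and the helper function are assumptions, not the product's actual rule-file format:

```python
import csv
import io
import os

def check_rule_file(csv_text, exists=os.path.exists):
    """Report source paths listed in a rule file that do not exist in
    the current environment (the 'Src file ... does not exists.' case).
    The `exists` hook is injectable so the check can be tested offline."""
    missing = []
    for row in csv.reader(io.StringIO(csv_text)):
        if row and not exists(row[0]):
            missing.append(row[0])
    return missing

# Invented example: one existing path and one that is absent.
rules = "/etc/hosts\n/etc/no/such/file.conf\n"
print(check_rule_file(rules))
```

Entries reported here correspond to rule-file lines that should either be removed (file genuinely absent for the service) or whose path should be corrected before redoing the backup.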
5.19.4 Failed to Import the NETCONF XML File to ECIM with the netconf Command
5.19.4.1 Trouble Symptoms
This issue occurs when importing the NETCONF XML file into ECIM using the netconf command.
5.19.4.2 Locating Fault
For details on importing the NETCONF configuration, refer to Section 5.3, Operation <edit-config>, in Ericsson NETCONF Interface.
5.20 IPWorks Scaling
5.20.1 Unable to Scale In PL in ECLI
5.20.1.1 Trouble Symptoms
Take PL-5 as an example. When scaling in IPWorks in ECLI, the error No scale operation possible is reported.
>ManagedElement=1,SystemFunctions=1,SysM=1,CrM=1,ComputeResourceRole=PL-5
(ComputeResourceRole=PL-5)>configure
(config-ComputeResourceRole=PL-5)>no provides
(config-ComputeResourceRole=PL-5)>up
(config-CrM=1)>commit
ERROR: Transaction not committed due to validation errors
Transaction validation failed!
No scale operation possible, maintenance lock not available
5.20.1.2 Locating Fault
- Check PL-5 status.
>ManagedElement=1,SystemFunctions=1,SysM=1,CrM=1
(CrM=1)>ComputeResourceRole=PL-5
(ComputeResourceRole=PL-5)>show -v ComputeResourceRole=PL-5
adminState=UNLOCKED
computeResourceRoleId="PL-5"
instantiationState=INSTANTIATING <read-only>
operationalState=DISABLED <read-only>
provides="ManagedElement=1,SystemFunctions=1,SysM=1,CrM=1,Role=Default-Role"
uses="ManagedElement=1,Equipment=1,ComputeResource=PL-5" <read-only>
- Remove PL-5 after the value of instantiationState changes to INSTANTIATED.
>ManagedElement=1,SystemFunctions=1,SysM=1,CrM=1,ComputeResourceRole=PL-5
(ComputeResourceRole=PL-5)>configure
(config-ComputeResourceRole=PL-5)>no provides
(config-ComputeResourceRole=PL-5)>up
(config-CrM=1)>commit
5.20.1.3 Confirming Solution
If the problem remains, contact the next level of Ericsson support.
5.20.2 Failed to Start Scale-Out VM on KVM
5.20.2.1 Trouble Symptoms
On the KVM platform, starting the PL fails with the following errors:
cluster1-b-2:~ # virsh start Scale1
error: Failed to start domain Scale1
error: monitor socket did not show up: No such file or directory
5.20.2.2 Locating Fault
- Restart the libvirtd service to fix the error.
- If the issue persists, check the libvirtd service status log and find the fault in /etc/hosts.
cluster1-b-2:~ # service libvirtd status
* libvirtd.service - Virtualization daemon
Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2017-08-27 01:38:58 EDT; 3min 9s ago
Docs: man:libvirtd(8)
http://libvirt.org
Main PID: 19509 (libvirtd)
Tasks: 16 (limit: 512)
CGroup: /system.slice/libvirtd.service
`-19509 /usr/sbin/libvirtd --listen
Aug 27 01:38:58 cluster1-b-2 libvirtd[19509]: 2017-08-27 05:38:58.102+0000: 19509: warning : virGetHostnameImpl:707 : getaddrinfo failed for 'cluster1-b-2': Name or service not known
Aug 27 01:38:58 cluster1-b-2 systemd[1]: Started Virtualization daemon.
Aug 27 01:39:05 cluster1-b-2 libvirtd[19509]: libvirt version: 2.0.0
Aug 27 01:39:05 cluster1-b-2 libvirtd[19509]: hostname: cluster1-b-2
Aug 27 01:39:05 cluster1-b-2 libvirtd[19509]: getaddrinfo failed for 'cluster1-b-2': Name or service not known
- Add the host name cluster1-b-2 to /etc/hosts and restart the libvirtd service.
5.20.2.3 Confirming Solution
If the problem remains, contact the next level of Ericsson support.
5.20.3 Unable to Scale Out PL for Core Middleware
5.20.3.1 Trouble Symptoms
After the scale-out operation is performed with heat stack-update, the new PL-5 cannot be scaled out and the compute resource cannot be found under the ECLI DN:
>ManagedElement=1,SystemFunctions=1,SysM=1,CrM=1
5.20.3.2 Locating Fault
Check the failure reason with the following steps:
- Check /var/log/messages on both SCs.
# grep -E "CSM|clustermonitor" /var/log/messages
On SC-1, you can find the PL-5 scale-out log and the related CSM log entries as below:
Sep 7 07:41:11 SC-1 clustermonitor: cmw-node-up received
Sep 7 07:41:11 SC-1 clustermonitor: ClusterMonitorTimer::stop()
Sep 7 07:41:11 SC-1 clustermonitor: ClusterMonitorTimer::start() called with timeout 140
Sep 7 07:43:31 SC-1 clustermonitor: ClusterMonitorTimer::stop()
Sep 7 07:43:31 SC-1 clustermonitor: searchObjectNames error 12
Sep 7 07:43:31 SC-1 clustermonitor: send signal to thread-Scaleout, vector.size 1
Sep 7 07:43:31 SC-1 clustermonitor: got scaleout signal, starting scale out...
Sep 7 07:43:31 SC-1 clustermonitor: ElasticEngine_Impl::scaleOut : node<PL-5> : not in cluster
Sep 7 07:43:31 SC-1 clustermonitor: ElasticEngine_Impl::scaleOut : node<PL-5> : continue scale out operation
Sep 7 07:43:31 SC-1 clustermonitor: scaleOut: maint_lock_cnt =1
Sep 7 07:43:31 SC-1 clustermonitor: successful to set state <1> for EE
Sep 7 07:43:32 SC-1 clustermonitor: Create ComputeResourceRole object request
Sep 7 07:43:32 SC-1 clustermonitor: ComputeResourceRole object Successfully created
Sep 7 07:43:32 SC-1 clustermonitor: addNodeToScalingList <PL-5>
Sep 7 07:43:32 SC-1 clustermonitor: CSM job started, EE-state=<1>.
Sep 7 07:43:33 SC-1 clustermonitor: successful to set state <2> for EE
Sep 7 07:43:33 SC-1 clustermonitor: successful to set state <3> for EE
Sep 7 07:43:33 SC-1 clustermonitor: error, csm-apply, err <89>
Sep 7 07:43:34 SC-1 clustermonitor: Calling /opt/csm/bin/csm-repair after /opt/csm/bin/csm-apply failure
Sep 7 07:43:34 SC-1 clustermonitor: successful to set state <4> for EE
Sep 7 07:43:34 SC-1 clustermonitor: error, /opt/csm/bin/csm-repair failed, rc <89>
Sep 7 07:43:35 SC-1 clustermonitor: Delete ComputeResourceRole object request
Sep 7 07:43:35 SC-1 clustermonitor: ComputeResourceRole object Successfully deleted
Sep 7 07:43:35 SC-1 clustermonitor: clearScalingList
Sep 7 07:43:35 SC-1 clustermonitor: successful to set state <0> for EE
If the error log does not provide enough information, go to the next step to check the clustermonitor log.
- Check SC clustermonitor log.
# cd /var/opt/coremw/clustermonitor
# cat clustermonitor.log
Setting CDF_CONFIGPATH to /tmp/tmp.a456B4Ng47
Updated unit SH/IPWRAD in directory /usr/lib/ericsson/cba/csm/plugin/SH-IPWRADStuff-SH_IPWRAD
Updated unit SH/SS7CAF2 in directory /usr/lib/ericsson/cba/csm/plugin/SH-SS7CAF2
Updated unit SH/IPWDIA in directory /usr/lib/ericsson/cba/csm/plugin/SH-IPWDIAStuff-SH_IPWDIA
Updated unit SH/CoreMW1 in directory /usr/lib/ericsson/cba/csm/plugin/SH-CoreMW1-CXC12345
Updated unit SH/CoreMW2 in directory /usr/lib/ericsson/cba/csm/plugin/SH-CoreMW2-CXC12345
Updated unit SH/IPWENUM in directory /usr/lib/ericsson/cba/csm/plugin/SH-IPWENUMStuff-SH_IPWENUM
Updated unit SH/LDE in directory /usr/lib/ericsson/cba/csm/plugin/LDE_SH
Updated unit SH/SS7CAF1 in directory /usr/lib/ericsson/cba/csm/plugin/SH-SS7CAF1
Updated unit SH/IPWDNS in directory /usr/lib/ericsson/cba/csm/plugin/SH-IPWDNSStuff-SH_IPWDNS
ERROR exception caught <type 'exceptions.IndentationError'>
  File "/usr/share/ericsson/csm/repo/DT-CSM-DT_CSM/lib/python2.7/csm/csmapply.py", line 203, in <module>
    environments = CSMEnvironments)
  File "/usr/share/ericsson/csm/repo/DT-Cdf-DT_Cdf/lib/python2.7/cdf/clicommon.py", line 350, in loadPlugins
    for module in getPluginsInDirectory(pythonDir, filter, verbose):
  File "/usr/share/ericsson/csm/repo/DT-Cdf-DT_Cdf/lib/python2.7/cdf/clicommon.py", line 305, in getPluginsInDirectory
    module = imp.load_source("plugin%s" % (postfix), file)
unexpected indent (csmplugin.py, line 57)
5.20.3.3 Confirming Solution
For this kind of issue, collect the logs and then contact the next level of Ericsson support.
5.20.4 Unable to Scale Out PL for SS7CAF
5.20.4.1 Trouble Symptoms
After the scale-out operation is performed with heat stack-update, the new PL-6 cannot be scaled out and the compute resource cannot be found under the ECLI DN:
>ManagedElement=1,SystemFunctions=1,SysM=1,CrM=1
5.20.4.2 Locating Fault
Check the failure reason with the following steps:
- Check /var/log/messages on both SCs.
# grep -E "CSM|clustermonitor" /var/log/messages
May 18 13:56:00 SC-1 CSM: ss7caf_csm_plugin_scale_out: EXCEPTION: Command:"sudo /opt/sign/EABss7077/ss7caf_scaling.sh -t OUT -s PL-6" returned non zero exit code 1
May 18 13:56:00 SC-1 osafimmnd[6594]: NO Ccb 1006 COMMITTED (EquipmentOwner)
May 18 13:56:00 SC-1 clustermonitor: successful to set state <3> for EE
May 18 13:56:00 SC-1 clustermonitor: error, csm-apply, err <1>
May 18 13:56:00 SC-1 osafimmnd[6594]: NO Ccb 1007 COMMITTED (EquipmentOwner)
May 18 13:56:00 SC-1 clustermonitor: Calling /opt/csm/bin/csm-repair after /opt/csm/bin/csm-apply failure
May 18 13:56:01 SC-1 CSM: scale in started
May 18 13:56:01 SC-1 CSM: SH/SS7CAF1 started prepare step
May 18 13:56:01 SC-1 CSM: ss7caf_csm_plugin_scale_in: SS7CAF Scale In plugin - prepare() called for use case ScaleIn
May 18 13:56:01 SC-1 CSM: SH/SS7CAF1 finished prepare step
May 18 13:56:01 SC-1 CSM: SH/IPW1 started prepare step
May 18 13:56:01 SC-1 CSM: IPWorks-ScaleIn: IPWorks plugin - prepare() called for use case ScaleIn
May 18 13:56:01 SC-1 CSM: SH/IPW1 finished prepare step
May 18 13:56:01 SC-1 CSM: SH/EVIP started prepare step
May 18 13:56:01 SC-1 CSM: SH/EVIP finished prepare step
May 18 13:56:01 SC-1 CSM: SH/CoreMW1 started prepare step
May 18 13:56:01 SC-1 CSM: CMW-scale_in: prepare uc: ScaleIn
May 18 13:56:02 SC-1 CSM: CMW-scale_in: scale in node: PL-6
May 18 13:56:02 SC-1 clustermonitor: Received cluster update, Number of members in cluster=4
May 18 13:56:02 SC-1 clustermonitor: EE update node leave <PL-6>.
May 18 13:56:02 SC-1 clustermonitor: searchObjectNames error 12
May 18 13:56:02 SC-1 clustermonitor: Failure seraching for <CmwMonitorImmCkptId=PL-6,CmwMonitorId=1,CmwSysConfigId=1> object
May 18 13:56:02 SC-1 clustermonitor: searchObjectNames error 12
May 18 13:56:02 SC-1 clustermonitor: Successfully write 'downTime' for rdn <CmwMonitorImmCkptId=PL-6> : 1495086962
May 18 13:56:02 SC-1 clustermonitor: Node "safNode=PL-6,safCluster=myClmCluster" is no longer a member of cluster
May 18 13:56:04 SC-1 CSM: CMW-scale_in: Clm node already locked
May 18 13:56:07 SC-1 CSM: CMW-scale_in: exec: sudo /opt/coremw/lib/cmwmdf_gcc cleanup PL-6
May 18 13:56:07 SC-1 CSM: CMW-scale_in: scale-in node: PL-6 done
May 18 13:56:07 SC-1 CSM: SH/CoreMW1 finished prepare step
May 18 13:56:07 SC-1 CSM: SH/LDE started prepare step
May 18 13:56:07 SC-1 CSM: LDE OS plugin - prepare called for use case ScaleIn (repair: True)
May 18 13:56:07 SC-1 CSM: SH/LDE finished prepare step
May 18 13:56:07 SC-1 CSM: SH/SS7CAF1 started perform step
May 18 13:56:07 SC-1 CSM: ss7caf_csm_plugin_scale_in: SS7CAF Scale In plugin - perform() called for use case ScaleIn
May 18 13:56:07 SC-1 CSM: ss7caf_csm_plugin_scale_in: model.yml file not found. Will run check based on SwM 1.0.
May 18 13:56:07 SC-1 CSM: ss7caf_csm_plugin_scale_in: Checking that at least one SS7CAF payload is included in Scaling Domain...
May 18 13:56:07 SC-1 CSM: ss7caf_csm_plugin_scale_in: PL-3 is in Scaling Domain([u'PL-6', u'PL-4', u'PL-3'])
May 18 13:56:07 SC-1 CSM: ss7caf_csm_plugin_scale_in: Call /opt/sign/EABss7077/ss7caf_scaling.sh with the following args: -t IN -s PL-6
May 18 13:56:07 SC-1 systemd[1]: Starting Session c128 of user root.
5.20.4.3 Confirming Solution
For this kind of issue, collect the logs and then contact the next level of Ericsson support.
- Collect the SS7CAF scaling log on PL-6.
#/opt/sign/log/ss7caf_scaling.log[<log number>]
- Collect the SS7CAF log by using the SS7CAF tool. Execute the command on PL-6.
# /opt/sign/EABss7049/bin/sysCollTool.sh
- Collect core middleware log.
Collect the clustermonitor log on the SC that reports many CSM and clustermonitor entries in /var/log/messages. The core middleware log is:
#/var/opt/coremw/clustermonitor/clustermonitor.log
5.20.5 AAA Cannot Start in Scale-Out PL
5.20.5.1 Trouble Symptoms
In the scaled-out PL, the AAA service cannot start. Take PL-5 as an example; PL-5 is a scaled-out PL.
SC-1:/cluster # ipw-ctr status all | grep PL-5 -A20
on PL-5:
aaa_diameter need repair.
aaa_radius_stack need repair.
aaa_radius_backend need repair.
aaasm is running.
5.20.5.2 Locating Fault
Check the failure reason with the following steps:
- Check the serviceType and ensure that it includes "AAA".
SC-X:~ # /opt/com/bin/cliss
>ManagedElement=<Node Name>,IpworksFunction=1,IpworksCommonRoot=1
(IpworksCommonRoot=1)>show -v
IpworksCommonRoot=1
ipworksCommonRootId="1"
serviceType="AAA"
DataBaseInfo=1
StorageServer=1
- Ensure AAAServer=PL-5 exists under IPWorksAAACommonRoot.
>ManagedElement=<Node Name>,IpworksFunction=1,IPWorksAAARoot=1,IPWorksAAACommonRoot=1
(IPWorksAAACommonRoot=1)>show -v
IPWorksAAACommonRoot=1
ipworksAAACommonRootId="1"
AAAServer=PL-3
AAAServer=PL-4
AAAServer=PL-5
AAAServerManager=1
GTConvertManager=1
If AAAServer=PL-5 does not exist, the following procedure is needed on the SC:
- Open a new file.
# vi /tmp/addAAAServer.sh
- Insert the following content into /tmp/addAAAServer.sh. Change aaaServer to the corresponding PL name.
#!/bin/bash
aaaServer=PL-5
immcfg -u -c AAAServer aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
immcfg -u -c LogManagement logManagementId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
immcfg -u -c ThreadControlManager threadControlManagerId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
immcfg -u -c IPWorksLog logId=AAA_DIAMETER_SERVER,logManagementId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
immcfg -u -c IPWorksLog logId=AAA_RADIUS_BACKEND,logManagementId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
immcfg -u -c IPWorksLog logId=AAA_RADIUS_STACK,logManagementId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
immcfg -u -c ThreadControl processId=AAA_DIAMETER_SERVER,threadControlManagerId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
immcfg -u -c ThreadControl processId=AAA_RADIUS_BACKEND,threadControlManagerId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
immcfg -u -c ThreadControl processId=AAA_RADIUS_STACK,threadControlManagerId=1,aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1
- Execute the script.
#bash /tmp/addAAAServer.sh
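The nine immcfg commands in the script follow a regular pattern: one AAAServer, one LogManagement, and one ThreadControlManager object, then one IPWorksLog and one ThreadControl object per AAA process. A minimal sketch that generates the six per-process lines (assuming the same DN layout as the script above; this only prints the commands, so verify them before running anything on the node):

```shell
aaaServer=PL-5  # change to the PL name being added
base="aaaServerId=$aaaServer,ipworksAAACommonRootId=1,ipworksAAARootId=1"

# Generate the six per-process immcfg lines (printed, not executed).
cmds=$(for proc in AAA_DIAMETER_SERVER AAA_RADIUS_BACKEND AAA_RADIUS_STACK; do
  echo "immcfg -u -c IPWorksLog logId=$proc,logManagementId=1,$base"
  echo "immcfg -u -c ThreadControl processId=$proc,threadControlManagerId=1,$base"
done)
printf '%s\n' "$cmds"
```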
- Repair the AAA.
#ipw-ctr repaired aaa_diameter PL-5
#ipw-ctr repaired aaa_radius_stack PL-5
#ipw-ctr repaired aaa_radius_backend PL-5
- Start the AAA.
#ipw-ctr start aaa_diameter PL-5
#ipw-ctr start aaa_radius_stack PL-5
#ipw-ctr start aaa_radius_backend PL-5
5.20.5.3 Confirming Solution
Use ipw-ctr to get server status. The AAA services should be running.
SC-1:/cluster # ipw-ctr status all | grep PL-5 -A20
on PL-5:
aaa_diameter is running.
aaa_radius_stack is running.
aaa_radius_backend is running.
aaasm is running.
5.20.6 Restore User Backup in Superset Cluster
5.20.6.1 Trouble Symptoms
Restore in a superset cluster refers to the scenario where the backup was taken when the cluster was smaller than its current size, that is, the cluster has been scaled out after the backup was taken.
In this situation, the restore operation fails.
5.20.6.2 Locating Fault
Perform the following steps:
- Scale in IPWorks to remove the PLs that are not included in the backup package.
- Restore the user data with the backup package.
- Scale out to the desired number of PLs.
5.20.6.3 Confirming Solution
If the problem still remains, contact the next level of Ericsson support.
5.20.7 Scale-Out Failure Triggers Scale-Out/Scale-In Cyclically
5.20.7.1 Trouble Symptoms
When scale-out of PL-X fails because of incorrect configuration, CMW triggers an automatic scale-in, but CMW does not shut down the VM resource of PL-X. IPWorks then continues with "DHCP recovery", which triggers scale-out/scale-in cyclically. During a scale-in operation, LDE attempts to power off the node(s) being scaled in. This operation relies on SSH connectivity to the payload node; if the remote shutdown -h now command does not succeed, there is a risk that the node remains alive, with active TIPC and IP configuration, but is no longer reachable by LDE or the middleware. This is a limitation of LDE; for details, refer to the section "Fencing during a scale in" in the LDE Scaling User’s Guide.
To check for cyclic scale-out/scale-in, examine the /var/opt/coremw/clustermonitor/clustermonitor.log file on SC-X.
For example:
SC-1:~ # grep -E 'addNodeToScalingList|hostname "PL-5"' /var/opt/coremw/clustermonitor/clustermonitor.log
Dec 28 10:04:47.105663 clustermonitor [9869][../../../src/clmon/ClusterMonitorImm.cc:0633] IN addNodeToScalingList <PL-5>
Deleting ComputeResource node with hostname "PL-5"
Dec 28 10:10:23.318797 clustermonitor [9869][../../../src/clmon/ClusterMonitorImm.cc:0633] IN addNodeToScalingList <PL-5>
Deleting ComputeResource node with hostname "PL-5"
Dec 28 10:15:59.651078 clustermonitor [9869][../../../src/clmon/ClusterMonitorImm.cc:0633] IN addNodeToScalingList <PL-5>
Deleting ComputeResource node with hostname "PL-5"
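Repeated addNodeToScalingList entries for the same PL are the signature of the loop. A small sketch counting them (the log excerpt below is sample data for illustration; on the node, grep the real clustermonitor.log instead):

```shell
# Sample clustermonitor.log excerpt (illustrative data, not a real capture)
log='Dec 28 10:04:47.105663 clustermonitor IN addNodeToScalingList <PL-5>
Dec 28 10:10:23.318797 clustermonitor IN addNodeToScalingList <PL-5>
Dec 28 10:15:59.651078 clustermonitor IN addNodeToScalingList <PL-5>'

# More than one occurrence within minutes indicates cyclic scale-out/scale-in.
count=$(printf '%s\n' "$log" | grep -c 'addNodeToScalingList <PL-5>')
echo "PL-5 scaling events: $count"
```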
5.20.7.2 Locating Fault
After scale-out failed, remove VM instance to fix the issue:
- For CEE, refer to section "Remove VM Instance" in IPWorks Scaling Guide for CEE.
- For KVM, refer to section "Remove VM Instance" in IPWorks Scaling Guide for KVM.
5.20.7.3 Confirming Solution
If the problem still remains, contact the next level of Ericsson support.
5.21 IPWorks Deployment for KVM
5.21.1 Both SCs Cyclic Reboot after Deployment
5.21.1.1 Trouble Symptoms
On the KVM platform, both SCs reboot cyclically after deployment. The console log is as below:
[ 1302.862068] drbd drbd0: meta connection shut down by peer.
[ 1449.584045] drbd drbd0: PingAck did not arrive in time.
Starting NFS Mount Daemon...
[ OK ] Started NFS Mount Daemon.
Starting NFS Server...
[ OK ] Started NFS Server.
[ OK ] Created slice system-lde\x2dtftpd.slice.
Starting LDE tftpd...
[ OK ] Started LDE tftpd.
Stopping ISC DHCPv4 Server...
[ OK ] Stopped ISC DHCPv4 Server.
Starting ISC DHCPv4 Server...
[ OK ] Started ISC DHCPv4 Server.
Starting LDE dumpd...
[ OK ] Started LDE dumpd.
[ OK ] Stopped LDE CSM update service.
Starting LDE CSM update service...
[ OK ] Started LDE CSM update service.
[FAILED] Failed to start NTP Daemon.
See "systemctl status lde-ntp.service" for details.
Stopping NTP Daemon...
[ OK ] Stopped NTP Daemon.
Starting NTP Daemon...
[FAILED] Failed to start NTP Daemon.
See "systemctl status lde-ntp.service" for details.
Stopping NTP Daemon...
[ OK ] Stopped NTP Daemon.
Starting NTP Daemon...
[ OK ] Reached target Network is Online.
5.21.1.2 Locating Fault
This issue is mostly caused by poor disk performance. Try suspending SC-2 and starting SC-1 first:
- Suspend SC-2.
# virsh suspend SC-2
- Wait until SC-1 starts up successfully and the SC-1 login prompt is available.
- Resume SC-2.
#virsh resume SC-2
- Check the DRBD status.
#cat /proc/drbd
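In /proc/drbd, a healthy replication pair reports connection state cs:Connected and disk states ds:UpToDate/UpToDate. A quick check sketch (shown here against sample output, since the exact format depends on the DRBD version):

```shell
# Sample /proc/drbd resource line (illustrative)
drbd=' 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----'

# Extract the connection state field; on a real node, read /proc/drbd instead.
cs=$(printf '%s\n' "$drbd" | grep -o 'cs:[A-Za-z]*' | head -n 1)
if [ "$cs" = "cs:Connected" ]; then
  echo "DRBD connection is healthy"
else
  echo "DRBD connection state is $cs; investigate before proceeding"
fi
```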
5.21.2 Failed to Execute Script ipwInit.sh after a Re-deployment of IPWorks for KVM
5.21.2.1 Trouble Symptoms
The following warning message is logged when the user executes the script ipwInit.sh:
" CMW: ERROR (cmw-sdp-import): Already imported [ERIC-LmClientLibrary-CXP9022092_5-R2B30] (/cluster/lm/lm/ERIC-LmClientLibrary-CXP9022092_5-R2B30.sdp), Failed to import '/cluster/lm/lm/ERIC-LmClientLibrary-CXP9022092_5-R2B30.sdp' cmw-sdp-import /cluster/lm/lm/*.sdp execute failed, exit"
5.21.2.2 Locating Fault
The issue occurs when the qcow2 image on Host1 has not been replaced by the original qcow2 image from the image package.
The following procedure is an example to fix this issue:
- Check the parameter QCOW2_DIR configured in ipwenv.conf.
# grep -r "QCOW2_DIR" /root/auto_deployment/kvm_deployment/config/ipwenv.conf
Example output:
#QCOW2_DIR
QCOW2_DIR=/root/auto_deployment/images
- Stop VMs and remove image files on both Host1 and Host2.
- On Host1:
# virsh destroy SC-1 2>/dev/null
# rm /root/auto_deployment/images/ipw-sc-22.qcow2
- On Host2:
# virsh destroy SC-2 2>/dev/null
# rm /root/auto_deployment/images/ipw-sc-22.qcow2
- On Host1:
- Unzip the image package into /root/auto_deployment to get the qcow2 image on Host1.
# cd /root/auto_deployment
#tar -zxvf /root/19010-CXP9023809_2_Ux_<Revision Number>.tar.gz
Example output:
images/
images/pxeboot.qcow2
images/ipw-sc-22.qcow2
temp/
temp/mode22/
temp/mode22/ipw-vnf-22-zone.yaml
temp/mode22/ipw-vnf-22.yaml
- Clean up IPWorks.
#./ipwdeploy.sh -a cleanup
- Re-execute the script ipwdeploy.sh on Host1 to re-deploy IPWorks VNF.
# ./ipwdeploy.sh -a deploy
5.21.2.3 Confirming Solution
Check whether the same issue occurs when running the script ipwInit.sh.
5.22 IPWorks Deployment for CEE
5.22.1 Fault Symptoms
After IPWorks is deployed successfully, the timezone on the hosts may not match the timezone defined in /cluster/etc/cluster.conf. In that case, you must manually synchronize the timezone.
5.22.2 Locating Fault
Execute the following steps to check whether the timezone needs to be manually synchronized.
- Log on to host, for example, log on to SC-1.
#ssh root@<SC-1_IP_Address>
- Open cluster.conf file to check timezone information.
SC-1:~# vi /cluster/etc/cluster.conf
For example, the timezone in /cluster/etc/cluster.conf is as below:
#Define time zone
#See /usr/share/zoneinfo/ for supported time zones
#timezone Asia/Shanghai
#timezone Asia/Shanghai
...
- Check the timezone link and the host time.
SC-1:~# ll /etc/localtime
For example, the output of the command ll /etc/localtime is as below:
lrwxrwxrwx 1 root root 38 Mar 6 2017 /etc/localtime -> ../usr/share/zoneinfo/Europe/Stockholm
SC-1:~# date
- Check whether the timezone information from Step 2 and Step 3 matches.
- If it does not match, manually synchronize the timezone.
#lde-config -r
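The manual comparison in Steps 2 and 3 can be sketched as follows. The sample values below are illustrative; on a real node you would read the timezone line from /cluster/etc/cluster.conf and the /etc/localtime symlink target instead, as indicated in the comments:

```shell
# Illustrative values; on a node, something like:
#   conf_tz=$(awk '/^timezone/ {print $2}' /cluster/etc/cluster.conf)
#   link_tz=$(readlink /etc/localtime | sed 's|.*zoneinfo/||')
conf_tz='Asia/Shanghai'
link_tz='Europe/Stockholm'

# A mismatch means the timezone must be synchronized with lde-config -r.
if [ "$conf_tz" != "$link_tz" ]; then
  echo "Timezone mismatch ($conf_tz vs $link_tz): run lde-config -r"
else
  echo "Timezones match"
fi
```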
5.22.3 Confirming Solution
Not applicable.
5.23 "COM SA, AMF Component Instantiation Failed" on SC-1
5.23.1 Trouble Symptoms
An alarm "COM SA, AMF Component Instantiation Failed" is issued on the SC-1 node. In addition, SC-1 fails to take ownership of the Management VIP (MIP_OAM_IP) when SC-2 is rebooting.
5.23.2 Locating Fault
The RPM com-comsa-cxp*.sle12 is installed on SC-1. However, the folder /opt/com/lib/comp and files under this folder are missing. This causes the COM process to hang before invoking AMF API.
Check the alarm by using ECLI:
>show ManagedElement=1,SystemFunctions=1,Fm=1 -m FmAlarm
...
FmAlarm=148
activeSeverity=MAJOR
additionalText="Instantiation of Component safComp=Cmw,safSu=SC-1,safSg=2N,safApp=ERIC-com.oam.access.aggregation failed"
eventType=PROCESSINGERRORALARM
lastEventTime="2017-07-10T04:19:08.168+00:00"
majorType=18568
minorType=131074
originalAdditionalText="Instantiation of Component safComp=Cmw,safSu=SC-1,safSg=2N,safApp=ERIC-com.oam.access.aggregation failed"
originalEventTime="2017-07-10T04:19:08.168+00:00"
originalSeverity=MAJOR
probableCause=418
sequenceNumber=325
source="ManagedElement=UVIW-DEFRA-03-0001,SaAmfApplication.safApp=ERIC-ComSa,SaAmfSG.safSg=2N,SaAmfSU.safSu=Cmw1,SaAmfComp.safComp=Cmw"
specificProblem="COM SA, AMF Component Instantiation Failed"
additionalInfo
name=""
value="ManagedElement=1,SaAmfCluster.safAmfCluster=myAmfCluster,SaAmfNode.safAmfNode=SC-1"
...
Check the alarm by using CMW command:
SC-1:~ # cmw-status si | grep -A2 -i "comsa"
...
safSi=2N,safApp=ERIC-ComSa
AdminState=UNLOCKED(1)
AssignmentState=PARTIALLY_ASSIGNED(3)
...
The following procedure is an example to fix this issue:
- Run "cluster rootfs -c -o -n 1" on SC-1, reboot SC-1.
Then the COMSA RPM will be re-installed, and the directory /opt/com/lib/comp/
and files will be created automatically.
- SC-1:~ # cluster rootfs -c -o -n 1
- SC-1:~ # reboot
- Check whether the alarm still exists.
SC-1:~ # cmw-status si |grep -A2 -i "comsa"
- If the alarm exists, remove it.
SC-1:~ # amf-adm -t 200 repaired safSu=SC-1,safSg=2N,safApp=ERIC-com.oam.access.aggregation
5.23.3 Confirming Solution
Check the alarm again by using ECLI:
>show ManagedElement=1,SystemFunctions=1,Fm=1 -m FmAlarm
The previous alarm information will be removed when the issue is fixed.
Check the alarm by using CMW command:
SC-1:~ # cmw-status si |grep -A2 -i "comsa"
The previous alarm information will be removed when the issue is fixed.
If the alarm remains or the folder and files are still missing, contact the next level of Ericsson support.
5.24 IPWorks Workflows Problems
This section provides information on resolving problems on IPWorks workflows.
The status of all tasks is shown on the workflow application GUI. In the Workflow Diagram, tasks with a blue frame have passed, tasks with a yellow frame are in progress, and tasks with a red frame have failed.
Detailed information about each task is available in the Workflow Log. The logs are also recorded in /ericsson/3pp/jboss/standalone/log/server.log.
For more information about IPWorks Workflow, refer to IPWorks VNF Life Cycle Management.
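Since each subsection below identifies its failure by a characteristic string in server.log, a single grep can pre-classify a failed workflow. A sketch against one sample log line (the log line itself is illustrative; the search strings are the ones used in Sections 5.24.1 through 5.24.8, and on the VM you would grep /ericsson/3pp/jboss/standalone/log/server.log instead):

```shell
# Sample server.log line (illustrative, not real output)
line='2017-07-10 04:19:08 ERROR [workflow] Authentication Failed for cloud user'

# Known failure signatures from the subsections below:
pattern='Authentication Failed|is invalid: Error validating value|No such file or directory|In Used|ipw_init_phase_one failed|ipw_init_phase_two failed|is not configured|stacklist is none'

# On the VM: grep -E "$pattern" /ericsson/3pp/jboss/standalone/log/server.log
match=$(printf '%s\n' "$line" | grep -oE "$pattern")
echo "Matched signature: $match"
```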
5.24.1 Authentication Failed
5.24.1.1 Trouble Symptoms
The termination workflow failed at "Collect User Data" task.
The status of workflow is failed.
5.24.1.2 Locating Fault
Log on to the VNF-LCM services VM and open the log file:
#vi /ericsson/3pp/jboss/standalone/log/server.log
Search "Authentication Failed" to view the detailed log.
5.24.1.3 Confirming Solution
Ensure the cloud VIM configuration properties (such as cloudUserName, cloudUserPassword, cloudBaseURL, and cloudTenantId) are configured correctly. For how to check the VIM details, refer to the document VNF-Lifecycle Manager System Administration Guide, Reference [33].
If the issue remains, collect the log and then contact the next level of Ericsson support.
5.24.2 Parameter Value Is Wrong
5.24.2.1 Trouble Symptoms
The instantiation workflow failed at "Perform Stack Create" task.
The status of workflow is failed.
The workflow log on GUI shows "Instance cancelled".
5.24.2.2 Locating Fault
Log on to the VNF-LCM services VM and open the log file:
#vi /ericsson/3pp/jboss/standalone/log/server.log
Search "is invalid: Error validating value" to locate the invalid parameter.
5.24.2.3 Confirming Solution
Ensure the parameter value is correct in env.yaml.
If the issue remains, collect the log and then contact the next level of Ericsson support.
5.24.3 Missing File in Configuration Directory
5.24.3.1 Trouble Symptoms
The instantiation workflow failed at "Post Instantiation" task, but "Perform Stack Create" task succeeded.
The status of workflow is failed.
The workflow log on GUI shows "No such file or directory".
5.24.3.2 Locating Fault
Log on to the VNF-LCM services VM and open the log file:
#vi /ericsson/3pp/jboss/standalone/log/server.log
Search "No such file or directory" to locate the missing file.
5.24.3.3 Confirming Solution
Check onboarding steps. Refer to the section Onboarding in IPWorks VNF Life Cycle Management.
Ensure the configuration file is placed under the configuration path.
5.24.4 Environment Has Been Used
5.24.4.1 Trouble Symptoms
The instantiation workflow failed at "Perform Stack Create" task.
The status of workflow is failed.
The workflow log on GUI shows "Instance cancelled".
5.24.4.2 Locating Fault
Log on to the VNF-LCM services VM and open the log file:
#vi /ericsson/3pp/jboss/standalone/log/server.log
Search "In Used" to locate the network or environment resources (such as vlan) that has been used.
5.24.4.3 Confirming Solution
Delete the server that is using the environment, or start a new available environment.
If the issue remains, collect the log and then contact the next level of Ericsson support.
5.24.5 IPWorks lm or sql init Failed
5.24.5.1 Trouble Symptoms
The instantiation workflow failed at "Post Instantiation" task, but "Perform Stack Create" task succeeded.
The status of workflow is failed.
The workflow log on GUI shows "Instance Failed".
5.24.5.2 Locating Fault
Log on to the VNF-LCM services VM and open the log file:
#vi /ericsson/3pp/jboss/standalone/log/server.log
Search "ipw_init_phase_one failed" or "ipw_init_phase_two failed" to view the detailed failure of ipw_init_phase_failed.
5.24.5.3 Confirming Solution
Terminate the IPWorks, then instantiate it again.
If the issue remains, collect the log and then contact the next level of Ericsson support.
5.24.6 Missing Parameter Value
5.24.6.1 Trouble Symptoms
The instantiation workflow failed at "Perform Stack Create" task.
The status of workflow is failed.
The workflow log on GUI shows "Instance cancelled".
5.24.6.2 Locating Fault
Log on to the VNF-LCM services VM and open the log file:
#vi /ericsson/3pp/jboss/standalone/log/server.log
Search "is not configured" to see what parameter is not configured, such as “EMERGENCY_USER”.
5.24.6.3 Confirming Solution
Ensure the parameter value is correct in env.yaml. Then run the instantiation steps, which regenerate the env.yaml and main.yaml files. For more information about env.yaml and main.yaml, refer to the section Instantiate VNF in IPWorks VNF Life Cycle Management.
If the issue remains, collect the log and then contact the next level of Ericsson support.
5.24.7 Termination Script Missing in IPWorks
5.24.7.1 Trouble Symptoms
The termination workflow failed at "Pre Termination" task.
The status of workflow is failed.
The workflow log on GUI shows "Instance Failed".
5.24.7.2 Locating Fault
Log on to the VNF-LCM services VM and open the log file:
#vi /ericsson/3pp/jboss/standalone/log/server.log
Search "No such file or directory" to locate which file is missed.
5.24.7.3 Confirming Solution
Collect the log and then contact next level of Ericsson support.
5.24.8 Workflow Gets no Stacks
5.24.8.1 Trouble Symptoms
The termination workflow failed at "Collect Stack Details" task.
The status of workflow is failed.
The workflow log on GUI shows "Instance Failed".
5.24.8.2 Locating Fault
Log on to the VNF-LCM services VM and open the log file:
#vi /ericsson/3pp/jboss/standalone/log/server.log
You can find the detailed information about this problem, such as "stacklist is none".
IPWorks Workflows can only manage the stacks with tags.
5.24.8.3 Confirming Solution
The workflow can only manage stacks with tags. Use the OpenStack command to delete this stack:
#heat stack-delete <stack-name or stack-id>
If the issue remains, collect the log and then contact the next level of Ericsson support.
6 Trouble Reporting
Problems identified that cannot be solved by using this document must be reported to the next level of maintenance support through a Customer Service Report (CSR).
The details of the trouble reporting process are outside the scope of this document.
When collecting information for further support, ensure that all current logs are recorded, and note the time and date of the logs.
For more information on how to collect information, refer to Data Collection Guideline for IPWorks.
When sending crash dumps, ensure that the dump is of the actual scenario, and note the time and date of the dump.
7 Appendix A: Example of PM, FM, LM, and AMF Logs
This section gives examples of the Common Component logs.
Example 19 Performance Management Logs
==================
2015/04/29 10:30:31|DNS|Error|PM_Adaptor|system 140548769142528 - /vobs/ims/ipworks/src/common/coremw_adaptor/pm_adaptor_scc/src/PmObserver.cpp:27 initialize. saPmInitialize FAILED: 4
2015/04/29 10:30:42|DNS|Error|PM_Adaptor|system 140548733282064 - /vobs/ims/ipworks/src/dns/dnspm_scc/src/PmObserverImpl.cpp:374 uploadPmData. PM re-initialize FAILED: 4
2015/04/29 10:30:42|DNS|Error|PM_Adaptor|system 140548733282064 - /vobs/ims/ipworks/src/dns/dnspm_scc/src/PmObserverImpl.cpp:671 uploadPmData. pm not intialized!
2015/04/29 10:30:43|DNS|Error|PM_Adaptor|system 140548769142528 - /vobs/ims/ipworks/src/dns/dnspm_scc/src/PmObserverImpl.cpp:144 initialize. saPmPGaugeRefGet FAILED: 9
2015/04/29 10:30:53|DNS|Error|PM_Adaptor|system 140548733282064 - /vobs/ims/ipworks/src/dns/dnspm_scc/src/PmObserverImpl.cpp:404 uploadPmData. saPmPGaugeIntegerSet FAILED: 9
Example 20 Fault Management Logs
==================
2015/04/23 14:27:14|DNS|Info|DNSFM|user 140542940722944 - /vobs/ims/ipworks/src/dns/dnsfm_ou/src/IpworksFmInterfaceImpl.cpp:74 finalize. /vobs/ims/ipworks/src/dns/dnsfm_ou/src/IpworksFmInterfaceImpl.cpp:74 finalize - finalize successfully!
2015/04/23 14:27:14|DNS|Debug|DNSFM|user 140542940722944 - /vobs/ims/ipworks/src/dns/dnsfm_ou/src/IpworksFmService.cpp:364 run. /vobs/ims/ipworks/src/dns/dnsfm_ou/src/IpworksFmService.cpp:364 run exit the thread.
Example 21 License Management Logs
==================
2015/04/09 00:00:18|DNS|Info|LM|user 139788399494912 - /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmCallbacks.cpp:24 operationalModeNotificationCallback. /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmCallbacks.cpp:24 operationalModeNotificationCallback >> currentMode:0
2015/04/09 00:00:18|DNS|Warning|LM|user 139788399494912 - /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmService.cpp:212 notifyLmChangeToApp. /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmService.cpp:212 notifyLmChangeToApp. Local license info is not in a good status! currentLicenseStatus = 5
2015/04/09 00:00:18|DNS|Warning|LM|user 139788399494912 - /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmService.cpp:226 notifyLmChangeToApp. /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmService.cpp:226 notifyLmChangeToApp. License Expired! No Service provided!
2015/04/09 00:00:18|DNS|Info|LM|user 139788399494912 - /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmCallbacks.cpp:72 operationalModeNotificationCallback. /vobs/ims/ipworks/src/common/coremw_adaptor/lm_adaptor_scc/src/IpworksLmCallbacks.cpp:72 operationalModeNotificationCallback Update License Done!
Example 22 AMF Logs
-------------
2015/04/09 00:15:51|amfwrapper|Trace|AMF_Adaptor|system 140376849086208 - /vobs/ims/ipworks/src/common/coremw_adaptor/amf_adaptor_scc/src/AmfMonitorThread.cpp:251 amfHealthCheck. Healthcheck successful
2015/04/09 00:15:51|amfwrapper|Trace|AMF_Adaptor|system 140376849086208 - /vobs/ims/ipworks/src/common/coremw_adaptor/amf_adaptor_scc/src/AmfMonitorThread.cpp:267 amfHealthCheck. << saAmfResponse aisRet = 1
2015/04/09 00:16:02|amfwrapper|Trace|AMF_Adaptor|system 140376849086208 - /vobs/ims/ipworks/src/common/coremw_adaptor/amf_adaptor_scc/src/AmfMonitorThread.cpp:242 amfHealthCheck. >>
2015/04/09 00:16:02|amfwrapper|Trace|AMF_WRAPPER|TRACE 140376849086208 - /vobs/ims/ipworks/src/common/amfwrapper/amfwrapper_scc/src/AmfObserverImpl.cpp:85 doHealthCheck. >>
8 Appendix B: Capturing and Tracing the Messages
8.1 Capturing and Tracing the Access-Request Messages
To capture and analyze the Access-request messages between the GGSN node and the IPWorks Radius node, do the following:
Note: Type whatever you want to filter or search directly in the Filter area.
- Capture the authentication/authorization traces between the GGSN node and the IPWorks Radius node.
#tcpdump -i sig_data_sp -s 0 port 1812 -w trace20130104_PS1.cap
trace20130104_PS1.cap is the name of the trace file used to save the captured messages.
- Download the trace file trace20130104_PS1.cap and open it with the packet analyzer Wireshark.
- In Wireshark, analyze the captured messages using the following steps:
- Filter the string radius.code == 1 to get the number of Access-request messages.
- Filter the string radius.code == 2 to get the number of Access-accept messages.
- Filter the string radius.code == 3 to get the number of Access-reject messages.
Based on the filter output:
- Normal scenario
Normally, the number of Access-request messages equals the number of Access-accept messages plus the number of Access-reject messages. A small deviation (a difference of fewer than 10 messages) can be ignored, because the capture is started and stopped manually.
- Abnormal scenario
If the number of Access-request messages is greater than the sum of Access-accept and Access-reject messages, the Radius AAA server does not reply to all the request messages sent from the GGSN side. In this abnormal situation, enable the logs in Radius AAA and trace the messages in the Radius Stack and Radius Backend by Acct-Session-Id and time.
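The message balance described above reduces to simple arithmetic once the three counts are known. The counts below are placeholders, not real data; obtain the real ones from the Wireshark filters (or, if tshark is available, with tshark -r <file> -Y '<filter>' piped to wc -l):

```shell
# Placeholder counts from the radius.code filters (illustrative only)
requests=1000   # radius.code == 1 (Access-request)
accepts=970     # radius.code == 2 (Access-accept)
rejects=25      # radius.code == 3 (Access-reject)

# Requests minus replies; a small positive deviation is tolerated.
diff=$((requests - accepts - rejects))
if [ "$diff" -ge 0 ] && [ "$diff" -lt 10 ]; then
  echo "Normal: deviation $diff is within tolerance"
else
  echo "Abnormal: $diff Access-request messages unanswered"
fi
```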
8.2 Capturing and Tracing the Accounting-request Messages
To capture and analyze the Accounting-request messages between the GGSN node and the IPWorks Radius node, do the following:
Note: Type whatever you want to filter or search directly in the Filter area.
- Capture the accounting traces between the GGSN node and the IPWorks Radius node.
# tcpdump -i bond0 -s 0 port 1813 -w trace20130104_PS1.cap
trace20130104_PS1.cap is the name of the trace file used to save the captured messages.
- Download the trace file and open it with the packet analyzer Wireshark.
- In Wireshark, analyze the captured messages using the following steps:
Prerequisite: The proxy function is enabled, the interim update function is enabled, and the Disconnection message (DM) is disabled. For how to enable and disable the previous functions, see the following subsections.
- Filter the string radius.Acct_Status_Type == 1 to get the number of accounting-start messages.
- Filter the string radius.Acct_Status_Type == 2 to get the number of accounting-stop messages.
- Filter the string radius.Acct_Status_Type == 3 to get the number of accounting-update messages.
- Filter the string radius.code == 5 to get the number of accounting-response messages.
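Analogously to the authentication case, the accounting capture is balanced when the number of accounting-response messages roughly equals the sum of start, stop, and interim-update requests. A sketch with placeholder counts (illustrative values, not real data):

```shell
# Placeholder counts from the filters above (illustrative only)
starts=500     # radius.Acct_Status_Type == 1 (accounting-start)
stops=480      # radius.Acct_Status_Type == 2 (accounting-stop)
updates=1500   # radius.Acct_Status_Type == 3 (accounting-update)
responses=2475 # radius.code == 5 (accounting-response)

# Compare total requests against responses; a large gap is abnormal.
expected=$((starts + stops + updates))
diff=$((expected - responses))
echo "expected=$expected responses=$responses diff=$diff"
```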
Reference List
| Ericsson Documents |
|---|
| [1] IPWorks Manual Health Check. |
| [2] Glossary of Terms and Acronyms. |
| [3] Trademark Information. |
| [4] Typographic Conventions. |
| [5] Check Alarm Status. |
| [6] Fault Management. |
| [7] Data Collection Guideline for IPWorks. |
| [8] IPWorks Alarm List. |
| [9] IPWorks Measurement List. |
| [10] IPWorks Performance Measurements. |
| [11] Performance Management Report File Format. |
| [12] View Software Information. |
| [13] IPWorks DNS, ASDNS, ENUM Parameter Description. |
| [14] Configure MySQL NDB Cluster. |
| [15] IPWorks Configuration Management. |
| [16] View License Information. |
| [17] Storage Server, MySQL Cluster Node Unreachable. |
| [18] Create Backup. |
| [19] Restore Backup. |
| [20] Managed Object Model (MOM). |
| [21] Storage Server, MySQL Cluster Node Unreachable. |
| [22] Storage Server, MySQL Database Unreachable. |
| [23] Storage Server, The MySQL Replication for Geographic Redundancy Failed. |
| [24] IPWorks Initial Configuration, 5/1553-AVA 901 33/3 Uen |
| [25] IPWorks VNF Life Cycle Management, 31/1553-AVA 901 33/3 Uen |
| [26] CEE Troubleshooting Guideline, 2/1553-AZE 102 01 Uen |
| [27] COM Advanced Troubleshooting Guideline, 3/154 51-CAA 901 2587/7 |
| [28] Core MW Troubleshooting Guideline, 6/154 51-CAA 901 2624/4 |
| [29] eVIP Advanced Troubleshooting Guideline, 1/154 51-APR 901 0467/3 |
| [30] JavaOaM Troubleshooting Guideline, 1/154 51-APR 901 0487/2 |
| [31] LM Troubleshooting Guideline, 1/154 51-APR 901 0503/5 |
| [32] SS7 CAF Troubleshooting Guideline, 154 51-ANA 901 37 |
| [33] VNF-Lifecycle Manager System Administration Guide, 1543-APR 901 0578 Uen |
| [34] LDE Scaling User’s Guide, 3/1553-ANA 901 39/4 Uen |
| Online References |
|---|
| [35] MySQL 5.5 Reference Manual. |
