1 Introduction
This document describes the configuration management operations for the auto health check. For more detailed information on the Health Check Framework, refer to Health Check Management.
1.1 Prerequisites
This section states the prerequisites for performing the auto health check procedure.
- An Ericsson Command-Line Interface (ECLI) session in Exec mode is in progress.
- All operations described in this document are performed through ECLI.
1.1.1 Documents
Before starting this procedure, ensure that the following information or documents are available:
- For how to use Ericsson Command-Line Interface (ECLI), refer to Ericsson Command-Line Interface User Guide.
- For the network plan, refer to IPWorks Network Connectivity Overview.
- For detailed information about the auto health check, refer to Health Check Management.
1.1.2 Tools
To perform the auto health check, the following is required:
- The health check rules must be loaded.
To check the version of the rules loaded:
hcrsfm -l
Example output:
| RULE SET FILE ID | REVISION |
|---|---|
| CXC1739883_20 | A |
1.1.3 Conditions
- The system version is IPWorks 1.1 or higher.
- Unless otherwise specified, all actions are performed on the System Controller (SC).
- The network plan and the SC address of the cluster are known.
1.2 Related Information
For the trademark information, typographic conventions, definition, and explanation of acronyms and terminology, see the following documents:
2 View Health Check Rule
A health check rule checks a specific functional area or part of the environment. Together, the full set of health check rules defines the scope of health checking supported by IPWorks. Each rule has a categories attribute, which groups rules so that customers can plan rule execution according to a specific O&M activity. For more information, refer to Health Check Management.
You can check the auto health check rules sorted by rule ID by executing the command:
>dn ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1
(HealthCheckM=1)>show-table -m HcRule -p hcRuleId,description,name,severity,administrativeState --sort
Example output:
| hcRuleId | description | name | severity | administrativeState |
====================================================================================================
| IPW_001 | Check Active WARNING Alarms | Active WARNING Alarms | WARNING | UNLOCKED |
| IPW_002 | Check Active MINOR Alarms | Active MINOR Alarms | WARNING | UNLOCKED |
| IPW_003 | Check Active MAJOR Alarms | Active MAJOR Alarms | CRITICAL | UNLOCKED |
| IPW_004 | Check Active CRITICAL Alarms | Active CRITICAL Alarms | CRITICAL | UNLOCKED |
| IPW_101 | Check LDE Dumps Files | Core Dumps Files | WARNING | UNLOCKED |
| IPW_102 | Verify the idle CPU load is within the expected range. | CPU usage | CRITICAL | UNLOCKED |
| IPW_103 | Check for the disk usage percentage. | Disks usage | CRITICAL | UNLOCKED |
| IPW_104 | Check the current available amount of free memory. | Memory usage | CRITICAL | UNLOCKED |
| IPW_105 | Check the status of the NTP server. | NTP status | CRITICAL | UNLOCKED |
| IPW_106 | A check of the cluster internal messaging bus. | Internal Cluster Communications | CRITICAL | UNLOCKED |
| IPW_107 | The status of the Ethernet interfaces are check. All should be UP and RUNNING. | Ethernet interfaces status | CRITICAL | UNLOCKED |
| IPW_108 | Check for software raid keeps the disks synchronised. | DRBD Status | CRITICAL | UNLOCKED |
| IPW_201 | Check Virtual IP addresses. | Virtual IP address status | CRITICAL | UNLOCKED |
| IPW_202 | Check Abstract Load Balancer (ALB) Status | Abstract Load Balancer (ALB) Status | CRITICAL | UNLOCKED |
| IPW_301 | Check COREMW services are up | COREMW services are up | WARNING | UNLOCKED |
| IPW_302 | Check if lm status is locked | LM status is not locked | WARNING | UNLOCKED |
| IPW_401 | Check IPW PMF Counters Activated | IPW PMF Counters Activated | WARNING | UNLOCKED |
| IPW_402 | Check right number of IPW DB tables | IPW DB table number | WARNING | UNLOCKED |
| IPW_403 | Check right status of all mysql NDB node | IPW NDB status | CRITICAL | UNLOCKED |
| IPW_404 | Check right status of SS | IPW SS status is correct | WARNING | UNLOCKED |
| IPW_405 | Check if IPW licenses are expired | IPW licenses are not expired | WARNING | UNLOCKED |
| IPW_409 | Check dns error log for the recent one day | IPW DNS server check error log | WARNING | UNLOCKED |
| IPW_410 | Do validation of dns zone file for grammar | IPW DNS server checking zone file | CRITICAL | UNLOCKED |
| IPW_411 | Check enum error log for the recent one day | IPW ENUM server error log | WARNING | UNLOCKED |
| IPW_412 | Check if dns listen udp and tcp 53 port | IPW DNS server Port listern status | CRITICAL | UNLOCKED |
| IPW_413 | Check if dns listen at port 5300, enum listen 53 port | IPW ENUM server Port listern status | CRITICAL | UNLOCKED |
| IPW_501 | Check if Diameter identity exists | IPW Diameter identity exists | WARNING | UNLOCKED |
| IPW_502 | Check Diameter stack DN | IPW Diameter stack DN | WARNING | UNLOCKED |
| IPW_503 | IPW AAA Diameter log does not contain fatal or error messages | IPW AAA Diameter log | WARNING | UNLOCKED |
| IPW_504 | IPW AAA SM log does not contain fatal or error messages | IPW AAA SM log | WARNING | UNLOCKED |
| IPW_505 | Check right status of IPW AAA | IPW AAA status is correct | WARNING | UNLOCKED |
====================================================================================================
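If the table output is saved to a text file, the rows can be filtered offline. The following Python sketch is illustrative only. It assumes one pipe-delimited row per line, as in the layout above, and the function name parse_show_table is hypothetical and not part of ECLI.

```python
def parse_show_table(text):
    """Parse pipe-delimited show-table output into a list of dicts.

    Assumption: the first line that starts with '|' is the header row,
    and separator lines made of '=' characters carry no data.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip prompts, '=' separators, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


# Two rows copied from the example output above.
SAMPLE = """\
| hcRuleId | description | name | severity | administrativeState |
====================================================
| IPW_001 | Check Active WARNING Alarms | Active WARNING Alarms | WARNING | UNLOCKED |
| IPW_003 | Check Active MAJOR Alarms | Active MAJOR Alarms | CRITICAL | UNLOCKED |
====================================================
"""

rules = parse_show_table(SAMPLE)
critical_ids = [r["hcRuleId"] for r in rules if r["severity"] == "CRITICAL"]
print(critical_ids)
```

Rows whose cell count does not match the header are skipped, so a row that was wrapped across two lines in a capture is ignored rather than misread.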
The attributes are explained as follows:
| Attribute | Description |
|---|---|
| hcRuleId | The identity of the health check rule. |
| description | The purpose of the health check rule. |
| name | The name of the health check rule. |
| severity | The severity of the result for the health check rule. |
| administrativeState | The administrative state of the health check rule. If the value is LOCKED, the rule is never executed. If the value is UNLOCKED, the rule is executed in the associated health check job. |
| categories | The categories that the health check rule belongs to. One health check rule can be associated with several categories. |
You can list rules for each category by executing the command:
>dn ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1
(HealthCheckM=1)> show -m HcRule | filter -B 2 <CATEGORY_NAME>
Here is an example for DAILY, TROUBLESHOOT, SHORT:
>dn ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1
(HealthCheckM=1)> show -m HcRule | filter -B 2 DAILY
HcRule=IPW_409
categories
DAILY
--
HcRule=IPW_410
categories
DAILY
--
HcRule=IPW_411
categories
DAILY
--
HcRule=IPW_412
categories
DAILY
--
HcRule=IPW_413
categories
DAILY
(HealthCheckM=1)>show -m HcRule | filter -B 2 TROUBLESHOOT
HcRule=IPW_102
categories
TROUBLESHOOT
--
HcRule=IPW_103
categories
TROUBLESHOOT
--
HcRule=IPW_104
categories
TROUBLESHOOT
--
HcRule=IPW_105
categories
TROUBLESHOOT
--
HcRule=IPW_106
categories
TROUBLESHOOT
--
HcRule=IPW_107
categories
TROUBLESHOOT
--
HcRule=IPW_108
categories
TROUBLESHOOT
--
HcRule=IPW_201
categories
TROUBLESHOOT
--
HcRule=IPW_202
categories
TROUBLESHOOT
--
HcRule=IPW_302
categories
TROUBLESHOOT
--
HcRule=IPW_401
categories
TROUBLESHOOT
(HealthCheckM=1)>show -m HcRule | filter -B 2 SHORT
HcRule=IPW_001
categories
SHORT
--
HcRule=IPW_002
categories
SHORT
--
HcRule=IPW_003
categories
SHORT
--
HcRule=IPW_004
categories
SHORT
--
HcRule=IPW_101
categories
SHORT
--
HcRule=IPW_301
categories
SHORT
--
HcRule=IPW_402
categories
SHORT
--
HcRule=IPW_403
categories
SHORT
--
HcRule=IPW_404
categories
SHORT
--
HcRule=IPW_405
categories
SHORT
--
HcRule=IPW_501
categories
SHORT
--
HcRule=IPW_502
categories
SHORT
--
HcRule=IPW_503
categories
SHORT
--
HcRule=IPW_504
categories
SHORT
--
HcRule=IPW_505
categories
SHORT
(HealthCheckM=1)>
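The filtered listings above can be collapsed into a category-to-rules mapping. The Python sketch below is a minimal illustration, assuming the transcript has been saved as plain text in the layout shown. The helper name rules_by_category is hypothetical.

```python
from collections import defaultdict

# The three categories named in this document.
KNOWN_CATEGORIES = {"SHORT", "DAILY", "TROUBLESHOOT"}


def rules_by_category(text):
    """Map each category to the rule IDs listed under it.

    Assumption: every rule appears as a 'HcRule=<id>' line followed,
    a few lines later, by a line holding only the category name.
    """
    mapping = defaultdict(list)
    current_rule = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("HcRule="):
            current_rule = line.split("=", 1)[1]
        elif line in KNOWN_CATEGORIES and current_rule is not None:
            if current_rule not in mapping[line]:
                mapping[line].append(current_rule)
    return dict(mapping)


# Excerpt from the example transcript above.
SAMPLE = """\
(HealthCheckM=1)> show -m HcRule | filter -B 2 DAILY
HcRule=IPW_409
categories
DAILY
--
HcRule=IPW_410
categories
DAILY
(HealthCheckM=1)>show -m HcRule | filter -B 2 SHORT
HcRule=IPW_001
categories
SHORT
"""

category_map = rules_by_category(SAMPLE)
print(category_map)
```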
3 Health Check Actions
This section describes the general procedure for running the auto health check with a health check job.
A health check job (HcJob) is associated with a group of health check rules through its rulesCategories attribute. When the health check job is executed, all the associated health check rules are evaluated. For more information about health check jobs, refer to Health Check Management.
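The association between a job and its rules can be pictured with a small model. The Python sketch below is not the Health Check Framework implementation. It only illustrates the selection logic described in this document, namely that a rule is evaluated when it shares a category with the job and its administrativeState is UNLOCKED. The class and function names are hypothetical, and the LOCKED entry is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HcRule:
    rule_id: str
    categories: frozenset
    administrative_state: str = "UNLOCKED"


def rules_for_job(rules, job_categories):
    """Return the IDs of rules a job with these categories would evaluate."""
    wanted = set(job_categories)
    return [
        rule.rule_id
        for rule in rules
        if rule.administrative_state == "UNLOCKED" and rule.categories & wanted
    ]


RULES = [
    HcRule("IPW_001", frozenset({"SHORT"})),
    HcRule("IPW_102", frozenset({"TROUBLESHOOT"})),
    HcRule("IPW_409", frozenset({"DAILY"})),
    # Hypothetical: shown as LOCKED only to illustrate the exclusion.
    HcRule("IPW_410", frozenset({"DAILY"}), administrative_state="LOCKED"),
]

selected = rules_for_job(RULES, ["SHORT", "DAILY"])
print(selected)
```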
To perform the health check, execute the following steps:
- Create health check job.
After IPWorks installation, there are three default health check jobs, named as “short”, “daily” and “troubleshoot”, and each one is associated with the corresponding category: SHORT, DAILY and TROUBLESHOOT.
SHORT
Rules that should be executed for quick checks.
DAILY
Rules that should be executed daily.
TROUBLESHOOT
Rules that should be executed for troubleshooting.
For details about the rules, refer to HCF Integration Guideline.
If the default jobs are not sufficient, refer to Section 3.1 to create a custom health check job.
- Execute health check job.
Refer to Section 3.2 to execute the health check job and wait until the execution is done.
- Examine health check result.
Refer to Section 3.3 to examine the health check result, and analyze the cause if the result shows that the system is NOT_HEALTHY.
3.1 Create Health Check Job
Three rule categories are supported: SHORT, TROUBLESHOOT, and DAILY. Each value of the rulesCategories attribute must be one of them.
3.1.1 Create Health Check Job with One Rule Category
Here is an example of creating Health Check Job with SHORT category:
>dn ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1
(HealthCheckM=1)>configure
(config-HealthCheckM=1)> HcJob=Basic
(config-HcJob=Basic)> rulesCategories=SHORT
(config-HcJob=Basic)>commit
(HcJob=Basic)>show
HcJob=Basic
localFileStorePath="/storage/no-backup/nbi_root/health_check"
rulesCategories
SHORT
progressReport
actionName=""
progressInfo=""
progressPercentage=0
result=NOT_AVAILABLE
resultInfo=""
HcJobScheduler=1
(HcJob=Basic)>
3.1.2 Create Health Check Job with Multiple Rule Categories
Here is an example of creating a Health Check Job with SHORT, TROUBLESHOOT, DAILY categories:
>dn ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1
(HealthCheckM=1)>configure
(config-HealthCheckM=1)>HcJob=All
(config-HcJob=All)>rulesCategories=SHORT
(config-HcJob=All)>rulesCategories=TROUBLESHOOT
(config-HcJob=All)>rulesCategories=DAILY
(config-HcJob=All)>commit
(HcJob=All)>show
HcJob=All
localFileStorePath="/storage/no-backup/nbi_root/health_check"
rulesCategories
SHORT
TROUBLESHOOT
DAILY
progressReport
actionName=""
progressInfo=""
progressPercentage=0
result=NOT_AVAILABLE
resultInfo=""
HcJobScheduler=1
(HcJob=All)>
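When several jobs have to be created, the command sequence can be generated rather than typed. The Python sketch below only builds the text of the commands shown in the examples above and does not connect to ECLI. The function name build_create_job_commands is hypothetical, and the node name is passed through unchanged as a placeholder.

```python
SUPPORTED_CATEGORIES = ("SHORT", "TROUBLESHOOT", "DAILY")


def build_create_job_commands(node_name, job_name, categories):
    """Return the ECLI command lines that create a health check job.

    Mirrors the example sessions in this section; unsupported or empty
    category lists are rejected before any command is produced.
    """
    unknown = [c for c in categories if c not in SUPPORTED_CATEGORIES]
    if not categories or unknown:
        raise ValueError(f"unsupported or missing categories: {unknown}")
    commands = [
        f"dn ManagedElement={node_name},SystemFunctions=1,HealthCheckM=1",
        "configure",
        f"HcJob={job_name}",
    ]
    # One rulesCategories line per category, as in the multi-category example.
    commands += [f"rulesCategories={c}" for c in categories]
    commands += ["commit", "show"]
    return commands


commands = build_create_job_commands(
    "<Node name>", "All", ["SHORT", "TROUBLESHOOT", "DAILY"]
)
for command in commands:
    print(command)
```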
3.2 Execute Health Check Job
Log in to the Ericsson Command-Line Interface, navigate to the HcJob Managed Object to be run, and execute it, for example:
>dn ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1,HcJob=Basic
(HcJob=Basic)>execute
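Because the result attribute reads NOT_AVAILABLE until the job completes, waiting for completion amounts to polling the show output. The Python sketch below is a rough illustration under stated assumptions: fetch_show_output is a hypothetical callable that returns the current show text for the job (how it reaches ECLI is out of scope), and a stale result left over from an earlier run is not handled.

```python
import time


def wait_for_job(fetch_show_output, poll_seconds=5.0,
                 timeout_seconds=600.0, sleep=time.sleep):
    """Poll until the job's result attribute leaves NOT_AVAILABLE.

    Returns the final result string, for example SUCCESS or FAILURE.
    """
    waited = 0.0
    while True:
        output = fetch_show_output()
        # Collect 'key=value' lines; only the 'result' key is used here.
        fields = dict(
            line.strip().split("=", 1)
            for line in output.splitlines()
            if "=" in line
        )
        result = fields.get("result", "NOT_AVAILABLE")
        if result != "NOT_AVAILABLE":
            return result
        if waited >= timeout_seconds:
            raise TimeoutError("health check job did not finish in time")
        sleep(poll_seconds)
        waited += poll_seconds


# Demonstration with canned outputs and a no-op sleep.
_outputs = iter(["result=NOT_AVAILABLE", "state=FINISHED\nresult=SUCCESS"])
final = wait_for_job(lambda: next(_outputs), sleep=lambda seconds: None)
print(final)
```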
3.3 Examine Health Check Job Result
You can check the job result by executing the command:
(HcJob=Basic)>show
HcJob=Basic
lastReportFileName="_Basic_20161128T172153_man"
lastRunTime="2016-11-28T17:21:53"
localFileStorePath="/storage/no-backup/nbi_root/health_check"
rulesCategories
SHORT
status=NOT_HEALTHY
failedRules
hcRule="hcRuleId=IPW_003"
reason="Node have alarm, Severity Level MAJOR"
severity=CRITICAL
progressReport
actionName="EXECUTE"
progressInfo="Job Execution completed"
progressPercentage=100
result=SUCCESS
resultInfo="Job correctly executed"
state=FINISHED
timeActionCompleted="2016-11-28T17:21:53"
timeActionStarted="2016-11-28T17:21:32"
timeOfLastStatusUpdate="2016-11-28T17:21:53"
HcJobScheduler=1
(HcJob=Basic)>
The result of a job execution, in terms of success or failure, is available in the result attribute. It shows the value NOT_AVAILABLE until job completion.
Once the job is executed without problems, it shows SUCCESS. If the job execution terminates because of an error, it shows FAILURE.
If the result is FAILURE, it may be related to one of the following reasons:
- It was not possible to write the report file in the output directory. Check the output directory on the file system.
- A rule set file contains rules that are not correct from a syntactic or semantic perspective. Check the rule set file (/opt/ipworks/common/confs/RuleFileSet.xml). For details, refer to HCF Troubleshooting Guideline.
In these cases, check the files stored under the folder /storage/no-backup/coremw/var/log/saflog/. If the root cause of the problem cannot be found, report it to the next level of support and continue with the manual health check procedure.
If the result of the health check job contains failed rules, refer to the recommended action for the corresponding rule.
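The distinction between result (whether the job ran) and status (whether the system is healthy) can be made explicit by extracting both from the show output. The Python sketch below is illustrative and assumes the attribute layout of the example output in this section. The function name summarize_job is hypothetical.

```python
def summarize_job(show_output):
    """Extract status, result, and failed rules from HcJob show output."""
    summary = {"status": None, "result": None, "failed_rules": []}
    in_failed = False
    for raw in show_output.splitlines():
        line = raw.strip()
        if line == "failedRules":
            in_failed = True
            continue
        if line == "progressReport":
            in_failed = False
            continue
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip('"')
        if in_failed and key in ("hcRule", "reason", "severity"):
            # Each hcRule line starts a new failed-rule entry.
            if key == "hcRule" or not summary["failed_rules"]:
                summary["failed_rules"].append({})
            if key == "hcRule":
                value = value.split("=", 1)[-1]  # drop the 'hcRuleId=' prefix
            summary["failed_rules"][-1][key] = value
        elif key in ("status", "result"):
            summary[key] = value
    return summary


# Excerpt from the example output in this section.
SAMPLE = """\
HcJob=Basic
status=NOT_HEALTHY
failedRules
hcRule="hcRuleId=IPW_003"
reason="Node have alarm, Severity Level MAJOR"
severity=CRITICAL
progressReport
actionName="EXECUTE"
result=SUCCESS
state=FINISHED
"""

summary = summarize_job(SAMPLE)
print(summary["result"], summary["status"])
for failed in summary["failed_rules"]:
    print(failed["hcRule"], failed["severity"], failed["reason"])
```

In this example the job ran correctly (result SUCCESS) while the system itself is reported as NOT_HEALTHY because rule IPW_003 failed.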
3.4 Delete Health Check Job
To delete the created health check job, execute the following commands:
>ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1
(HealthCheckM=1)>configure
(config-HealthCheckM=1)>no HcJob=Basic
(config-HealthCheckM=1)>commit
(HealthCheckM=1)>show
Check the output of the show command to verify that the health check job has been deleted.
3.5 Schedule Health Check Job
Depending on the scheduling requirement, refer to one of the following documents to schedule a health check job:
- Schedule Single Health Check Job
- Schedule Health Check Job Based on Calendar Event
- Schedule Health Check Job Based on Periodic Event
Reference List
| IPWorks Library Document |
|---|
| [1] Schedule Health Check Job Based on Periodic Event. |
| [2] Schedule Health Check Job Based on Calendar Event. |
| [3] Schedule Single Health Check Job. |
| [4] Health Check Management. |
| [5] Unlock Local Authorization Method. |
| Documents in other Library |
|---|
| [6] HCF Troubleshooting Guideline, 1553-APR 901 0574/2 |
| [7] HCF Integration Guideline, 1/1531-APR 901 0574/2 |
