IPWorks Auto Health Check

Contents

1   Introduction
1.1   Prerequisites
1.1.1   Documents
1.1.2   Tools
1.1.3   Conditions
1.2   Related Information

2   View Health Check Rule

3   Health Check Actions
3.1   Create Health Check Job
3.1.1   Create Health Check Job with One Rule Category
3.1.2   Create Health Check Job with Multiple Rule Categories
3.2   Execute Health Check Job
3.3   Examine Health Check Job Result
3.4   Delete Health Check Job
3.5   Schedule Health Check Job

Reference List

1   Introduction

This document describes the configuration management operations for the auto health check. For more detailed information on the Health Check Framework, refer to the document Health Check Management.

1.1   Prerequisites

This section states the prerequisites for performing the auto health check procedure.

1.1.1   Documents

Before starting this procedure, ensure that the following information or documents are available:

1.1.2   Tools

To perform auto health check, the following tool is required:

To check the version of the rules loaded:

hcrsfm -l

Example output:

RULE SET FILE ID    REVISION
CXC1739883_20       A

1.1.3   Conditions

1.2   Related Information

For the trademark information, typographic conventions, definition, and explanation of acronyms and terminology, see the following documents:

2   View Health Check Rule

A Health Check Rule checks a specific function area or part of the environment. Together, the full set of health check rules defines the scope of health checking supported by IPWorks. Each rule has a “categories” attribute, which groups rules so that customers can plan their execution according to a specific O&M activity. For more information, refer to the document Health Check Management.

You can check the auto health check rules sorted by rule ID by executing the command:

>dn ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1

(HealthCheckM=1)>show-table -m HcRule -p hcRuleId,description,name,severity,administrativeState --sort

Example output:

| hcRuleId | description                                                                    | name                                | severity | administrativeState |
====================================================================================================================================================================
| IPW_001  | Check Active WARNING Alarms                                                    | Active WARNING Alarms               | WARNING  | UNLOCKED            |
| IPW_002  | Check Active MINOR Alarms                                                      | Active MINOR Alarms                 | WARNING  | UNLOCKED            |
| IPW_003  | Check Active MAJOR Alarms                                                      | Active MAJOR Alarms                 | CRITICAL | UNLOCKED            |
| IPW_004  | Check Active CRITICAL Alarms                                                   | Active CRITICAL Alarms              | CRITICAL | UNLOCKED            |
| IPW_101  | Check LDE Dumps Files                                                          | Core Dumps Files                    | WARNING  | UNLOCKED            |
| IPW_102  | Verify the idle CPU load is within the expected range.                         | CPU usage                           | CRITICAL | UNLOCKED            |
| IPW_103  | Check for the disk usage percentage.                                           | Disks usage                         | CRITICAL | UNLOCKED            |
| IPW_104  | Check the current available amount of free memory.                             | Memory usage                        | CRITICAL | UNLOCKED            |
| IPW_105  | Check the status of the NTP server.                                            | NTP status                          | CRITICAL | UNLOCKED            |
| IPW_106  | A check of the cluster internal messaging bus.                                 | Internal Cluster Communications     | CRITICAL | UNLOCKED            |
| IPW_107  | The status of the Ethernet interfaces are check. All should be UP and RUNNING. | Ethernet interfaces status          | CRITICAL | UNLOCKED            |
| IPW_108  | Check for software raid keeps the disks synchronised.                          | DRBD Status                         | CRITICAL | UNLOCKED            |
| IPW_201  | Check Virtual IP addresses.                                                    | Virtual IP address status           | CRITICAL | UNLOCKED            |
| IPW_202  | Check Abstract Load Balancer (ALB) Status                                      | Abstract Load Balancer (ALB) Status | CRITICAL | UNLOCKED            |
| IPW_301  | Check COREMW services are up                                                   | COREMW services are up              | WARNING  | UNLOCKED            |
| IPW_302  | Check if lm status is locked                                                   | LM status is not locked             | WARNING  | UNLOCKED            |
| IPW_401  | Check IPW PMF Counters Activated                                               | IPW PMF Counters Activated          | WARNING  | UNLOCKED            |
| IPW_402  | Check right number of IPW DB tables                                            | IPW DB table number                 | WARNING  | UNLOCKED            |
| IPW_403  | Check right status of all mysql NDB node                                       | IPW NDB status                      | CRITICAL | UNLOCKED            |
| IPW_404  | Check right status of SS                                                       | IPW SS status is correct            | WARNING  | UNLOCKED            |
| IPW_405  | Check if IPW licenses are expired                                              | IPW licenses are not expired        | WARNING  | UNLOCKED            |
| IPW_409  | Check dns error log for the recent one day                                     | IPW DNS server check error log      | WARNING  | UNLOCKED            |
| IPW_410  | Do validation of dns zone file for grammar                                     | IPW DNS server checking zone file   | CRITICAL | UNLOCKED            |
| IPW_411  | Check enum error log for the recent one day                                    | IPW ENUM server error log           | WARNING  | UNLOCKED            |
| IPW_412  | Check if dns listen udp and tcp 53 port                                        | IPW DNS server Port listern status  | CRITICAL | UNLOCKED            |
| IPW_413  | Check if dns listen at port 5300, enum listen 53 port                          | IPW ENUM server Port listern status | CRITICAL | UNLOCKED            |
| IPW_501  | Check if Diameter identity exists                                              | IPW Diameter identity exists        | WARNING  | UNLOCKED            |
| IPW_502  | Check Diameter stack DN                                                        | IPW Diameter stack DN               | WARNING  | UNLOCKED            |
| IPW_503  | IPW AAA Diameter log does not contain fatal or error messages                  | IPW AAA Diameter log                | WARNING  | UNLOCKED            |
| IPW_504  | IPW AAA SM log does not contain fatal or error messages                        | IPW AAA SM log                      | WARNING  | UNLOCKED            |
| IPW_505  | Check right status of IPW AAA                                                  | IPW AAA status is correct           | WARNING  | UNLOCKED            |
====================================================================================================================================================================

The attributes are explained as follows:

hcRuleId

The identity of Health Check Rule.

description

The purpose of Health Check Rule.

name

The name of health check rule.

severity

The severity of result for Health Check Rule.

administrativeState

The administrative state of the Health Check Rule. If the value is LOCKED, the rule is never executed. If the value is UNLOCKED, the rule is executed as part of any associated Health Check Job.

categories

The categories that the health check rule belongs to. One health check rule can be associated with several categories.
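When the rule table is captured to a text file, it can be convenient to filter it offline, for example to list only the CRITICAL rules. The following is a minimal Python sketch, not part of IPWorks. It assumes the pipe-delimited layout of the show-table output shown above and that no cell contains a "|" character. The function name parse_rule_table is hypothetical, and the sample is an excerpt of the example output with column widths shortened.

```python
def parse_rule_table(text):
    """Parse pipe-delimited `show-table` output into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, prompts and the ===== separator lines.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells          # first table row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Excerpt of the example output (column widths shortened for readability).
sample = """\
| hcRuleId | description                 | name                  | severity | administrativeState |
===================================================================================================
| IPW_001  | Check Active WARNING Alarms | Active WARNING Alarms | WARNING  | UNLOCKED            |
| IPW_003  | Check Active MAJOR Alarms   | Active MAJOR Alarms   | CRITICAL | UNLOCKED            |
===================================================================================================
"""

rules = parse_rule_table(sample)
critical = [rule["hcRuleId"] for rule in rules if rule["severity"] == "CRITICAL"]
print(critical)
```

The same list of dicts can be filtered on administrativeState to find rules that are LOCKED and therefore never executed.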

You can list rules for each category by executing the command:

>dn ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1

(HealthCheckM=1)> show -m HcRule | filter -B 2 <CATEGORY_NAME>

Here is an example for DAILY, TROUBLESHOOT, SHORT:

>dn ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1 
(HealthCheckM=1)> show -m HcRule | filter -B 2 DAILY

HcRule=IPW_409
   categories
      DAILY
   --
HcRule=IPW_410
   categories
      DAILY
   --
HcRule=IPW_411
   categories
      DAILY
   --
HcRule=IPW_412
   categories
      DAILY
   --
HcRule=IPW_413
   categories
      DAILY
(HealthCheckM=1)>show -m HcRule | filter -B 2 TROUBLESHOOT
HcRule=IPW_102
   categories
      TROUBLESHOOT
   --
HcRule=IPW_103
   categories
      TROUBLESHOOT
   --
HcRule=IPW_104
   categories
      TROUBLESHOOT
   --
HcRule=IPW_105
   categories
      TROUBLESHOOT
   --
HcRule=IPW_106
   categories
      TROUBLESHOOT
   --
HcRule=IPW_107
   categories
      TROUBLESHOOT
   --
HcRule=IPW_108
   categories
      TROUBLESHOOT
   --
HcRule=IPW_201
   categories
      TROUBLESHOOT
   --
HcRule=IPW_202
   categories
      TROUBLESHOOT
   --
HcRule=IPW_302
   categories
      TROUBLESHOOT
   --
HcRule=IPW_401
   categories
      TROUBLESHOOT
(HealthCheckM=1)>show -m HcRule | filter -B 2 SHORT
HcRule=IPW_001
   categories
      SHORT
   --
HcRule=IPW_002
   categories
      SHORT
   --
HcRule=IPW_003
   categories
      SHORT
   --
HcRule=IPW_004
   categories
      SHORT
   --
HcRule=IPW_101
   categories
      SHORT
   --
HcRule=IPW_301
   categories
      SHORT
   --
HcRule=IPW_402
   categories
      SHORT
   --
HcRule=IPW_403
   categories
      SHORT
   --
HcRule=IPW_404
   categories
      SHORT
   --
HcRule=IPW_405
   categories
      SHORT
   --
HcRule=IPW_501
   categories
      SHORT
   --
HcRule=IPW_502
   categories
      SHORT
   --
HcRule=IPW_503
   categories
      SHORT
   --
HcRule=IPW_504
   categories
      SHORT
   --
HcRule=IPW_505
   categories
      SHORT
(HealthCheckM=1)>
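To compare the rule sets of several categories, the filtered output above can be reduced to a list of rule IDs per category. The following Python sketch is illustrative only. It assumes that each rule appears on its own line starting with "HcRule=", as in the example output. The function name rule_ids and the outputs dictionary are hypothetical.

```python
import re


def rule_ids(filter_output):
    """Return the HcRule IDs found in `show -m HcRule | filter -B 2 <CATEGORY>` output."""
    # Only lines that start with "HcRule=" are matched, so prompt lines are ignored.
    return re.findall(r"^HcRule=(\S+)", filter_output, flags=re.MULTILINE)


# Excerpts of the example output, one captured string per category.
outputs = {
    "DAILY": """\
HcRule=IPW_409
   categories
      DAILY
   --
HcRule=IPW_410
   categories
      DAILY
""",
    "TROUBLESHOOT": """\
HcRule=IPW_102
   categories
      TROUBLESHOOT
   --
HcRule=IPW_103
   categories
      TROUBLESHOOT
""",
}

rules_by_category = {name: rule_ids(text) for name, text in outputs.items()}
for name, ids in rules_by_category.items():
    print(name, ids)
```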

3   Health Check Actions

This section describes the general procedures of auto health check with the health check job.

A health check job (HcJob) is associated with a group of health check rules through its rulesCategories attribute. When the health check job is executed, all associated health check rules are evaluated. For more information about health check jobs, refer to the document Health Check Management.

To perform the health check, execute the steps below:

  1. Create health check job.

    After IPWorks installation, three default health check jobs exist, named “short”, “daily”, and “troubleshoot”. Each one is associated with the corresponding category: SHORT, DAILY, or TROUBLESHOOT.

    SHORT

    Rules that should be executed for quick checks.

    DAILY

    Rules that should be executed daily.

    TROUBLESHOOT

    Rules that should be executed for troubleshooting.

    For details about the rules, refer to HCF Integration Guideline.

    If the default jobs are not sufficient, refer to Section 3.1 to create a customized health check job.

  2. Execute health check job.

    Refer to Section 3.2 to execute the health check job and wait until the execution is done.

  3. Examine health check result.

    Refer to Section 3.3 to examine the health check result, and analyze it if the status shows NOT_HEALTHY.

3.1   Create Health Check Job

Three rule categories are supported: SHORT, TROUBLESHOOT, and DAILY. Each value of the rulesCategories attribute must be one of them.

3.1.1   Create Health Check Job with One Rule Category

Here is an example of creating a Health Check Job with the SHORT category:

>dn ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1
(HealthCheckM=1)>configure
(config-HealthCheckM=1)> HcJob=Basic
(config-HcJob=Basic)> rulesCategories=SHORT
(config-HcJob=Basic)>commit
(HcJob=Basic)>show
HcJob=Basic
   localFileStorePath="/storage/no-backup/nbi_root/health_check"
   rulesCategories
      SHORT
   progressReport
      actionName=""
      progressInfo=""
      progressPercentage=0
      result=NOT_AVAILABLE
      resultInfo=""
   HcJobScheduler=1
(HcJob=Basic)>

3.1.2   Create Health Check Job with Multiple Rule Categories

Here is an example of creating a Health Check Job with SHORT, TROUBLESHOOT, DAILY categories:

>dn ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1
(HealthCheckM=1)>configure
(config-HealthCheckM=1)>HcJob=All
(config-HcJob=All)>rulesCategories=SHORT
(config-HcJob=All)>rulesCategories=TROUBLESHOOT
(config-HcJob=All)>rulesCategories=DAILY
(config-HcJob=All)>commit
(HcJob=All)>show
HcJob=All
   localFileStorePath="/storage/no-backup/nbi_root/health_check"
   rulesCategories
      SHORT
      TROUBLESHOOT
      DAILY
   progressReport
      actionName=""
      progressInfo=""
      progressPercentage=0
      result=NOT_AVAILABLE
      resultInfo=""
   HcJobScheduler=1
(HcJob=All)>
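The command sequences in Section 3.1.1 and Section 3.1.2 differ only in the job name and the rulesCategories lines. The following Python sketch generates that sequence and rejects category names other than the three supported ones. It is an illustration, not an IPWorks tool. The function name create_job_commands is hypothetical, and the generated lines are meant to be reviewed and entered in the CLI manually.

```python
SUPPORTED_CATEGORIES = ("SHORT", "TROUBLESHOOT", "DAILY")


def create_job_commands(node_name, job_name, categories):
    """Return the CLI lines that create an HcJob with the given rule categories."""
    unknown = [c for c in categories if c not in SUPPORTED_CATEGORIES]
    if unknown:
        raise ValueError(f"unsupported rule categories: {unknown}")
    lines = [
        f"dn ManagedElement={node_name},SystemFunctions=1,HealthCheckM=1",
        "configure",
        f"HcJob={job_name}",
    ]
    # One rulesCategories line per category, as in the multi-category example.
    lines += [f"rulesCategories={c}" for c in categories]
    lines.append("commit")
    return lines


# "<Node name>" is the same placeholder used in the examples above.
commands = create_job_commands("<Node name>", "All", ["SHORT", "TROUBLESHOOT", "DAILY"])
for line in commands:
    print(line)
```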

3.2   Execute Health Check Job

Log in to the Ericsson Command-Line Interface, and then navigate to the HcJob Managed Object that you want to execute, for example:

>dn ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1,HcJob=Basic

(HcJob=Basic)>execute

3.3   Examine Health Check Job Result

You can check the job result by executing the command:

(HcJob=Basic)>show

HcJob=Basic
   lastReportFileName="_Basic_20161128T172153_man"
   lastRunTime="2016-11-28T17:21:53"
   localFileStorePath="/storage/no-backup/nbi_root/health_check"
   rulesCategories
      SHORT
   status=NOT_HEALTHY
   failedRules
      hcRule="hcRuleId=IPW_003"
      reason="Node have alarm, Severity Level MAJOR"
      severity=CRITICAL
   progressReport
      actionName="EXECUTE"
      progressInfo="Job Execution completed"
      progressPercentage=100
      result=SUCCESS
      resultInfo="Job correctly executed"
      state=FINISHED
      timeActionCompleted="2016-11-28T17:21:53"
      timeActionStarted="2016-11-28T17:21:32"
      timeOfLastStatusUpdate="2016-11-28T17:21:53"
   HcJobScheduler=1
(HcJob=Basic)>

The result of a job execution, in terms of success or failure, is available in the result attribute. It shows the value NOT_AVAILABLE until job completion.

Once the job is executed without problems, it shows SUCCESS. If the job execution terminates because of an error, it shows FAILURE.

If the result is FAILURE, it may be related to one of the following reasons:

In these cases, check the files stored under the folder /storage/no-backup/coremw/var/log/saflog/. If the root cause of the problem cannot be found, report it to the next level of support and continue with the manual health check procedure.

If the result of the Health Check Job contains failed rules, refer to the recommended action for the corresponding rule.
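The attributes described above can also be read from a saved copy of the show output. The following Python sketch is a rough illustration under stated assumptions: attributes appear as key=value lines, and each failed rule entry lists hcRule before its reason and severity, as in the example output. The function name summarize_job is hypothetical, and the sample is an excerpt of the example output.

```python
def summarize_job(show_output):
    """Extract job attributes and failed rule entries from HcJob `show` output."""
    attrs = {}
    failed_rules = []
    for raw in show_output.splitlines():
        key, sep, value = raw.strip().partition("=")
        # Skip lines without "=" and prompt lines such as "(HcJob=Basic)>".
        if not sep or not key.isidentifier():
            continue
        value = value.strip('"')
        if key == "hcRule":
            # hcRule="hcRuleId=IPW_003" -> IPW_003
            failed_rules.append({"rule": value.split("=", 1)[-1]})
        elif key in ("reason", "severity") and failed_rules:
            failed_rules[-1][key] = value
        else:
            attrs[key] = value
    return attrs, failed_rules


sample = """\
HcJob=Basic
   status=NOT_HEALTHY
   failedRules
      hcRule="hcRuleId=IPW_003"
      reason="Node have alarm, Severity Level MAJOR"
      severity=CRITICAL
   progressReport
      progressPercentage=100
      result=SUCCESS
      state=FINISHED
(HcJob=Basic)>
"""

attrs, failed_rules = summarize_job(sample)
if attrs.get("result", "NOT_AVAILABLE") == "NOT_AVAILABLE":
    print("Job has not completed yet")
elif attrs["result"] == "FAILURE":
    print("Job execution failed, check the saflog files")
else:
    print("Job executed, status:", attrs.get("status"))
    for rule in failed_rules:
        print(rule["rule"], rule.get("severity"), rule.get("reason"))
```

The result attribute is evaluated before status because status is only meaningful once the job has been executed successfully.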

3.4   Delete Health Check Job

To delete a created health check job, execute the following commands:

>dn ManagedElement=<Node name>,SystemFunctions=1,HealthCheckM=1
(HealthCheckM=1)>configure
(config-HealthCheckM=1)>no HcJob=Basic
(config-HealthCheckM=1)>commit
(HealthCheckM=1)>show

Verify in the output of the show command that the Health Check Job is deleted.
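If the show output is saved to a file, the check can be scripted. The following Python sketch is illustrative only. This document does not show the output of show at the HealthCheckM=1 level, so the sample below is hypothetical: it assumes that each remaining job appears on its own line as HcJob=<name>. The function name job_exists is also hypothetical.

```python
def job_exists(show_output, job_name):
    """Return True if a line equal to HcJob=<job_name> appears in the output."""
    target = f"HcJob={job_name}"
    return any(line.strip() == target for line in show_output.splitlines())


# Hypothetical output after deleting HcJob=Basic; the layout is an assumption.
after_delete = """\
HealthCheckM=1
   HcJob=daily
   HcJob=short
   HcJob=troubleshoot
"""

print(job_exists(after_delete, "Basic"))
print(job_exists(after_delete, "daily"))
```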

3.5   Schedule Health Check Job

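To schedule a health check job according to your requirements, refer to the documents Schedule Health Check Job Based on Periodic Event, Schedule Health Check Job Based on Calendar Event, and Schedule Single Health Check Job.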


Reference List

IPWorks Library Document
[1] Schedule Health Check Job Based on Periodic Event.
[2] Schedule Health Check Job Based on Calendar Event.
[3] Schedule Single Health Check Job.
[4] Health Check Management.
[5] Unlock Local Authorization Method.
Documents in other Library
[6] HCF Troubleshooting Guideline, 1553-APR 901 0574/2
[7] HCF Integration Guideline, 1/1531-APR 901 0574/2


Copyright

© Ericsson AB 2017, 2018. All rights reserved. No part of this document may be reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to continued progress in methodology, design and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

Trademark List
All trademarks mentioned herein are the property of their respective owners. These are shown in the document Trademark Information.
