LOTC Disk Usage

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure
2.1   Analyzing Alarm
2.2   Actions for /
2.3   Actions for /boot
2.4   Actions for /var/log
2.5   Actions for /cluster

1   Introduction

This instruction concerns alarm handling.

1.1   Alarm Description

The alarm is raised when the disk use on a mount point exceeds a threshold value.

The possible alarm causes and fault locations are explained in Table 1.

Table 1    Alarm Causes

Alarm Cause       Disk use over threshold value
Description       The disk use on a mount point exceeds a defined threshold value
Fault Reason      Disk space is taken up by files (logs, dumps, and so on)
Fault Location    Files
Impact            Service performance degradation or service downtime

Note:  
This alarm can appear as a result of a maintenance activity.

The alarm attributes are listed and explained in Table 2.

Table 2    Alarm Attributes

Attribute Name        Attribute Value

Major Type            193

Minor Type            3341942787

Source                One of the following:
                        • ManagedElement=<node_name>,HostName=<hostname>,ERIC-LINUX_CONTROL-*
                        • ManagedElement=<node_name>,HostName=<hostname>,ERIC-LINUX_PAYLOAD-*

Specific Problem      LOTC Disk Usage

Event Type            environmentalAlarm (6)

Probable Cause        x736UnspecifiedReason (418)

Additional Text       Disk usage above threshold critical <threshold_value> (<disk_partition> (<usage_value>%))
                      Disk usage above threshold major <threshold_value> (<disk_partition> (<usage_value>%))
                      Disk usage above threshold minor <threshold_value> (<disk_partition> (<usage_value>%))

Perceived Severity    critical (3)
                      major (4)
                      minor (5)
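
The fields of alarm attribute Additional Text can be separated with a short shell function, which helps when the text is copied from an alarm list. The following is a minimal sketch and not part of the product: the function name parse_additional_text and the sample alarm string are hypothetical, and the sketch assumes that the threshold value is an integer that is optionally followed by a percent sign.

```shell
# parse_additional_text TEXT
# Prints severity, threshold, partition, and usage taken from an
# Additional Text string. Prints nothing if the string does not match.
parse_additional_text() {
    printf '%s\n' "$1" | sed -n \
        's/^Disk usage above threshold \([a-z][a-z]*\) \([0-9][0-9]*\)%\{0,1\} (\(.*\) (\([0-9][0-9]*\)%))$/severity=\1 threshold=\2 partition=\3 usage=\4/p'
}

# Hypothetical alarm text, used for illustration only.
parse_additional_text 'Disk usage above threshold major 90 (/var/log (95%))'
# prints: severity=major threshold=90 partition=/var/log usage=95
```

If the real alarm text uses a different number format, the pattern must be adjusted accordingly.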

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

This instruction references the following documents:

1.2.2   Tools

No tools are required.

1.2.3   Conditions

Before starting this procedure, ensure that the following condition is met:

2   Procedure

This section describes the procedure to follow when this alarm is received.

2.1   Analyzing Alarm

Do the following:

  1. Is the alarm severity Major or Critical?

    Yes: Continue with the next step.

    No: The alarm severity is Minor. No further immediate action is needed from this procedure. If the alarm severity level rises, re-enter this procedure.

  2. Log on to the host to access a Linux® shell:

    ssh <user>@<hostname> -p 22

    The hostname is part of alarm attribute Source.

  3. Show the current disk use:

    df -h -t ext3

    The following is an example output:

    Filesystem                                     Size  Used Avail Use% Mounted on
    /dev/sda4                                      2.0G  1.5G  427M  78% /
    /dev/sda3                                      9.9G  8.8G  568M  95% /var/log
    /dev/sda1                                      4.0G  226M  3.6G   6% /boot
    /dev/mapper/lde--cluster--vg-lde--cluster--lv  5.9G  4.6G  1.1G  82% /.cluster
  4. Check whether any disk partitions other than the one indicated in alarm attribute Additional Text are also used above the threshold value indicated in that attribute.
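  5. Select the appropriate actions based on the observations in Step 4. Continue with the section that matches each affected mount point: Section 2.2 for /, Section 2.3 for /boot, Section 2.4 for /var/log, or Section 2.5 for /cluster.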
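
The comparison in Step 4 can be done by eye or with a small filter over the df output. The following sketch is an illustration only: the function name over_threshold and the threshold of 80 are assumptions, and the sample input is the example output shown in Step 3.

```shell
# over_threshold LIMIT
# Reads df output on standard input and prints the mount point and the
# Use% value of every file system whose use is above LIMIT percent.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 > limit + 0) print $6, $5
    }'
}

# Example input copied from Step 3; the limit of 80 is hypothetical.
over_threshold 80 <<'EOF'
Filesystem                                     Size  Used Avail Use% Mounted on
/dev/sda4                                      2.0G  1.5G  427M  78% /
/dev/sda3                                      9.9G  8.8G  568M  95% /var/log
/dev/sda1                                      4.0G  226M  3.6G   6% /boot
/dev/mapper/lde--cluster--vg-lde--cluster--lv  5.9G  4.6G  1.1G  82% /.cluster
EOF
# prints:
# /var/log 95%
# /.cluster 82%
```

On a live host, the filter can be fed with df -P -t ext3, where option -P selects the POSIX output format that keeps each file system on one line. The sketch assumes that mount points contain no spaces.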

2.2   Actions for /

Do the following:

  1. Show the large files in /tmp, for example, files that have remained unchanged for more than three days and are larger than 100k:

    find /tmp -noleaf -mount -mtime +3 -size +100k -exec ls -lt {} \;

    The following is an example output:

    -rw-r----- 1 root root 385723000 Sep  1 17:00 /tmp/FILES/software3.tar.gz
  2. Delete the files returned in the output of the previous command:

    rm <file1> [<file2> …]

  3. Is the alarm cleared?

    Yes: Proceed with Step 11.

    No: Continue with the next step.

  4. Check the active alarm list.

    For information on how to check the active alarm list, refer to Check Alarm Status.

  5. Is the File Management, Max Size in FileGroup Exceeded alarm or the File Management, Number of Files in FileGroup Exceeded alarm raised?

    Yes: Further actions are outside the scope of this instruction. Follow the procedure in File Management, Max Size in FileGroup Exceeded or File Management, Number of Files in FileGroup Exceeded to clear the File Management alarm.

    No: Continue with the next step.
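
The find command in Step 1 can be wrapped in a function so that the same review can be repeated with other limits before anything is deleted. The following is a sketch under stated assumptions: the function name stale_files is hypothetical, -xdev is used as the portable spelling of -mount, the size is given in bytes (102400c corresponds to 100k), and -type f is added so that only regular files are listed. Note that find counts whole 24-hour periods, so -mtime +3 matches files that are at least four days old.

```shell
# stale_files DIR DAYS MIN_KIB
# Lists regular files under DIR, without crossing into other file systems,
# that were last modified more than DAYS days ago and are larger than
# MIN_KIB KiB. The function only lists files. It deletes nothing.
stale_files() {
    find "$1" -xdev -type f -mtime +"$2" -size +"$(( $3 * 1024 ))"c -exec ls -l {} +
}

# Example call with the limits used in Step 1 (not executed here):
#   stale_files /tmp 3 100
```

Reviewing the list first makes it easier to confirm that every file passed to rm is no longer needed.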

The / disk partition use must be collected, but by means other than the standard data collection procedure, because that procedure creates large files on the disk. Perform data collection as follows:

  6. Show the disk use:

    df -h -t ext3

    The following is an example output:

    Filesystem                                     Size  Used Avail Use% Mounted on
    /dev/sda4                                      2.0G  1.9G   59M  97% /
    /dev/sda3                                      9.9G  163M  9.2G   2% /var/log
    /dev/sda1                                      4.0G  226M  3.6G   6% /boot
    /dev/mapper/lde--cluster--vg-lde--cluster--lv  5.9G  4.6G  1.1G  82% /.cluster
  7. Show the directory use:

    du / -hx -d 2

    The following is an example output:

    76K     /var/filem
    38M     /var/lib
    59M     /var
    4.0K    /tmp/.ICE-unix
    12K     /tmp/UP
    4.0K    /tmp/lde-script-fifos
    4.0K    /tmp/.X11-unix
    1.1G    /tmp/FILES
    1.1G    /tmp
    4.0K    /.lv_snapshot
    117M    /opt/com
    5.7M    /opt/coremw
    6.2M    /opt/eric
    18M     /opt/lm
    2.0M    /opt/ericsson
    104K    /opt/lde-pm-counter
    8.0K    /opt/comsa
    4.2M    /opt/brf
    152M    /opt
    12K     /srv/www
    4.0K    /srv/ftp
    4.0K    /srv/tftpboot
    24K     /srv
    0       /sys
    4.0K    /selinux
  8. Show the files that have been recently produced by the system, for example, files that were produced in the last two hours and are larger than 100k:

    find / -noleaf -mount -mmin -120 -size +100k -exec ls -lt {} \;

    The following is an example output:

    -rw-r--r-- 1 root root 839328 Sep  8 09:05 /var/opt/sec/sec.log
    -rw------- 1 root root 217016 Sep  8 09:05 /var/run/nscd/services
    -rw------- 1 root root 217016 Sep  8 10:15 /var/run/nscd/group
    -rw------- 1 root root 217016 Sep  8 10:16 /var/run/nscd/passwd
    -rw-r----- 1 root root 385723000 Sep  8 10:08 /tmp/FILES/software2.tar.gz
    -rw-r----- 1 root root 385723000 Sep  8 10:04 /tmp/FILES/software.tar.gz
    -rw-r--r-- 1 root root 110515 Sep  8 09:05 /opt/lm/log/maf.stdout
    -rw-r--r-- 1 root root 274789 Sep  8 09:05 /opt/lm/log/maf.log
  9. Show the files that have been on the system for a long time, for example, files that have remained unchanged for more than three days and are larger than 100k:

    find / -noleaf -mount -mtime +3 -size +100k -exec ls -lt {} \;

    The following is an example output:

    -rw-r--r-- 1 root root 536396 Aug  9  2013 /lib/modules/3.0.82-0.7-default/updates/drbd.ko
    -rwxr-xr-x 1 root root 186910 Feb 14  2014 /lib/libm-2.11.3.so
    -rwxr-xr-x 1 root root 116348 May 11  2013 /lib/libgcc_s.so.1
    -rwxr-xr-x 1 root root 297300 Feb 21  2009 /lib/libncursesw.so.5.6
    -rwxr-xr-x 1 root root 190844 Feb 14  2014 /lib/libcidn-2.11.3.so
    -rwxr-xr-x 1 root root 156728 Aug  9  2013 /lib/drbd/drbdadm-83
    -rwxr-xr-x 1 root root 143987 Feb 14  2014 /lib/ld-2.11.3.so
    -rwxr-xr-x 1 root root 297288 Feb 21  2009 /lib/libncursesw.so.6.0
    -rwxr-xr-x 1 root root 1693100 Feb 14  2014 /lib/libc-2.11.3.so
    -rwxr-xr-x 1 root root 243848 Jul  9  2010 /lib/libsepol.so.1
    -r-xr-xr-x 1 root root 252520 May 29  2013 /lib/libdevmapper.so.1.02
    -rwxr-xr-x 1 root root 226508 Oct 15  2013 /lib/libreadline.so.5.2
    -rwxr-xr-x 1 root root 243856 Feb 21  2009 /lib/libncurses.so.5.6
    -rwxr-xr-x 1 root root 103167 Feb 14  2014 /lib/libnsl-2.11.3.so
    -rwxr-xr-x 1 root root 116776 Jul  8  2010 /lib/libselinux.so.1
    -rwxr-xr-x 1 root root 124942 Feb 14  2014 /lib/libpthread-2.11.3.so
  10. Collect the output of Step 6 through Step 9 and consult the next level of maintenance support. Further actions are outside the scope of this instruction.
  11. Job is completed.
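
The data collection commands above can be run in one pass so that their output is captured as a single small text report. The following is a sketch only and not the standard data collection procedure: the function name collect_usage and the section headers are hypothetical, df is called with the directory as operand instead of -t ext3, and the find options are written in a portable form (-xdev for -mount, 102400c for 100k).

```shell
# collect_usage DIR
# Prints disk use, directory use, recently changed large files, and
# long-unchanged large files for DIR to standard output.
collect_usage() {
    echo "== df =="
    df -h "$1"
    echo "== du (depth 2) =="
    du -hx -d 2 "$1"
    echo "== files changed in the last 120 minutes, larger than 100 KiB =="
    find "$1" -xdev -type f -mmin -120 -size +102400c -exec ls -l {} +
    echo "== files unchanged for more than 3 days, larger than 100 KiB =="
    find "$1" -xdev -type f -mtime +3 -size +102400c -exec ls -l {} +
}

# Example call (not executed here):
#   collect_usage /
```

The report goes to standard output, so it can be copied from the terminal session or redirected to a file on a partition that still has free space.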

2.3   Actions for /boot

Do the following:

  1. Perform data collection. For information on how to collect data, refer to Data Collection Guideline. The /boot disk partition use and file creation information must be collected.
    Attention!

    Risk of data loss or data corruption.

    Do not delete any files unless required by the next level of maintenance support.

  2. Consult the next level of maintenance support. Further actions are outside the scope of this instruction.
  3. Job is completed.

2.4   Actions for /var/log

Do the following:

  1. Show the large files in /var/log, for example, files that have remained unchanged for more than three days and are larger than 100k:

    find /var/log -noleaf -mount -mtime +3 -size +100k -exec ls -lt {} \;

    The following is an example output:

    -rw-rw-r-- 1 root tty 524544 Aug 20 19:25 /var/log/wtmp.1
    -rw------- 1 root root 1254835 Aug 16 14:31 /var/log/SC-2/messages
    -rw------- 1 root root 147668 Aug 16 14:31 /var/log/SC-2/kernel
    -rw-r----- 1 root root 385723000 Sep  1 17:00 /var/log/mylog/mylog0
  2. Delete the files returned in the output of the previous command:

    rm <file1> [<file2> …]

  3. Is the alarm cleared?

    Yes: Proceed with Step 9.

    No: Continue with the next step.
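
Before deleting, it can be useful to know how much space the listed files occupy in total. The following sketch sums the size column of ls -l output. The function name total_bytes is hypothetical, and the input is the example output shown in Step 1.

```shell
# total_bytes
# Reads "ls -l" lines on standard input and prints the sum of the
# size column (field 5) in bytes.
total_bytes() {
    awk '{ sum += $5 } END { printf "%.0f\n", sum }'
}

# Example input copied from Step 1.
total_bytes <<'EOF'
-rw-rw-r-- 1 root tty 524544 Aug 20 19:25 /var/log/wtmp.1
-rw------- 1 root root 1254835 Aug 16 14:31 /var/log/SC-2/messages
-rw------- 1 root root 147668 Aug 16 14:31 /var/log/SC-2/kernel
-rw-r----- 1 root root 385723000 Sep  1 17:00 /var/log/mylog/mylog0
EOF
# prints: 387650047
```

In this example, almost all of the roughly 370 MiB comes from one file, /var/log/mylog/mylog0.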

The /var/log disk partition use must be collected, but by means other than the standard data collection procedure, because that procedure creates large files on the disk. Perform data collection as follows:

  4. Show the disk use:

    df -h -t ext3

    The following is an example output:

    Filesystem                                     Size  Used Avail Use% Mounted on
    /dev/sda4                                      2.0G  1.5G  427M  78% /
    /dev/sda3                                      9.9G  8.8G  568M  95% /var/log
    /dev/sda1                                      4.0G  226M  3.6G   6% /boot
    /dev/mapper/lde--cluster--vg-lde--cluster--lv  5.9G  4.6G  1.1G  82% /.cluster
  5. Show the directory use:

    du /var/log -hx -d 2

    The following is an example output:

    8.0K    /var/log/YaST2
    3.9M    /var/log/SC-1
    1.4M    /var/log/SC-2
    4.0K    /var/log/lde-scripts
    12K     /var/log/audit
    16K     /var/log/lost+found
    4.0K    /var/log/sa
    4.0K    /var/log/krb5
    4.0K    /var/log/opensaf/saflog
    6.1M    /var/log/opensaf
    8.7G    /var/log/mylog
    8.7G    /var/log
  6. Show the files that have been recently produced by the system, for example, files that were produced in the last two hours and are larger than 100k:

    find /var/log -noleaf -mount -mmin -120 -size +100k -exec ls -lt {} \;

    The following is an example output:

    -rw------- 1 root root 3639803 Sep  8 11:05 /var/log/SC-1/messages
    -rw------- 1 root root 282779 Sep  8 10:04 /var/log/SC-1/kernel
    -rw-r--r-- 1 root root 1228803 Sep  8 10:57 /var/log/opensaf/mds.log
    -rw-r----- 1 root root 385723000 Sep  8 10:42 /var/log/mylog/mylog1
    -rw-r----- 1 root root 385723000 Sep  8 10:48 /var/log/mylog/mylog5
    -rw-r----- 1 root root 385723000 Sep  8 10:48 /var/log/mylog/mylog4
    -rw-r----- 1 root root 385723000 Sep  8 10:49 /var/log/mylog/mylog7
    -rw-r----- 1 root root 385723000 Sep  8 10:45 /var/log/mylog/mylog3
    -rw-r--r-- 1 root root 3085793280 Sep  8 10:58 /var/log/mylog/mylog.tar
    -rw-r----- 1 root root 385723000 Sep  8 10:48 /var/log/mylog/mylog6
    -rw-r----- 1 root root 385723000 Sep  8 10:44 /var/log/mylog/mylog2
    -rw-r--r-- 1 root root 3085793280 Sep  8 10:54 /var/log/mylog/mylog2.tar
  7. Show the files that have been on the system for a long time, for example, files that have remained unchanged for more than three days and are larger than 100k:

    find /var/log -noleaf -mount -mtime +3 -size +100k -exec ls -lt {} \;

    The following is an example output:

    -rw-rw-r-- 1 root tty 524544 Aug 20 19:25 /var/log/wtmp.1
    -rw------- 1 root root 1254835 Aug 16 14:31 /var/log/SC-2/messages
    -rw------- 1 root root 147668 Aug 16 14:31 /var/log/SC-2/kernel
    -rw-r----- 1 root root 385723000 Sep  1 17:00 /var/log/mylog/mylog0
  8. Collect the output of Step 4 through Step 7 and consult the next level of maintenance support. Further actions are outside the scope of this instruction.
  9. Job is completed.
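
In the directory use example above, one subdirectory (/var/log/mylog) accounts for nearly all of the used space. Sorting the du output by size makes such a directory stand out. The following is a sketch, and the function name largest_dirs is hypothetical. Sizes are printed in KiB (du -k) so that they sort numerically.

```shell
# largest_dirs DIR COUNT
# Prints the COUNT largest direct subdirectories of DIR, largest first,
# as "<size in KiB><TAB><path>". The total line for DIR itself is dropped.
largest_dirs() {
    du -kx -d 1 "$1" | awk -F '\t' -v top="$1" '$2 != top' | sort -rn | head -n "$2"
}

# Example call (not executed here):
#   largest_dirs /var/log 5
```

The directory is expected to be given without a trailing slash, exactly as du prints it.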

2.5   Actions for /cluster

Do the following:

  1. Review the contents of Linux directory /cluster/home/user, which is used by accounts that can log on to the Managed Element (ME):

    du /cluster/home -hx -d 2

    The following is an example output:

    4.0K    /cluster/home/sec/certificates
    8.0K    /cluster/home/sec
    8.0K    /cluster/home/ericuser/.ssh
    20K     /cluster/home/ericuser
    4.0K    /cluster/home/coremw_appdata
    4.0K    /cluster/home/comsa/repository
    4.0K    /cluster/home/comsa/backup
    12K     /cluster/home/comsa
    4.0K    /cluster/home/nohome
    52K     /cluster/home
  2. Contact the account owners and request them to delete the unwanted files.
  3. Is the alarm cleared?

    Yes: Proceed with Step 14.

    No: Continue with the next step.

  4. List the backups locally stored in the ME.

    For information on how to list the backups, refer to List Backups.

  5. Is any locally stored manual or scheduled backup no longer required on the ME?

    Yes: Continue with the next step.

    No: Proceed with Step 9.

    Note:  
    A local backup file is not required if there is no immediate need to restore it on the ME or once it has been exported to a remote file storage.

  6. If needed, export the following locally stored backups to the remote file storage:
    • Backups that must be preserved and have not been exported yet
    • Backups that have been deleted from the remote file storage

    For information on how to export a backup, refer to Export Backup.

  7. Delete any locally stored backup not required on the ME.
    Attention!

    Risk of data loss or data corruption.

    Do not delete backups listed in attribute restoreEscalationList.

  8. Is the alarm cleared?

    Yes: Proceed with Step 14.

    No: Continue with the next step.

The /cluster disk partition use must be collected, but by means other than the standard data collection procedure, because that procedure creates large files on the disk. Perform data collection as follows:

  9. Show the disk use:

    df -h -t ext3

    The following is an example output:

    Filesystem                                     Size  Used Avail Use% Mounted on
    /dev/sda4                                      2.0G  751M  1.2G  40% /
    /dev/sda3                                      9.9G  163M  9.2G   2% /var/log
    /dev/sda1                                      4.0G  226M  3.6G   6% /boot
    /dev/mapper/lde--cluster--vg-lde--cluster--lv  5.9G  5.3G  305M  95% /.cluster
  10. Show the directory use:

    du /cluster -hx -d 2

    The following is an example output:

    4.0K    /cluster/home/sec/certificates
    8.0K    /cluster/home/sec
    8.0K    /cluster/home/ericuser/.ssh
    20K     /cluster/home/ericuser
    4.0K    /cluster/home/coremw_appdata
    4.0K    /cluster/home/comsa/repository
    4.0K    /cluster/home/comsa/backup
    12K     /cluster/home/comsa
    4.0K    /cluster/home/nohome
    52K     /cluster/home
  11. Show the files that have been recently produced by the system, for example, files that were produced in the last two hours and are larger than 100k:

    find /cluster -noleaf -mount -mmin -120 -size +100k -exec ls -lt {} \;

    The following is an example output:

    -rw------- 1 root root 728064 Sep  8 09:35 /cluster/storage/clear/coremw/etc/imm.db
    -rw-r--r-- 1 root root 1361281 Sep  8 09:05 /cluster/storage/clear/com-apr9010443/log/SC-1/com.log
    -rw-r--r-- 1 root root 143586 Sep  8 09:05 /cluster/storage/clear/com-apr9010443/log/SC-1/com.1.stdout
  12. Show the files that have been on the system for a long time, for example, files that have remained unchanged for more than three days and are larger than 100k:

    find /cluster -noleaf -mount -mtime +3 -size +100k -exec ls -lt {} \;

    The following is an example output:

    -rw-r--r-- 2 65476 16416 2017443 Jun 30 12:31 /cluster/rpms/com-4.0-17.x86_64.58f8890e707a834e686949a6a8f14ed3.rpm
    -rw-r--r-- 2 root root 188508 Aug  3 13:04 /cluster/rpms/opensaf-log-server-4.4.0-R8C01.5044.79.x86_64.3a3faffc91598bdcf5ce4db849ff0994.rpm
    -rw-rw-r-- 2 72971 1060 923770 Jul 14 18:23 /cluster/rpms/LmServer-CXP9022159-3-R2B01.x86_64.816e465af468d20d493902e6f2b0d88b.rpm
    -rw-r--r-- 2 65476 16416 4417456 Jun 30 12:31 /cluster/rpms/maf-R2-A47.x86_64.d9aa55b289fcdcea46355070c03600f3.rpm
    -rw-r--r-- 2 root root 1175601 Aug  3 13:04 /cluster/rpms/COREMW_SC-R8C01-3.4.x86_64.15ad6458c09fc984031dfe9d27705d9c.rpm
    -rw-r--r-- 2 root root 174111 Aug  3 13:04 /cluster/rpms/COREMW_COMMON-R8C01-3.4.x86_64.4a0c4fcf60c6d920f8e2dd84b1186cfc.rpm
    -rw-r--r-- 2 72971 1060 196667 May 26 15:28 /cluster/rpms/BrfCmwA-CXP9018859-1-R3C03.x86_64.62d7f6fe5267fe601e31fde167fbb8f3.rpm
    -rw-r--r-- 2 65476 16416 114995 Jun 30 12:31 /cluster/rpms/com_security_mgmt_tls-4.0-17.x86_64.8be9e39343d432487b70d5eca51737c2.rpm
    -rw-r--r-- 2 root root 260850 Aug  3 13:04 /cluster/rpms/opensaf-imm-libs-4.4.0-R8C01.5044.79.x86_64.0609a2984262051a8719197354a4ce50.rpm
    -rw-r--r-- 2 root root 95201122 Jul  8 04:47 /cluster/rpms/linux-control-R7B02-0.x86_64.961b097199a6090bdf2fccea81694818.rpm
    -rw-rw-r-- 2 72971 1060 883649 Jul 14 18:23 /cluster/rpms/lm-maf-R2-A42.x86_64.3c72cca19d14c1b9947879e397df8c22.rpm
    -rw-r--r-- 2 65476 16416 416760 Jun 30 12:31 /cluster/rpms/com_pm-4.0-17.x86_64.32ea3be84c0bdcfd7de0a817f47dc071.rpm
    -rw-r--r-- 2 root root 164473 Aug  3 13:04 /cluster/rpms/opensaf-ckpt-nodedirector-4.4.0-R8C01.5044.79.x86_64.047399a568fdb504e71f16dc5d06c619.rpm
    -rw-r--r-- 2 root root 161262 Aug  3 13:04 /cluster/rpms/opensaf-clm-server-4.4.0-R8C01.5044.79.x86_64.c9a6fe6335fb31e70f59c537892964db.rpm
    -rw-r--r-- 2 root root 177331 Aug  3 13:04 /cluster/rpms/opensaf-ckpt-director-4.4.0-R8C01.5044.79.x86_64.5fa6d7453fa68af65d1bd31ebc6711d8.rpm
    -rw-r--r-- 2 65476 16416 3403651 Jun 30 12:31 /cluster/rpms/com_cli-4.0-17.x86_64.b5adb6a8355dc30df7420b7803c63510.rpm
    -rw-r--r-- 2 65476 16416 876162 Jun 30 12:31 /cluster/rpms/com_file_management-4.0-17.x86_64.26d3d1042babdbc8823d0cbf71e0163c.rpm
    -r--r--r-- 2 root root 105438386 Jan  1  2007 /cluster/rpms/linux-payload-R7B02-0.x86_64.rpm
    -rw-r--r-- 2 root root 777761 Aug  3 15:12 /cluster/rpms/SEC-CERT-AGENT-CXP9024180-R1B02-1.x86_64.1b346c964f5e31a7c2b1e73c1ccc57d6.rpm
    -rw-r--r-- 2 root root 705683 Aug  3 13:04 /cluster/rpms/opensaf-imm-nodedirector-4.4.0-R8C01.5044.79.x86_64.f17debdfcbf7a9822598f08eab9a92ab.rpm
    -rw-r--r-- 2 root root 403900 Aug  3 13:04 /cluster/rpms/opensaf-libs-4.4.0-R8C01.5044.79.x86_64.f815bcddbd946cdc1632085e10112d48.rpm
    -rw-r--r-- 2 root root 490778 Aug  3 13:04 /cluster/rpms/opensaf-imm-director-4.4.0-R8C01.5044.79.x86_64.de2f9380df3e2884ca4d812370b24466.rpm
    -rw-r--r-- 2 72971 1060 567717 May 26 15:28 /cluster/rpms/Brfc-CXP9018859-1-R3C03.x86_64.6164da796b0ba49ced0a9d5127c5f08e.rpm
    -rw-r--r-- 2 72971 1060 784200 Apr 28 17:40 /cluster/rpms/LmSa-CXP9021377_1-R1D02.x86_64.2f47eb6bdf8f55d2090dc46008ffd4e3.rpm
    -rw-r--r-- 2 root root 852524 Aug  3 13:04 /cluster/rpms/opensaf-pm-director-R8C01-3.4.x86_64.f9b7a6c1c577776c94b9a0f44817b57a.rpm
    -rw-r--r-- 2 109383 1115 3027958 Jun 17 10:41 /cluster/rpms/ComSa-CXP9017697_3-R5B02.x86_64.d3cb4c6d881d10a205b9715345fdcd1e.rpm
    -rw-r--r-- 2 65476 16416 2075296 Jun 30 12:31 /cluster/rpms/com_netconf-4.0-17.x86_64.89ce918a9837b4a53ec4e47fc72354b0.rpm
    -rw-r--r-- 2 65476 16416 4757828 Jun 30 12:09 /cluster/rpms/poco-1.4-5p03.x86_64.5986e37f6820312520f1b09464e3af5b.rpm
    -rw-r--r-- 2 65476 16416 1543980 Jun 30 12:31 /cluster/rpms/maf-optional-R2-A47.x86_64.9d15d848310ad26ee3cae4b18d2fe299.rpm
  13. Collect the output of Step 9 through Step 12 and consult the next level of maintenance support. Further actions are outside the scope of this instruction.
  14. Job is completed.
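
When files or backups are selected for deletion, it helps to estimate how much space must be freed before the use drops to a given percentage. The following sketch does that arithmetic. It relies on the way df computes Use%, namely Used divided by the sum of Used and Avail, rounded up. The function name kb_to_free and the sample numbers are hypothetical, and it is an assumption that the alarm is evaluated against the same percentage that df reports.

```shell
# kb_to_free USED_KIB AVAIL_KIB TARGET_PERCENT
# Prints how many KiB must be freed so that used / (used + avail)
# is at or below TARGET_PERCENT. Prints 0 if the use is already there.
kb_to_free() {
    awk -v used="$1" -v avail="$2" -v target="$3" 'BEGIN {
        need = used - (used + avail) * target / 100
        if (need < 0) need = 0
        if (need > int(need)) need = int(need) + 1   # round up to whole KiB
        printf "%.0f\n", need
    }'
}

# Hypothetical values: 950 KiB used, 50 KiB available, target 90%.
kb_to_free 950 50 90
# prints: 50
```

On a live host, the Used and Avail values in KiB can be read from the output of df -Pk for the mount point in question.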


Copyright

© Ericsson AB 2014, 2015. All rights reserved. No part of this document may be reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to continued progress in methodology, design and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

Trademark List
All trademarks mentioned herein are the property of their respective owners. These are shown in the document Trademark Information.
