1 Alarm Description
The alarm is raised when the disk use on a mount point exceeds a threshold value.
| Alarm Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| Disk use over threshold value | The disk use on a mount point exceeds a defined threshold value | Disk space is taken up by files (logs, dumps, and so on) | Files | Service performance degradation or service downtime |
Note: This alarm can appear as a result of a maintenance activity.
2 Procedure
2.1 Handle Alarm LOTC Disk Usage
Prerequisites
- This instruction references the following documents:
- No tools are required.
- The following condition must apply:
- The alarm is raised.
Steps
- Is the alarm severity major or critical?
Yes: Continue with the next step.
No: The alarm severity is minor; no further immediate action is needed from this procedure. If the alarm severity level rises, re-enter this procedure.
- Log on to the host to access a Linux® shell, for example:
ssh <user>@<hostname> -p 7022
The hostname is part of alarm attribute Source.
- Show the current disk use:
df -h -t ext3
The following is an example output:
Filesystem                                     Size  Used Avail Use% Mounted on
/dev/sda4                                      2.0G  1.5G  427M  78% /
/dev/sda3                                      9.9G  8.8G  568M  95% /var/log
/dev/sda1                                      4.0G  226M  3.6G   6% /boot
/dev/mapper/lde--cluster--vg-lde--cluster--lv  5.9G  4.6G  1.1G  82% /.cluster
- Check whether any disk partitions other than the one indicated in alarm attribute Additional Text are also used above the threshold value.
- Select the appropriate actions based on the observations in Step 4:
- If the / disk partition is used over the threshold value, proceed with Section 2.2 Handle Partition '/' over Threshold Value.
- If the /boot disk partition is used over the threshold value, proceed with Section 2.3 Handle Partition '/boot' over Threshold Value.
- If the /var/log disk partition is used over the threshold value, proceed with Section 2.4 Handle Partition '/var/log' over Threshold Value.
- If the /cluster disk partition is used over the threshold value, proceed with Section 2.5 Handle Partition '/cluster' over Threshold Value.
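The check in Step 4 can be scripted. The following is a minimal sketch and not part of the documented procedure: it reads df output in POSIX format (df -P keeps each file system on one line) and prints the mount points whose use exceeds a limit. The function name over_threshold and the limit of 90 are assumptions for illustration; the actual alarm threshold is defined on the system and is not stated in this instruction.

```shell
#!/bin/sh
# Sketch only: list mount points whose use exceeds a given percentage.
# The limit passed in is hypothetical; the real alarm threshold is
# configured on the system and is not given in this instruction.

# over_threshold <percent>
# Reads `df -P` style output on stdin and prints "<mount point> <use%>"
# for every file system whose use is greater than <percent>.
over_threshold() {
    awk -v limit="$1" '
        NR == 1 { next }              # skip the header line
        {
            use = $5
            sub(/%$/, "", use)        # "95%" -> "95"
            if (use + 0 > limit + 0) print $6, $5
        }'
}

# Example call on the host:
#   df -P -t ext3 | over_threshold 90
```

Each mount point printed this way corresponds to one of the sections listed in Step 5. Mount points that contain spaces are not handled by this sketch.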
2.2 Handle Partition '/' over Threshold Value
Steps
- Show the large files in /tmp, for example, files that are larger than 100k and have remained unchanged for more than three days:
find /tmp -noleaf -mount -mtime +3 -size +100k -exec ls -lt {} \;
The following is an example output:
-rw-r----- 1 root root 385723000 Sep 1 17:00 /tmp/FILES/software3.tar.gz
- Delete the files returned in the output of the previous command:
rm <file1> [<file2> …]
- Is the alarm cleared?
Yes: Proceed with Step 7.
No: Continue with the next step.
- Are there any File Management-related alarms raised?
Yes: Act on those alarms first. Further actions are outside the scope of this instruction.
No: Continue with the next step.
- The / disk partition use must be collected, but by means other than the standard data collection procedure, because that procedure creates large files on the disk. Perform data collection as follows:
- Show the disk use:
df -h -t ext3
The following is an example output:
Filesystem                                     Size  Used Avail Use% Mounted on
/dev/sda4                                      2.0G  1.9G   59M  97% /
/dev/sda3                                      9.9G  163M  9.2G   2% /var/log
/dev/sda1                                      4.0G  226M  3.6G   6% /boot
/dev/mapper/lde--cluster--vg-lde--cluster--lv  5.9G  4.6G  1.1G  82% /.cluster
- Show the directory use:
du / -hx -d 2
The following is an example output:
76K   /var/filem
38M   /var/lib
59M   /var
4.0K  /tmp/.ICE-unix
12K   /tmp/UP
4.0K  /tmp/lde-script-fifos
4.0K  /tmp/.X11-unix
1.1G  /tmp/FILES
1.1G  /tmp
4.0K  /.lv_snapshot
117M  /opt/com
5.7M  /opt/coremw
6.2M  /opt/eric
18M   /opt/lm
2.0M  /opt/ericsson
104K  /opt/lde-pm-counter
8.0K  /opt/comsa
4.2M  /opt/brf
152M  /opt
12K   /srv/www
4.0K  /srv/ftp
4.0K  /srv/tftpboot
24K   /srv
0     /sys
4.0K  /selinux
- Show the files that have been recently produced by the system, for example, files that were produced in the last two hours and are larger than 100k:
find / -noleaf -mount -mmin -120 -size +100k -exec ls -lt {} \;
The following is an example output:
-rw-r--r-- 1 root root 839328 Sep 8 09:05 /var/opt/sec/sec.log
-rw------- 1 root root 217016 Sep 8 09:05 /var/run/nscd/services
-rw------- 1 root root 217016 Sep 8 10:15 /var/run/nscd/group
-rw------- 1 root root 217016 Sep 8 10:16 /var/run/nscd/passwd
-rw-r----- 1 root root 385723000 Sep 8 10:08 /tmp/FILES/software2.tar.gz
-rw-r----- 1 root root 385723000 Sep 8 10:04 /tmp/FILES/software.tar.gz
-rw-r--r-- 1 root root 110515 Sep 8 09:05 /opt/lm/log/maf.stdout
-rw-r--r-- 1 root root 274789 Sep 8 09:05 /opt/lm/log/maf.log
- Show the files that have been on the system for a long time, for example, files that have remained unchanged for more than three days and are larger than 100k:
find / -noleaf -mount -mtime +3 -size +100k -exec ls -lt {} \;
The following is an example output:
-rw-r--r-- 1 root root 536396 Aug 9 2013 /lib/modules/3.0.82-0.7-default/updates/drbd.ko
-rwxr-xr-x 1 root root 186910 Feb 14 2014 /lib/libm-2.11.3.so
-rwxr-xr-x 1 root root 116348 May 11 2013 /lib/libgcc_s.so.1
-rwxr-xr-x 1 root root 297300 Feb 21 2009 /lib/libncursesw.so.5.6
-rwxr-xr-x 1 root root 190844 Feb 14 2014 /lib/libcidn-2.11.3.so
-rwxr-xr-x 1 root root 156728 Aug 9 2013 /lib/drbd/drbdadm-83
-rwxr-xr-x 1 root root 143987 Feb 14 2014 /lib/ld-2.11.3.so
-rwxr-xr-x 1 root root 297288 Feb 21 2009 /lib/libncursesw.so.6.0
-rwxr-xr-x 1 root root 1693100 Feb 14 2014 /lib/libc-2.11.3.so
-rwxr-xr-x 1 root root 243848 Jul 9 2010 /lib/libsepol.so.1
-r-xr-xr-x 1 root root 252520 May 29 2013 /lib/libdevmapper.so.1.02
-rwxr-xr-x 1 root root 226508 Oct 15 2013 /lib/libreadline.so.5.2
-rwxr-xr-x 1 root root 243856 Feb 21 2009 /lib/libncurses.so.5.6
-rwxr-xr-x 1 root root 103167 Feb 14 2014 /lib/libnsl-2.11.3.so
-rwxr-xr-x 1 root root 116776 Jul 8 2010 /lib/libselinux.so.1
-rwxr-xr-x 1 root root 124942 Feb 14 2014 /lib/libpthread-2.11.3.so
- Collect the different outputs from Step 5 and consult the next level of maintenance support. Further actions are outside the scope of this instruction.
- Job is completed.
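Before deleting anything in Step 2, it can help to review the candidates sorted by size. The following is a minimal sketch under stated assumptions, not part of the documented procedure: it wraps the find command from Step 1 in a function and prints size in bytes, modification date, and path, largest first. The function name list_stale_files and its default arguments are hypothetical, and the -printf option assumes GNU find.

```shell
#!/bin/sh
# Sketch only: list deletion candidates, largest first, without removing them.
# list_stale_files is a hypothetical helper name; -printf requires GNU find.

# list_stale_files <dir> [<days>] [<size>]
# Prints "<bytes> <date> <path>" (tab separated) for regular files under
# <dir> that match -mtime +<days> (default 3) and -size +<size> (default
# 100k), staying on the same file system as <dir>.
list_stale_files() {
    dir=$1
    days=${2:-3}
    size=${3:-100k}
    find "$dir" -noleaf -mount -type f -mtime +"$days" -size +"$size" \
        -printf '%s\t%TY-%Tm-%Td\t%p\n' | sort -rn
}

# Example call on the host:
#   list_stale_files /tmp
```

The function only lists files. Deleting them remains a manual decision with rm, as described in Step 2.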
2.3 Handle Partition '/boot' over Threshold Value
Steps
- Perform data collection; refer to Data Collection Guideline. The /boot disk partition use and file creation information must be collected.
Attention!
Risk of data loss or data corruption.
Do not delete any files unless required by the next level of maintenance support.
- Consult the next level of maintenance support. Further actions are outside the scope of this instruction.
- Job is completed.
2.4 Handle Partition '/var/log' over Threshold Value
Steps
- Show the large files in /var/log, for example, files that are larger than 100k and have remained unchanged for more than three days:
find /var/log -noleaf -mount -mtime +3 -size +100k -exec ls -lt {} \;
The following is an example output:
-rw-rw-r-- 1 root tty 524544 Aug 20 19:25 /var/log/wtmp.1
-rw------- 1 root root 1254835 Aug 16 14:31 /var/log/SC-2/messages
-rw------- 1 root root 147668 Aug 16 14:31 /var/log/SC-2/kernel
-rw-r----- 1 root root 385723000 Sep 1 17:00 /var/log/mylog/mylog0
- Delete the files returned in the output of the previous command:
rm <file1> [<file2> …]
- Is the alarm cleared?
Yes: Proceed with Step 6.
No: Continue with the next step.
- The /var/log disk partition use must be collected, but by means other than the standard data collection procedure, because that procedure creates large files on the disk. Perform data collection as follows:
- Show the disk use:
df -h -t ext3
The following is an example output:
Filesystem                                     Size  Used Avail Use% Mounted on
/dev/sda4                                      2.0G  1.5G  427M  78% /
/dev/sda3                                      9.9G  8.8G  568M  95% /var/log
/dev/sda1                                      4.0G  226M  3.6G   6% /boot
/dev/mapper/lde--cluster--vg-lde--cluster--lv  5.9G  4.6G  1.1G  82% /.cluster
- Show the directory use:
du /var/log -hx -d 2
The following is an example output:
8.0K  /var/log/YaST2
3.9M  /var/log/SC-1
1.4M  /var/log/SC-2
4.0K  /var/log/lde-scripts
12K   /var/log/audit
16K   /var/log/lost+found
4.0K  /var/log/sa
4.0K  /var/log/krb5
4.0K  /var/log/opensaf/saflog
6.1M  /var/log/opensaf
8.7G  /var/log/mylog
8.7G  /var/log
- Show the files that have been recently produced by the system, for example, files that were produced in the last two hours and are larger than 100k:
find /var/log -noleaf -mount -mmin -120 -size +100k -exec ls -lt {} \;
The following is an example output:
-rw------- 1 root root 3639803 Sep 8 11:05 /var/log/SC-1/messages
-rw------- 1 root root 282779 Sep 8 10:04 /var/log/SC-1/kernel
-rw-r--r-- 1 root root 1228803 Sep 8 10:57 /var/log/opensaf/mds.log
-rw-r----- 1 root root 385723000 Sep 8 10:42 /var/log/mylog/mylog1
-rw-r----- 1 root root 385723000 Sep 8 10:48 /var/log/mylog/mylog5
-rw-r----- 1 root root 385723000 Sep 8 10:48 /var/log/mylog/mylog4
-rw-r----- 1 root root 385723000 Sep 8 10:49 /var/log/mylog/mylog7
-rw-r----- 1 root root 385723000 Sep 8 10:45 /var/log/mylog/mylog3
-rw-r--r-- 1 root root 3085793280 Sep 8 10:58 /var/log/mylog/mylog.tar
-rw-r----- 1 root root 385723000 Sep 8 10:48 /var/log/mylog/mylog6
-rw-r----- 1 root root 385723000 Sep 8 10:44 /var/log/mylog/mylog2
-rw-r--r-- 1 root root 3085793280 Sep 8 10:54 /var/log/mylog/mylog2.tar
- Show the files that have been on the system for a long time, for example, files that have remained unchanged for more than three days and are larger than 100k:
find /var/log -noleaf -mount -mtime +3 -size +100k -exec ls -lt {} \;
The following is an example output:
-rw-rw-r-- 1 root tty 524544 Aug 20 19:25 /var/log/wtmp.1
-rw------- 1 root root 1254835 Aug 16 14:31 /var/log/SC-2/messages
-rw------- 1 root root 147668 Aug 16 14:31 /var/log/SC-2/kernel
-rw-r----- 1 root root 385723000 Sep 1 17:00 /var/log/mylog/mylog0
- Collect the different outputs from Step 4 and consult the next level of maintenance support. Further actions are outside the scope of this instruction.
- Job is completed.
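The data collection in Step 4 consists of four commands whose output must not be written to the full partition. The following is a minimal sketch, not part of the documented procedure: it runs the same commands for a given directory and prints everything to standard output with a heading per command, so that the output can be captured outside the affected disk, for example in the log of the terminal session. The function name collect_disk_usage and the heading format are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: run the four data collection commands for one directory and
# print the results to standard output. Nothing is written to disk.
# collect_disk_usage is a hypothetical helper name.

# collect_disk_usage <dir> [<fstype>]
# <fstype> defaults to ext3, as used in this instruction.
collect_disk_usage() {
    dir=$1
    fstype=${2:-ext3}

    echo "=== df -h -t $fstype ==="
    df -h -t "$fstype" 2>&1 || true    # df fails if no such file system exists

    echo "=== du -hx -d 2 $dir ==="
    du -hx -d 2 "$dir" 2>&1 || true

    echo "=== files changed in the last two hours, larger than 100k ==="
    find "$dir" -noleaf -mount -mmin -120 -size +100k -exec ls -lt {} \; 2>&1 || true

    echo "=== files matching -mtime +3, larger than 100k ==="
    find "$dir" -noleaf -mount -mtime +3 -size +100k -exec ls -lt {} \; 2>&1 || true
}

# Example call on the host:
#   collect_disk_usage /var/log
```

The same function can be called with / or /cluster for the corresponding sections of this instruction.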
2.5 Handle Partition '/cluster' over Threshold Value
Steps
- Review the contents of Linux directory /cluster/home/user, which is used by accounts that can log on to the Managed Element (ME):
du /cluster/home -hx -d 2
The following is an example output:
4.0K  /cluster/home/sec/certificates
8.0K  /cluster/home/sec
8.0K  /cluster/home/ericuser/.ssh
20K   /cluster/home/ericuser
4.0K  /cluster/home/coremw_appdata
4.0K  /cluster/home/comsa/repository
4.0K  /cluster/home/comsa/backup
12K   /cluster/home/comsa
4.0K  /cluster/home/nohome
52K   /cluster/home
- Contact the account owners and request them to delete the unwanted files.
- Is the alarm cleared?
Yes: Proceed with Step 11.
No: Continue with the next step.
- List the backups locally stored in the ME.
For information on how to list the backups, refer to List Backups.
- Is any locally stored manual or scheduled backup no longer required on the ME?
Yes: Continue with the next step.
No: Proceed with Step 9.
Note: A local backup file is not required if there is no immediate need to restore it on the ME or once it has been exported to a remote file storage.
- If needed, export the following locally stored backups to the remote file storage:
- Backups that must be preserved and have not been exported yet
- Backups that have been deleted from the remote file storage
For information on how to export a backup, refer to Export Backup.
- Delete any locally stored backup not required on the ME.
Attention!
Risk of data loss or data corruption.
Do not delete backups listed in attribute restoreEscalationList.
- Is the alarm cleared?
Yes: Proceed with Step 11.
No: Continue with the next step.
- The /cluster disk partition use must be collected, but by means other than the standard data collection procedure, because that procedure creates large files on the disk. Perform data collection as follows:
- Show the disk use:
df -h -t ext3
The following is an example output:
Filesystem                                     Size  Used Avail Use% Mounted on
/dev/sda4                                      2.0G  751M  1.2G  40% /
/dev/sda3                                      9.9G  163M  9.2G   2% /var/log
/dev/sda1                                      4.0G  226M  3.6G   6% /boot
/dev/mapper/lde--cluster--vg-lde--cluster--lv  5.9G  5.3G  305M  95% /.cluster
- Show the directory use:
du /cluster -hx -d 2
The following is an example output:
4.0K  /cluster/home/sec/certificates
8.0K  /cluster/home/sec
8.0K  /cluster/home/ericuser/.ssh
20K   /cluster/home/ericuser
4.0K  /cluster/home/coremw_appdata
4.0K  /cluster/home/comsa/repository
4.0K  /cluster/home/comsa/backup
12K   /cluster/home/comsa
4.0K  /cluster/home/nohome
52K   /cluster/home
- Show the files that have been recently produced by the system, for example, files that were produced in the last two hours and are larger than 100k:
find /cluster -noleaf -mount -mmin -120 -size +100k -exec ls -lt {} \;
The following is an example output:
-rw------- 1 root root 728064 Sep 8 09:35 /cluster/storage/clear/coremw/etc/imm.db
-rw-r--r-- 1 root root 1361281 Sep 8 09:05 /cluster/storage/clear/com-apr9010443/log/SC-1/com.log
-rw-r--r-- 1 root root 143586 Sep 8 09:05 /cluster/storage/clear/com-apr9010443/log/SC-1/com.1.stdout
- Show the files that have been on the system for a long time, for example, files that have remained unchanged for more than three days and are larger than 100k:
find /cluster -noleaf -mount -mtime +3 -size +100k -exec ls -lt {} \;
The following is an example output:
-rw-r--r-- 2 65476 16416 2017443 Jun 30 12:31 /cluster/rpms/com-4.0-17.x86_64.58f8890e707a834e686949a6a8f14ed3.rpm
-rw-r--r-- 2 root root 188508 Aug 3 13:04 /cluster/rpms/opensaf-log-server-4.4.0-R8C01.5044.79.x86_64.3a3faffc91598bdcf5ce4db849ff0994.rpm
-rw-rw-r-- 2 72971 1060 923770 Jul 14 18:23 /cluster/rpms/LmServer-CXP9022159-3-R2B01.x86_64.816e465af468d20d493902e6f2b0d88b.rpm
-rw-r--r-- 2 65476 16416 4417456 Jun 30 12:31 /cluster/rpms/maf-R2-A47.x86_64.d9aa55b289fcdcea46355070c03600f3.rpm
-rw-r--r-- 2 root root 1175601 Aug 3 13:04 /cluster/rpms/COREMW_SC-R8C01-3.4.x86_64.15ad6458c09fc984031dfe9d27705d9c.rpm
-rw-r--r-- 2 root root 174111 Aug 3 13:04 /cluster/rpms/COREMW_COMMON-R8C01-3.4.x86_64.4a0c4fcf60c6d920f8e2dd84b1186cfc.rpm
-rw-r--r-- 2 72971 1060 196667 May 26 15:28 /cluster/rpms/BrfCmwA-CXP9018859-1-R3C03.x86_64.62d7f6fe5267fe601e31fde167fbb8f3.rpm
-rw-r--r-- 2 65476 16416 114995 Jun 30 12:31 /cluster/rpms/com_security_mgmt_tls-4.0-17.x86_64.8be9e39343d432487b70d5eca51737c2.rpm
-rw-r--r-- 2 root root 260850 Aug 3 13:04 /cluster/rpms/opensaf-imm-libs-4.4.0-R8C01.5044.79.x86_64.0609a2984262051a8719197354a4ce50.rpm
-rw-r--r-- 2 root root 95201122 Jul 8 04:47 /cluster/rpms/linux-control-R7B02-0.x86_64.961b097199a6090bdf2fccea81694818.rpm
-rw-rw-r-- 2 72971 1060 883649 Jul 14 18:23 /cluster/rpms/lm-maf-R2-A42.x86_64.3c72cca19d14c1b9947879e397df8c22.rpm
-rw-r--r-- 2 65476 16416 416760 Jun 30 12:31 /cluster/rpms/com_pm-4.0-17.x86_64.32ea3be84c0bdcfd7de0a817f47dc071.rpm
-rw-r--r-- 2 root root 164473 Aug 3 13:04 /cluster/rpms/opensaf-ckpt-nodedirector-4.4.0-R8C01.5044.79.x86_64.047399a568fdb504e71f16dc5d06c619.rpm
-rw-r--r-- 2 root root 161262 Aug 3 13:04 /cluster/rpms/opensaf-clm-server-4.4.0-R8C01.5044.79.x86_64.c9a6fe6335fb31e70f59c537892964db.rpm
-rw-r--r-- 2 root root 177331 Aug 3 13:04 /cluster/rpms/opensaf-ckpt-director-4.4.0-R8C01.5044.79.x86_64.5fa6d7453fa68af65d1bd31ebc6711d8.rpm
-rw-r--r-- 2 65476 16416 3403651 Jun 30 12:31 /cluster/rpms/com_cli-4.0-17.x86_64.b5adb6a8355dc30df7420b7803c63510.rpm
-rw-r--r-- 2 65476 16416 876162 Jun 30 12:31 /cluster/rpms/com_file_management-4.0-17.x86_64.26d3d1042babdbc8823d0cbf71e0163c.rpm
-r--r--r-- 2 root root 105438386 Jan 1 2007 /cluster/rpms/linux-payload-R7B02-0.x86_64.rpm
-rw-r--r-- 2 root root 777761 Aug 3 15:12 /cluster/rpms/SEC-CERT-AGENT-CXP9024180-R1B02-1.x86_64.1b346c964f5e31a7c2b1e73c1ccc57d6.rpm
-rw-r--r-- 2 root root 705683 Aug 3 13:04 /cluster/rpms/opensaf-imm-nodedirector-4.4.0-R8C01.5044.79.x86_64.f17debdfcbf7a9822598f08eab9a92ab.rpm
-rw-r--r-- 2 root root 403900 Aug 3 13:04 /cluster/rpms/opensaf-libs-4.4.0-R8C01.5044.79.x86_64.f815bcddbd946cdc1632085e10112d48.rpm
-rw-r--r-- 2 root root 490778 Aug 3 13:04 /cluster/rpms/opensaf-imm-director-4.4.0-R8C01.5044.79.x86_64.de2f9380df3e2884ca4d812370b24466.rpm
-rw-r--r-- 2 72971 1060 567717 May 26 15:28 /cluster/rpms/Brfc-CXP9018859-1-R3C03.x86_64.6164da796b0ba49ced0a9d5127c5f08e.rpm
-rw-r--r-- 2 72971 1060 784200 Apr 28 17:40 /cluster/rpms/LmSa-CXP9021377_1-R1D02.x86_64.2f47eb6bdf8f55d2090dc46008ffd4e3.rpm
-rw-r--r-- 2 root root 852524 Aug 3 13:04 /cluster/rpms/opensaf-pm-director-R8C01-3.4.x86_64.f9b7a6c1c577776c94b9a0f44817b57a.rpm
-rw-r--r-- 2 109383 1115 3027958 Jun 17 10:41 /cluster/rpms/ComSa-CXP9017697_3-R5B02.x86_64.d3cb4c6d881d10a205b9715345fdcd1e.rpm
-rw-r--r-- 2 65476 16416 2075296 Jun 30 12:31 /cluster/rpms/com_netconf-4.0-17.x86_64.89ce918a9837b4a53ec4e47fc72354b0.rpm
-rw-r--r-- 2 65476 16416 4757828 Jun 30 12:09 /cluster/rpms/poco-1.4-5p03.x86_64.5986e37f6820312520f1b09464e3af5b.rpm
-rw-r--r-- 2 65476 16416 1543980 Jun 30 12:31 /cluster/rpms/maf-optional-R2-A47.x86_64.9d15d848310ad26ee3cae4b18d2fe299.rpm
- Collect the different outputs from Step 9 and consult the next level of maintenance support. Further actions are outside the scope of this instruction.
- Job is completed.
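To decide which account owners to contact in Step 2, the directory use under /cluster/home can be summarized per account. The following is a minimal sketch, not part of the documented procedure: it prints one total per first-level directory in kibibytes, largest first. The function name usage_by_account is hypothetical, and the sketch assumes that each account has its own directory directly under the given root.

```shell
#!/bin/sh
# Sketch only: summarize disk use per first-level directory, largest first.
# usage_by_account is a hypothetical helper name.

# usage_by_account [<home root>]
# <home root> defaults to /cluster/home.
usage_by_account() {
    root=${1:-/cluster/home}
    # -s: one total per directory, -k: kibibytes, -x: stay on one file system
    du -skx "$root"/*/ 2>/dev/null | sort -rn
}

# Example call on the host:
#   usage_by_account /cluster/home
```

Directories whose names start with a dot are not matched by the shell pattern and are therefore not listed.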
