SCSI disk problem identification and resolution

This section describes a few problems that might occur on your storage unit when you are using SCSI disks.

  • In response to errors in the SAN, the kernel might permanently disable a LUN and log a message that states "device set offline" and identifies the specific device. If this happens on the 2.4 kernel, the only ways to bring the LUN back online are to unload and reload the low-level device driver or to reboot the system.
  • On 2.6 kernels, the device can be brought back online using one of the following methods, where H:C:T:L stands for the host, channel, target, and LUN of the device:
    • Red Hat: echo "running" > /sys/class/scsi_host/hostH/device/targetH:C:T/H:C:T:L/state
    • SLES: echo "1" > /sys/class/scsi_host/hostH/device/H:C:T:L/online
  • The system might periodically list processes in the D state (see the ps command help page). This state corresponds to an uninterruptible sleep, in which the process is waiting in the kernel. In error situations, a process might become permanently stuck in this state and require a system reboot to recover.
  • The Linux kernel buffer cache is designed to discard dirty buffers after an input/output (I/O) error when the system memory resources are constrained. An application that calls fsync() to verify that its writes have completed successfully might therefore receive an indication of success even though the buffered data was discarded, so the data loss is silent. Some kernels have a bug in the kswapd daemon that makes it likely that the system will perceive itself to be in a state of constrained memory. Multipathing can reduce the risk of this silent data loss by providing a means to retry failed I/O operations and hide the failure from the buffer cache.
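
The checks and the 2.6 kernel recovery steps in the list above can be sketched as a few shell helpers. This is a minimal sketch and not a supported tool: the function names and the SYSFS_ROOT variable are hypothetical and were introduced only for illustration, the sysfs paths are the ones quoted in the list, and the exact wording of the kernel log line around "device set offline" is assumed to vary by kernel and driver.

```shell
#!/bin/sh
# Sketch only. SYSFS_ROOT and all function names are hypothetical;
# the sysfs paths are the ones quoted in the list above.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print log lines that report a disabled LUN.
# Reads kernel log text on stdin, for example: dmesg | find_offline_messages
find_offline_messages() {
    grep 'device set offline'
}

# Bring LUN H:C:T:L back online on a 2.6 kernel.
# $1 = host, $2 = channel, $3 = target, $4 = LUN
scsi_lun_online() {
    dev="$SYSFS_ROOT/class/scsi_host/host$1/device"
    state_file="$dev/target$1:$2:$3/$1:$2:$3:$4/state"
    online_file="$dev/$1:$2:$3:$4/online"
    if [ -w "$state_file" ]; then
        # Method listed for Red Hat
        echo "running" > "$state_file"
    elif [ -w "$online_file" ]; then
        # Method listed for SLES
        echo "1" > "$online_file"
    else
        echo "no writable state or online attribute for $1:$2:$3:$4" >&2
        return 1
    fi
}

# Keep only ps lines whose second column (STAT) starts with D.
filter_d_state() {
    awk '$2 ~ /^D/'
}

# List processes that are currently in uninterruptible sleep.
list_d_state() {
    ps -eo pid,stat,wchan,comm | filter_d_state
}
```

For example, after sourcing the file as root, scsi_lun_online 2 0 1 3 would attempt to bring the device at 2:0:1:3 back online, and list_d_state would show any processes that are in the D state at that moment. A process that appears in repeated runs over a long period is a candidate for the permanently stuck case described above.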
© Copyright IBM Corporation 2004, 2007. All Rights Reserved.