SCSI error reporting

SAN Volume Controller nodes can notify their hosts of errors for SCSI commands that are issued.

SCSI status

Some errors are part of the SCSI architecture and are handled by the host application or device drivers without reporting an error. Some errors, such as read and write I/O errors and errors that are associated with the loss of nodes or loss of access to backend devices, cause application I/O to fail. To help troubleshoot these errors, SCSI commands are returned with the Check Condition status and a 32-bit event identifier is included with the sense information. The identifier relates to a specific error in the SAN Volume Controller cluster error log.

If the host application or device driver captures and stores this error information, you can relate the application failure to the error log.

Table 1 describes the SCSI status and codes that are returned by the SAN Volume Controller nodes.

Table 1. SCSI status
Status Code Description
Good 00h The command was successful.
Check condition 02h The command failed and sense data is available.
Condition met 04h N/A
Busy 08h An Auto-Contingent Allegiance condition exists and the command specified NACA=0.
Intermediate 10h N/A
Intermediate - condition met 14h N/A
Reservation conflict 18h Returned as specified in SPC-2 and SAM-2 when a reserve or persistent reserve condition exists.
Task set full 28h The initiator has at least one task queued for that LUN on this port.
ACA active 30h This is reported as specified in SAM-2.
Task aborted 40h This is returned if TAS is set in the control mode page 0Ch. The SAN Volume Controller node has a default setting of TAS=0, which cannot be changed; therefore, the SAN Volume Controller node does not report this status.
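
The status byte in Table 1 can be decoded with a small lookup. The following Python sketch is illustrative only; the dictionary and function names are assumptions, not part of any SAN Volume Controller API.

```python
# Hypothetical mapping of the SCSI status codes from Table 1 to their names.
SCSI_STATUS = {
    0x00: "Good",
    0x02: "Check condition",
    0x04: "Condition met",
    0x08: "Busy",
    0x10: "Intermediate",
    0x14: "Intermediate - condition met",
    0x18: "Reservation conflict",
    0x28: "Task set full",
    0x30: "ACA active",
    0x40: "Task aborted",
}

def describe_status(code: int) -> str:
    """Return the name for a SCSI status byte, or a placeholder if unlisted."""
    return SCSI_STATUS.get(code, "Unknown (%02Xh)" % code)
```

For example, `describe_status(0x02)` returns "Check condition", the status that signals sense data is available.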

SCSI Sense

SAN Volume Controller nodes notify the hosts of errors on SCSI commands. Table 2 defines the SCSI sense keys, codes, and qualifiers that are returned by the SAN Volume Controller nodes.

Table 2. SCSI sense keys, codes, and qualifiers
Key Code Qualifier Definition Description
2h 04h 01h Not Ready. The logical unit is in the process of becoming ready. The node lost sight of the cluster and cannot perform I/O operations. The additional sense does not have additional information.
2h 04h 0Ch Not Ready. The target port is in the unavailable state. The following conditions are possible:
  • The node lost sight of the cluster and cannot perform I/O operations. The additional sense does not have additional information.
  • The node is in contact with the cluster but cannot perform I/O operations to the specified logical unit because of either a loss of connectivity to the backend controller or some algorithmic problem. This sense is returned for offline virtual disks (VDisks).
3h 00h 00h Medium error. This is returned only for read or write I/Os. The I/O suffered an error at a specific LBA within its scope. The location of the error is reported within the sense data. The additional sense also includes a reason code that relates the error to the corresponding error log entry, for example, a RAID controller error or a migrated medium error.
4h 08h 00h Hardware error. A command-to-logical-unit communication failure has occurred. The I/O suffered an error that is associated with an I/O error that is returned by a RAID controller. The additional sense includes a reason code that points to the sense data that is returned by the controller. This is returned only for I/O type commands. This error is also returned from FlashCopy target VDisks in the prepared and preparing state.
5h 25h 00h Illegal request. The logical unit is not supported. The logical unit does not exist or is not mapped to the sender of the command.
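
Assuming the sense data arrives in the fixed format defined by the SCSI Primary Commands standard (sense key in the low nibble of byte 2, additional sense code and qualifier in bytes 12 and 13), the key/code/qualifier triple in Table 2 can be extracted as in this hypothetical sketch:

```python
def parse_fixed_sense(sense: bytes):
    """Extract (sense key, additional sense code, qualifier) from
    fixed-format SCSI sense data. Per SPC, the sense key is the low
    nibble of byte 2; the ASC and ASCQ are bytes 12 and 13."""
    key = sense[2] & 0x0F
    asc = sense[12]
    ascq = sense[13]
    return key, asc, ascq

# Illustrative buffer: a medium error (3h/00h/00h) in an otherwise
# zeroed 18-byte fixed-format sense block.
example = bytearray(18)
example[2] = 0x03
print(parse_fixed_sense(bytes(example)))  # (3, 0, 0)
```

A result of (3, 0, 0) matches the Medium error row of Table 2.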

Reason codes

The reason code appears in bytes 20-23 of the sense data. The reason code identifies the SAN Volume Controller specific log entry. The field is a 32-bit unsigned number that is presented with the most significant byte first. Table 3 lists the reason codes and their definitions.

If the reason code is not listed in Table 3, the code is the sequence number of the corresponding entry in the SAN Volume Controller cluster error log.
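
As a sketch of the byte layout just described, the following hypothetical Python helper (not part of any SAN Volume Controller toolkit) extracts the 32-bit, most-significant-byte-first reason code from bytes 20-23 of a sense buffer:

```python
import struct

def reason_code(sense: bytes) -> int:
    """Return the 32-bit reason code from bytes 20-23 of the sense data,
    decoded big-endian (most significant byte first) as the text specifies."""
    return struct.unpack_from(">I", sense, 20)[0]

# Illustrative buffer: bytes 0-19 zeroed, bytes 20-23 holding the value 50.
sense = bytes(20) + bytes([0x00, 0x00, 0x00, 0x32])
print(reason_code(sense))  # 50
```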

Table 3. Reason codes
Reason code (decimal) Description
40 The resource is part of a stopped FlashCopy mapping.
50 The resource is part of a Metro Mirror or Global Mirror relationship and the secondary LUN is offline.
51 The resource is part of a Metro Mirror or Global Mirror relationship and the secondary LUN is read only.
60 The node is offline.
71 The resource is not bound to any domain.
72 The resource is bound to a domain that has been recreated.
73 The command is running on a node that has been contracted out for a reason that is not attributable to any path going offline.
80 Wait for the repair to complete, or delete the virtual disk.
81 Wait for the validation to complete, or delete the virtual disk.
82 An offline space-efficient VDisk has caused data to be pinned in the directory cache. Adequate performance cannot be achieved for other space-efficient VDisks, so they have been taken offline.
85 The VDisk has been taken offline because checkpointing to the quorum disk failed.
86 The svctask repairvdiskcopy -medium command has created a virtual medium error where the copies differed.
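
The decoded reason code can then be mapped against Table 3, falling back to the error-log sequence-number interpretation for unlisted values. This lookup is an illustrative sketch; the names and abbreviated descriptions are assumptions, not SAN Volume Controller output.

```python
# Hypothetical abbreviated mapping of the reason codes in Table 3.
REASON_CODES = {
    40: "Resource is part of a stopped FlashCopy mapping",
    50: "Metro/Global Mirror relationship; secondary LUN is offline",
    51: "Metro/Global Mirror relationship; secondary LUN is read only",
    60: "Node is offline",
    71: "Resource is not bound to any domain",
    72: "Resource is bound to a recreated domain",
    73: "Node contracted out; not attributable to a path going offline",
    80: "Repair in progress: wait, or delete the virtual disk",
    81: "Validation in progress: wait, or delete the virtual disk",
    82: "Space-efficient VDisks taken offline (pinned directory-cache data)",
    85: "VDisk offline: checkpointing to the quorum disk failed",
    86: "Virtual medium error created by svctask repairvdiskcopy -medium",
}

def explain_reason(code: int) -> str:
    """Map a decoded reason code to Table 3; treat unlisted codes as
    cluster error-log sequence numbers, as the text describes."""
    return REASON_CODES.get(code, "Error-log sequence number %d" % code)
```

For example, `explain_reason(60)` reports an offline node, while an unlisted value such as `explain_reason(12345)` is reported as an error-log sequence number.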
© Copyright IBM Corporation 2003, 2009. All Rights Reserved.