SIP3150

Use this procedure to perform serial attached SCSI (SAS) fabric problem isolation.

Considerations:
Attention: When SAS fabric problems exist, obtain assistance from your hardware service provider:
  • Replacing RAID adapters is not recommended without assistance from your service provider. The adapter might contain nonvolatile write cache data and configuration data for the attached disk arrays, so replacing it can create additional problems.
  • Removing functioning disk units in a disk array is not recommended without assistance from your service provider. A disk array might become unprotected or might fail if functioning disk units are removed. The removal of functioning disk units might also result in additional problems in the disk array.
  1. Was the SRC xxxx3020?
    No:
    Go to step 3.
    Yes:
    Go to step 2.
  2. The possible causes are:
    • More devices are connected to the adapter than the adapter supports. Change the configuration to reduce the number of devices below what is supported by the adapter.
    • A SAS device has been incorrectly moved from one location to another. Either return the device to its original location or move the device while the adapter is powered off.
    • A SAS device has been incorrectly replaced by a SATA device. A SAS device must be used to replace a SAS device.
    This ends the procedure.
  3. Determine the status of the disk units in the array by doing the following steps:
    1. Access the product activity log and display the SRC that sent you here.
    2. Press the F9 key for address information. This is the adapter address.
    3. Return to the SST or DST main menu.
    4. Select Work with disk units > Display disk configuration > Display disk configuration status.
    5. On the Display disk configuration status screen, look for the devices attached to the adapter that was identified.
    Is there a device that has a status of "RAID 5/Unknown", "RAID 6/Unknown", "RAID 5/Failed", or "RAID 6/Failed"?
    No:
    Go to step 5.
    Yes:
    Go to step 4.
  4. Other errors should have occurred related to the disk array having degraded protection. Take action on these errors to replace the failed disk unit and restore the disk array to a fully protected state. This ends the procedure.
  5. Have other errors occurred at the same time as this error?
    No:
    Go to step 7.
    Yes:
    Go to step 6.
  6. Take action on the other errors that have occurred at the same time as this error. This ends the procedure.
  7. Was the SRC xxxxFFFE?
    No:
    Go to step 10.
    Yes:
    Go to step 8.
  8. Check for the latest PTFs for the device, device enclosure, and adapter and apply them. Did you find and apply a PTF?
    No:
    Go to step 10.
    Yes:
    Go to step 9.
  9. This ends the procedure.
  10. Identify the adapter and adapter port associated with the problem by examining the product activity log. Perform the following:
    1. Access SST or DST.
    2. Access the product activity log and display the SRC that sent you here. Record the adapter address and the adapter port by doing one of the following:
      • If the SRC is xxxxFFFE, press the F9 key for address information. The adapter address is the bus, board, and card information. The port is shown in the I/O bus field. Convert the port value from decimal to hexadecimal.
      • Press the F9 key for address information. The adapter address is the bus, board, and card information. Press F12 to cancel and return to the previous screen, then press the F4 key to view the additional information, if available. The adapter port is characters 1 and 2 of the unit address. For example, if the unit address is 123456FF, the port would be 12.
      • Go to Hexadecimal product activity log data to obtain the address information. The adapter address is the bus, board, and card information. The adapter port is characters 1 and 2 of the unit address. For example, if the unit address is 123456FF, the port would be 12.
  11. Use the adapter address to find the location of the adapter (see System FRU locations). On the tailstock of the adapter, find the port identified in the previous step. This is the port that is used to attach the device, or device enclosure, that is experiencing the problem.
  12. Because the problem persists, corrective action is needed to resolve it. Proceed by doing the following:

    Perform only one of the following corrective actions (listed in the order of preference). If one of the corrective actions has previously been attempted, proceed to the next one in the list.

    • Reseat the cables, if present, on the adapter and device enclosure. Perform the following steps:
      1. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or partition.
      2. Reseat the cables.
      3. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or partition.
    • Replace the cable, if present, from the adapter to the device enclosure. Perform the following:
      1. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or partition.
      2. Replace the cable.
      3. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or partition.
    • Replace the device. See Disk drive.
      Note: If there are multiple devices with a path that is not Operational, then the problem is not likely to be with a device.
    • Replace the internal device enclosure or refer to the service documentation for an external expansion drawer. Perform the following:
      1. Power off the system or partition. If the enclosure is external, use adapter concurrent maintenance instead to power off the adapter slot.
      2. Replace the device enclosure.
      3. Power on the system or partition. If the enclosure is external, use adapter concurrent maintenance instead to power on the adapter slot.
    • Replace the adapter. The procedure to replace the adapter can be found in PCI adapter.
    • Contact your service provider.
  13. Does the problem still occur after performing the corrective action?
    No:
    This ends the procedure.
    Yes:
    Go to step 12.
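
The port derivation described in step 10 can be sketched in a few lines of Python. This is a minimal illustration only and is not part of the service tools. The function names are hypothetical, and the two-digit, uppercase formatting of the converted I/O bus value is an assumption made so that both methods produce the same style of result.

```python
def port_from_unit_address(unit_address: str) -> str:
    """Return the adapter port, taken as characters 1 and 2 of the unit address.

    Hypothetical helper for illustration; mirrors the rule stated in step 10.
    """
    addr = unit_address.strip()
    if len(addr) < 2:
        raise ValueError("unit address must contain at least two characters")
    return addr[:2].upper()


def port_from_io_bus_field(io_bus_decimal: int) -> str:
    """Convert the decimal I/O bus field value (SRC xxxxFFFE case) to hexadecimal.

    Hypothetical helper for illustration. Zero-padding to two uppercase
    digits is an assumption, not something the procedure specifies.
    """
    if io_bus_decimal < 0:
        raise ValueError("I/O bus value must not be negative")
    return format(io_bus_decimal, "02X")


if __name__ == "__main__":
    # Example from the procedure: unit address 123456FF gives port 12.
    print(port_from_unit_address("123456FF"))
    # A decimal I/O bus value of 18 corresponds to hexadecimal 12.
    print(port_from_io_bus_field(18))
```

Both calls in the example print 12: the first by taking the leading two characters of the unit address, the second by converting decimal 18 to hexadecimal.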