Problems with loading and starting the operating system (AIX and Linux)

If the system is running partitions from partition standby (LPAR), this procedure addresses the situation in which one partition does not boot AIX(R) or Linux(R) while the other partitions boot and run the operating system successfully.

It is the customer's responsibility to move devices between partitions. If a device must be moved to another partition to run standalone diagnostics, contact the customer or system administrator. (If the optical drive must be moved to another partition, all SCSI devices connected to that SCSI adapter must be moved because moves are done at the slot level, not at the device level.)
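
Because moves are done at the slot level, it can help to list everything that travels with a device before requesting the move. The following is a minimal sketch of that lookup, not an HMC function. The slot names, device names, and the `devices_moved_with` helper are hypothetical and only illustrate the rule.

```python
def devices_moved_with(device, slot_devices):
    """Return the slot that holds `device` and every device that moves with it.

    `slot_devices` maps a slot name to the set of devices attached through
    the adapter in that slot (hypothetical inventory, entered by hand).
    """
    for slot, devices in slot_devices.items():
        if device in devices:
            # The whole slot moves, so every attached device moves too.
            return slot, sorted(devices)
    raise KeyError(f"{device!r} is not assigned to any slot")


# Hypothetical example inventory for one partition.
slot_devices = {
    "slot-1": {"optical0", "disk0", "tape0"},  # SCSI adapter
    "slot-2": {"ent0"},                        # network adapter
}

slot, moved = devices_moved_with("optical0", slot_devices)
print(slot, moved)  # slot-1 ['disk0', 'optical0', 'tape0']
```

In this example, moving the optical drive also moves the disk and tape drive attached to the same adapter, which is why the customer or system administrator must approve the move.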

Depending on the boot device, a checkpoint may be displayed on the operator panel for an extended period of time while the boot image is retrieved from the device. This is particularly true for tape and network boot attempts. If booting from an optical drive or tape drive, watch for activity on the drive's LED indicator. A blinking LED indicates that the loading of either the boot image or additional information required by the operating system being booted is still in progress. If the checkpoint is displayed for an extended period of time and the drive LED is not indicating any activity, there might be a problem loading the boot image from the device.

Notes:
  1. For network boot attempts, if the system is not connected to an active network or if the target server is inaccessible (which can also result from incorrect IP parameters being supplied), the system will still attempt to boot. Because time-out durations are necessarily long to accommodate retries, the system may appear to be hung. Refer to checkpoint CA00 E174.
  2. If the partition hangs with a 4-character checkpoint in the display, the partition must be deactivated, then reactivated before attempting to reboot.
  3. If a BA06 000x error code (or a 20EE xxxx error code on a 7037-A50 or a 7047-185) is reported, the partition is already deactivated and in the error state. Reboot by activating the partition. If the reboot is still not successful, go to step 3.
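
Notes 2 and 3 amount to a small decision rule based on what the display shows. The sketch below restates that rule and is illustrative only. The `recovery_action` helper is hypothetical, and it assumes that the `x` positions in the codes are hexadecimal digits and that checkpoints are written in hexadecimal.

```python
import re

# Assumption: each "x" in the documented codes is one hexadecimal digit.
BA06_ERROR = re.compile(r"BA06000[0-9A-F]")
EE20_ERROR = re.compile(r"20EE[0-9A-F]{4}")
CHECKPOINT = re.compile(r"[0-9A-F]{4}")

# Models on which a 20EE xxxx code is treated like BA06 000x (from note 3).
MODELS_WITH_20EE = {"7037-A50", "7047-185"}


def recovery_action(display_code, model=None):
    """Map a displayed code to the recovery action described in the notes."""
    code = display_code.replace(" ", "").upper()
    if BA06_ERROR.fullmatch(code):
        # Partition is already deactivated and in the error state.
        return "activate"
    if EE20_ERROR.fullmatch(code) and model in MODELS_WITH_20EE:
        return "activate"
    if CHECKPOINT.fullmatch(code):
        # Hung on a 4-character checkpoint.
        return "deactivate, then reactivate"
    return "not covered by these notes"


print(recovery_action("BA06 0003"))              # activate
print(recovery_action("20EE 000B", "7047-185"))  # activate
print(recovery_action("E1F1"))                   # deactivate, then reactivate
```

The code values used in the example calls are placeholders chosen to match the documented patterns, not codes taken from a real failure.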

This procedure assumes that a diagnostic CD-ROM and an optical drive from which it can be booted are available, or that diagnostics can be run from a NIM (network installation management) server. Booting the diagnostic image from an optical drive or a NIM server is referred to as running standalone diagnostics.

  1. Is a Hardware Management Console attached to the managed system?
  2. Look at the service action event error log in Manage Serviceable Events on the HMC. Perform the actions necessary to resolve any open entries that affect devices in the boot path of the partition or that indicate problems with I/O cabling. Then try to reboot the partition. Does the partition reboot successfully?
  3. Boot to the SMS main menu:

    At the SMS main menu, select Select Boot Options and check to see if the intended boot device is correctly specified in the boot list. Is the intended load device correctly specified in the boot list?

  4. If you are attempting to load the operating system from the network, perform the following:

    Restart the partition and try loading the operating system. Does the operating system load successfully?

  5. Use the SMS menus to add the intended boot device to the boot sequence. Can you add the device to the boot sequence?
  6. Ask the customer or system administrator to verify that the device you are trying to load from is assigned to the correct partition. Then select List All Devices and record the list of bootable devices that displays. Is the device from which you want to load the operating system in the list?
  7. Try to load and run standalone diagnostics against the devices in the partition, particularly against the boot device from which you want to load the operating system. You can run standalone diagnostics from an optical drive or a NIM server. To boot standalone diagnostics, follow the detailed procedures in Running the online and eServer stand-alone diagnostics.
    Note:
    When attempting to load diagnostics on a partition from partition standby, the device from which you are loading standalone diagnostics must be made available to the partition that is not able to load the operating system, if it is not already in that partition. Contact the customer or system administrator if a device must be moved between partitions in order to load standalone diagnostics.

    Did standalone diagnostics load and start successfully?

  8. Was the intended boot device present in the output of the Display Configuration and Resource List option (which is run from the Task Selection menu)?
  9. Did running diagnostics against the intended boot device result in a "No Trouble Found" message?
  10. Perform the following actions:
    1. Perform the first item in the action list below, choosing SCSI or IDE based on the type of device from which you are trying to boot the operating system.
    2. Restart the system or partition.
    3. Stop at the SMS menus and select Select Boot Options.
    4. Is the device that was not appearing previously in the boot list now present?
      • Yes: Go to Verifying a repair. This ends the procedure.
      • No: Perform the next item in the action list and then return to substep 2 of step 10 (Restart the system or partition). If there are no more items in the action list, go to step 11.
    Action list:
    Note:
    See Locating FRUs for part numbers and links to exchange procedures.
    1. Verify that the SCSI or IDE cables are properly connected. Also verify that the device configuration and address jumpers are set correctly.
    2. Do one of the following:
      • SCSI boot device: If you are attempting to boot from a SCSI device, remove all hot-swap disk drives (except the intended boot device, if the boot device is a hot-swap drive). If the boot device is present in the boot list after you boot the system to the SMS menus, add the hot-swap disk drives back in one at a time, until you isolate the failing device.
      • IDE boot device: If you are attempting to boot from an IDE device, disconnect all other internal SCSI or IDE devices. If the boot device is present in the boot list after you boot the system to the SMS menus, reconnect the internal SCSI or IDE devices one at a time, until you isolate the failing device or cable.
    3. Replace the SCSI or IDE cables.
    4. Replace the SCSI backplane (or IDE backplane, if present) to which the boot device is connected.
    5. Replace the intended boot device.
    6. Replace the system backplane.
  11. Choose from the following:
  12. Have you disconnected any other devices?
  13. Is the problem corrected?
  14. Is a SCSI boot failure (where you cannot boot from a SCSI-attached device) also occurring?
  15. Perform the following actions to determine if another adapter is causing the problem:
    1. Remove all adapters except the one to which the optical drive is attached and the one used for the console.
    2. Reload the standalone diagnostics. Can you successfully reload the standalone diagnostics?
      • Yes: Perform the following:
        1. Reinstall the adapters that you removed (and attach devices as applicable) one at a time. After you reinstall each adapter, retry the boot operation until the problem recurs.
        2. Replace the adapter or device that caused the problem.
        3. Go to Verifying a repair. This ends the procedure.
      • No: Continue with the next step.
  16. The graphics adapter (if installed), optical drive, IDE or SCSI cable, or system board is most likely defective. Does your system have a PCI graphics adapter installed?
  17. Perform the following to determine if the graphics adapter is causing the problem:
    1. Remove the graphics adapter.
    2. Attach a TTY terminal to the system port.
    3. Try to reload standalone diagnostics. Do the standalone diagnostics load successfully?
      • Yes: Replace the graphics adapter. This ends the procedure.
      • No: Continue with the next step.
  18. Replace the following (if not already replaced), one at a time, until the problem is resolved:
    1. Optical drive
    2. IDE or SCSI cable that goes to the optical drive
    3. System board that contains the integrated SCSI or IDE adapters

    If this resolves the problem, go to Verifying a repair. If the problem persists or if the previous descriptions did not address your particular situation, go to PFW1548: Memory and processor subsystem problem isolation procedure.

    This ends the procedure.
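
The loop in step 10 (perform one action, restart, check the boot list, then move to the next action) can be sketched as follows. This is an illustration of the control flow only, not a tool that ships with the system. The `run_action_list` helper and its callbacks are hypothetical, and the action names are shortened from the action list in step 10.

```python
# Action list from step 10, abbreviated; order matters.
ACTIONS = [
    "Verify cables, device configuration, and address jumpers",
    "Isolate other SCSI or IDE devices",
    "Replace the SCSI or IDE cables",
    "Replace the SCSI or IDE backplane",
    "Replace the intended boot device",
    "Replace the system backplane",
]


def run_action_list(actions, perform, device_in_boot_list):
    """Perform actions in order until the boot device appears in the boot list.

    `perform(action)` stands for the manual service action, and
    `device_in_boot_list()` stands for restarting, stopping at the SMS
    menus, and checking Select Boot Options. Both are supplied by the caller.
    """
    for action in actions:
        perform(action)
        if device_in_boot_list():
            # Device is now present: the repair is verified next.
            return "Verifying a repair", action
    # No actions left and the device is still missing.
    return "step 11", None


# Simulated run: the device reappears after the third action.
done = []
result = run_action_list(ACTIONS, done.append, lambda: len(done) >= 3)
print(result)  # ('Verifying a repair', 'Replace the SCSI or IDE cables')
```

Stopping at the first action that restores the device keeps part replacement to a minimum, which is why the list runs from cable checks to the system backplane.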