A hot patch fails to install on Sx900&T series devices, or other errors occur, because of abnormalities on the storage device.
Note: Both controllers are checked when a hot patch is installed. A failure on either controller can cause the installation to fail, so analyze the logs of both controllers.
The storage device checks the current CPU usage before installing a hot patch. If the CPU usage exceeds 60%, the hot patch cannot be installed.
Step 1: In the messages log of the storage system, search for the keyword "check cpu using failed" to determine whether the CPU usage meets the requirement for installing a hot patch.
Step 2: Check whether the high CPU usage is caused by the os_debug_mode_pH thread.
The os_debug_mode_pH thread keeps running if maintenance personnel exit debug mode abnormally.
Versions affected: V100R005C00SPC003 to V100R005C00SPC900.
Enter minisystem mode and run the top command. In this example, the CPU usage of the os_debug_mode_pH thread is 35%.
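The check in Step 2 can be scripted against saved top output. The following is a minimal sketch in Python, assuming the standard procps batch layout (PID in the first column, %CPU in the ninth, command name last). The layout shown in minisystem mode may differ, and both the sample row and the find_thread helper are hypothetical.

```python
def find_thread(top_output, name="os_debug_mode_pH"):
    """Return (pid, cpu_percent) for the first row whose command is `name`, else None."""
    for line in top_output.splitlines():
        fields = line.split()
        # Assumed columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) >= 12 and fields[0].isdigit() and fields[-1] == name:
            return int(fields[0]), float(fields[8])
    return None


# Hypothetical sample row, built from the PID and CPU usage quoted in this document.
sample = "19762 root 20 0 0 0 0 R 35.0 0.0 12:34.56 os_debug_mode_pH"
print(find_thread(sample))
```

If the function returns None, the thread is not present in the captured output and the high CPU usage has another cause, such as heavy service pressure.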
The storage device checks the system memory before installing a hot patch. If the memory is less than 80 MB, the hot patch cannot be installed.
In the messages log, search for the keyword "end to check system memory" to determine whether the memory meets the requirement for installing a hot patch.
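Because the logs of both controllers must be analyzed, the keyword searches in this document can be run over both logs in one pass. The following is a minimal sketch in Python; the three keywords are the ones this document searches for, while the scan_logs helper, the case-insensitive matching, and the sample log lines are assumptions made for illustration.

```python
KEYWORDS = {
    "cpu": "check cpu using failed",
    "memory": "end to check system memory",
    "busy": "busy",
}


def scan_logs(logs):
    """Map each controller name to {check_name: [matching lines]}.

    `logs` maps a controller name to the text of its messages log.
    """
    report = {}
    for controller, text in logs.items():
        hits = {name: [] for name in KEYWORDS}
        for line in text.splitlines():
            lowered = line.lower()  # assumption: keyword case may vary (busy/BUSY)
            for name, keyword in KEYWORDS.items():
                if keyword in lowered:
                    hits[name].append(line)
        report[controller] = hits
    return report


# Hypothetical log excerpts; real log lines will look different.
logs = {
    "A": "patch: check cpu using failed\nunrelated line",
    "B": "function foo is BUSY",
}
report = scan_logs(logs)
for controller, hits in report.items():
    for name, lines in hits.items():
        if lines:
            print(controller, name, lines)
```

A non-empty list for a controller points to the check worth examining on that controller; the matching lines still need to be read by hand.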
Before activating a hot patch, the storage device checks whether a function to be patched (a busy function) is in the call stack of any CPU. If it is, the hot patch is activated only after the function exits the call stack. The check is repeated up to eight times at one-second intervals. If the function is still busy after eight attempts, the hot patch fails to be installed.
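The retry behavior described above can be sketched as follows. This is an illustration in Python, not the device's implementation: is_busy and activate are hypothetical callables standing in for the internal call-stack check and the patch activation, and the exact placement of the one-second wait is an assumption.

```python
import time


def activate_when_idle(is_busy, activate, attempts=8, interval=1.0, sleep=time.sleep):
    """Activate a patch once the target function is idle, retrying while it is busy.

    Returns True if the patch was activated, False if the function was
    still busy after `attempts` checks.
    """
    for attempt in range(1, attempts + 1):
        if not is_busy():
            activate()
            return True
        if attempt < attempts:
            sleep(interval)  # wait before the next check
    return False  # still busy after all attempts: installation fails


# Usage: the function is busy for two checks, then idle on the third.
states = iter([True, True, False])
activated = []
ok = activate_when_idle(lambda: next(states), lambda: activated.append("patch"),
                        sleep=lambda seconds: None)
print(ok, activated)
```

With the documented values, a function that never leaves the call stack causes the installation to fail after roughly seven seconds of waiting, which is why simply reinstalling the hot patch later often succeeds.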
In the message log, search for the keyword busy. If the following information is displayed, the failure is caused by a busy function.
1) Delete the os_debug_mode_pH thread.
Check the PID of the thread, which is 19762 in the following figure.
In debug mode of the storage device, run the kill -9 [PID] command. For example, you can run kill -9 19762 to delete the thread, and then reinstall the patch.
2) The hot patch fails to install because CPU usage is too high under heavy service pressure.
In the messages log of the storage system, search for the keyword "check cpu using failed" to check the CPU usage.
In the log entry, cur indicates the current CPU usage and limit indicates the maximum CPU usage allowed for installation.
Solution: Check the current service pressure through the performance statistics function. If the service pressure is heavy, reduce it and then install the patch.
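The cur and limit values can be pulled out of a matching log line for a quick comparison. The following is a minimal sketch in Python. The exact log format is not shown in this document, so the sample line, the assumption that each field name is followed by an optional ":" or "=" and an integer, and the parse_cpu_check helper are all hypothetical.

```python
import re

# Assumed field syntax: "cur" or "limit", optional ":" or "=", then an integer.
FIELD = re.compile(r"\b(cur|limit)\b\s*[:=]?\s*(\d+)")


def parse_cpu_check(line):
    """Return (cur, limit) as integers if both fields are found, else None."""
    values = {key: int(num) for key, num in FIELD.findall(line)}
    if "cur" in values and "limit" in values:
        return values["cur"], values["limit"]
    return None


sample = "check cpu using failed, cur=75 limit=60"  # hypothetical line format
result = parse_cpu_check(sample)
if result:
    cur, limit = result
    if cur > limit:
        print(f"CPU usage {cur}% exceeds the limit of {limit}%")
    else:
        print("CPU usage is within the limit")
```

If the real log uses a different field syntax, only the FIELD pattern needs to change.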
In the messages log, search for the keyword "end to check system memory" to determine whether the memory meets the requirement for installing a hot patch.
No installation failure caused by insufficient memory has been found so far. Such a failure would generally be caused by a memory leak. Collect the logs and send them to R&D engineers for analysis.
Emergency solution: Restart the controller with insufficient memory and then reinstall the hot patch.
In the message log, search for the keyword busy. If the following information is displayed, the failure is caused by a busy function.
1. If the installation fails because a function is BUSY, reinstall the hot patch.
2. If the patch on a single controller is lost because a function is BUSY after one or both controllers are restarted, perform the following steps:
a) In the CLI, run showupgradepkginfo -t 3 to identify the controller that has lost the patch.
In the figure above, the hot patch of controller A is lost.
b) Run the showcontroller command to determine whether the controller that has lost the patch is the primary or the secondary controller. If the secondary controller has lost the patch, perform step c. If the primary controller has lost the patch, go to step d.
Note: In the preceding two figures, primary controller A has lost the patch. Perform step d.
c) Enter developer mode on the device. The default password is debug@storage. Run the swapcontroller command so that the controller whose patch is installed properly becomes the secondary controller. (Note: This command is required because the storage device deletes the patch on the secondary controller first.)
d) In developer mode, run the delhotpatch command to delete the patch. (Note: Ignore any failure reported by this command.)
e) Run the showupgradepkginfo -t 3 command to check whether the patch is deleted. If the patch is not deleted, return to step a. If the patch is deleted, perform step f.
3. If the patch on both controllers is lost after a restart because a function is BUSY,
Note: So far, hot patch loss has been found only when a function is BUSY. If a hot patch is lost for any other reason, contact R&D engineers.
Solution: Upgrade the system to V1R5C02SPC300 or a later version.