Patch-ID# 100884-17
Keywords: boot hang security tcp clone kernel panic mfar MP zs nfs mount procfs
Synopsis: SunOS 5.1: Jumbo kernel patch
Date: Jul/15/93

Solaris Release: 2.1

SunOS Release: 5.1

Unbundled Product:

Unbundled Release:

Relevant Architectures: sparc

BugId's fixed with this patch: 1118757 1119235 1109379 1108685 1114069
1108813 1107190 1108112 1108947 1110653 1110373 1105806 1100073 1103645
1106404 1111011 1112756 1113153 1114791 1119071 1110523 1111384 1117508
1120597 1125644 1123266 1112704 1120932 1115127 1119267 1113596 1084913
1104430 1122464 1113596 1111086 1123493 1124179 1121146 1121957 1116255
1120065 1102018 1133751 1132273 1123435

Changes incorporated in this version: 1132273 1123435

Patches accumulated and obsoleted by this patch: 100825-01, 100828-01,
100829-02, 100848-01, 100819-01, 100858-01, 100939-01, 100907-01,
100947-02

Patches which conflict with this patch:

Patches required with this patch:

Obsoleted by:

Files included with this patch:

kernel/drv/clone
kernel/drv/tcp
kernel/fs/nfs
kernel/sys/nfs
kernel/unix
kernel/fs/procfs

Problem Description:

SunOS 5.1 and SunOS 5.2 can panic with the following message:

    panic: page_unlock: pp xxxxxxx is not locked

A watchdog reset is caused when running sundiag on a diskless machine
(swapping over NFS) when the 'mod_uninstall_daemon()' runs out of kernel
stack space. This only happens when swapping over NFS because the call
stack is much deeper. The bug synopsis has nothing to do with the actual
problem. This patch does not fix the clget() warning.

(From 100884-16)

This patch fixes a bug in diskless boot introduced by patch 100884-11's
bugfix for 1113596.

(From 100884-15)

When doing integer multiplication emulation, we should put the low part
of the 64 bit result into dest[0] and the high part into the y-register.
simulate_unimp() attempts to stuff dest[1], which was never set, into
rd+1, therefore trashing the contents of rd+1.
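
The intended split of the multiplication result can be sketched in C.
This is only an illustration under stated assumptions, not the kernel's
simulate_unimp() code: the names umul_result and emulate_umul are
hypothetical, and the only behaviour taken from the description above is
that the low 32 bits of the 64 bit product belong in the destination
register and the high 32 bits in the y-register, with rd+1 left alone.

```c
#include <stdint.h>

/*
 * Hypothetical model of the two places an emulated unsigned multiply
 * writes to. These names are illustrative and do not come from the
 * kernel source.
 */
struct umul_result {
    uint32_t rd; /* low 32 bits of the product (destination register) */
    uint32_t y;  /* high 32 bits of the product (y-register) */
};

/*
 * Multiply two 32 bit operands and split the 64 bit product.
 * Nothing else is written, so a neighbouring register such as rd+1
 * keeps its previous contents.
 */
struct umul_result emulate_umul(uint32_t rs1, uint32_t rs2)
{
    uint64_t product = (uint64_t)rs1 * (uint64_t)rs2;
    struct umul_result r;

    r.rd = (uint32_t)(product & 0xffffffffu); /* low word */
    r.y = (uint32_t)(product >> 32);          /* high word */
    return r;
}
```

For example, multiplying 0x10000 by 0x10000 gives a product of 2^32, so
the sketch leaves 0 in rd and 1 in y.
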
(The high part of the 64 bit result of .umul should be stuffed into the
y-register instead. This is already done in crt.s)

(From 100884-14)

Kernel panics with a data fault. kadb/the core dump shows the crash
being in bcopy called by tcp_reinit_fn.1

(From 100884-13)

In SunOS 5.1, if alarm(n) is called with large n, the alarm() call
returns immediately. That is, a SIGALRM signal is delivered right away.

An NIS+ server may hang if a NIS/NIS+ client does a Ctrl-C in the middle
of browsing a large map using ypcat/niscat.

(From 100884-12)

A RACE SITUATION CAN OCCUR WHEN TWO OR MORE PROCESSES ARE TRYING TO
WRITE TO THE SAME FILE OVER NFS. THIS PATCH CORRECTS THE PROBLEM.

(From 100884-11)

1111086 Galaxy systems with VME devices panicked with an M-bus timeout
error when accessing the VME interrupt/control registers. The system
should not have panicked but should instead have printed the message
"VME dropped INT-ACK cycle". However, the implementation of
sun4m_impl_bustype() on mars is wrong, causing us to panic.

1124179 This is yet another Viking mfar hardware bug workaround.

1123493 This is a fix for a modctl bug.

1113596 This is the second crank of patch 100884-08 for the problem
described below. This fixes a locking problem introduced by the first
crank: lockd may spin and generate multiple lock/unlock requests if it
receives a signal while waiting for a reply to an NFS lock/unlock
request. This is most often manifested when a ksh user logs in or out of
a machine which NFS mounts his/her home directory and types ^C during
the brief period that ksh is locking or unlocking its history file. This
causes ksh to hang and the machine's lockd to consume lots of CPU time.

(From 100884-10)

1104430: The problem is a kernel panic:

    panic: recursive mutex_enter ...

This happens when a debugger is applied to a process whose executable
(or any of the shared libraries invoked by the process) resides on an
NFS-mounted filesystem.

1122464: The problem seems to be that the Sun is sending data into a
zero window.
The first two packets work OK. The cisco rejects the third packet
because the packet contains data while the cisco's receive window is 0.
This means that the cisco never sees the ACK to its SYN. You'll note
that the cisco keeps retransmitting its SYN, and the Sun keeps
retransmitting the 24 bytes of data.

(From 100884-09)

The SO_KEEPALIVE option has no effect in SunOS 5.x. Turning on the
option with setsockopt does not change the operation of TCP. There is no
supporting code to handle the option.

(From 100884-08)

lockd may spin and generate multiple lock/unlock requests if it receives
a signal while waiting for a reply to an NFS lock/unlock request. This
is most often manifested when a ksh user logs in or out of a machine
which NFS mounts his/her home directory and types ^C during the brief
period that ksh is locking or unlocking its history file. This causes
ksh to hang and the machine's lockd to consume lots of CPU time.

(for 100947-02)

If an NFS client mounts a filesystem read-only, access() will still
claim that writes are possible.

(for 100947-01)

When an Ultrix client uses "touch" to create a new file on a Solaris 2.1
server, the file ends up with the wrong permissions, rwsrwsrwt. This is
because Ultrix sends a (short)-1 instead of a (long)-1 in the mode field
of NFS SETATTR requests. Since only the lower 12 bits are valid mode
bits, we check for both (long)-1 and (short)-1.

(for 100939-01)

If a partition on a 4.1.2 (or 4.1.3) server is full and you write to it
from a 5.1 client, the write appears to succeed and the file is reported
on the client as having grown. If you look on the server, the file is
zero length.

(for 100907-01)

This bug causes Concurrent's "pwd" command to report incomplete
pathnames. This may also affect other vendors who rely on the dirent
data to include accurate name length data. KRBrown

The description field as copied from bug report 1119254 follows:

In fastpath, the readdir result is counting the null byte in the file
name. For example the name "."
has a count of 2 and a value of ".\0". Probable cause is the use of
copystr() which returns the string length including the null byte.

(From 100884-07)

Calling the clean user windows trap on sun4m running SunOS 5.1
consistently fails with a segmentation violation.

(From 100884-06)

Kernel Bug, 1125644: One of the kernel bugs we encountered on the
Solaris5.1_fcs sun4m (galaxy) architecture was that the kernel could
data fault while taking a pagefault, due to a de-reference of a bogus
pointer to the proc structure. The pointer was bogus because it was
obtained from the lwp structure which could be in the process of being
torn down due to a process exiting. The fix is to obtain the proc
pointer from the current thread (the thread taking the pagefault), and
not from the current lwp.

(From 100884-05)

MP startup is fragile. In some circumstances during boot, the system may
try to service an interrupt on a CPU that is not yet fully initialized.
This problem has been observed on one or two configurations of MP
machines with the new 'zs' driver, though other 3rd party drivers active
during kernel initialization may provoke the problem.

1120597 zs driver watchdog resets on 4-Viking Galaxy

(From 100884-04)

This panic may be caused when the system fails to prevent the creation
of a new segment that overlaps an existing segment in the address space.

1110523 Kernel panics with "srmmu_pteload: remap page..."
1119071 crash in ipc_hash_remove due to outer perimeter bug

There are bugs in the earlier ROSS 605 chips which can lead to data
corruption in Multi-Processor mode. The fix applied determines if these
older chips exist in the system and, if so, boots the system in
Uni-Processor mode only - and prints a warning message on the console at
boot time.

The 'MFAR' bug fix is due to the discovery of a bug in the TI SuperSPARC
chip. Occasionally, due to an unusual set of circumstances on the MBus,
a page fault will occur which latches the wrong faulting address.
The fix is to look at the faulting instruction to determine the correct
fault address.

1111384 sun4m systems should stop the boot if running SVR4 with down-rev
1117508 yet another mfar bug

(From 100884-03)

1118757 data fault while doing putpmsg/getpmsg
1119235 kernel hang with patch 100858-01

(From 100858-01)

TCP maximum segment size option has a lower limit of 128.

(From 100819-01)

If you see one of these kernel panics you need to apply the patch:

    panic: tcp_close_detached - no mblk
    panic: tcp_clean_death - no mblk

(From 100884-02)

Several problems have been uncovered in the 4m architecture. Most of
these affect either long SunDiag runs (a program used to test various
hardware/software interactions - esp. within Sun manufacturing, but also
at many Sun OEM sites) or long term stability of the 4m machines.
Machines affected by these problems are Sun 4/6XX, SPARCstation 10 (all
models), Sunergy and Sunergy Classic.

1114069 C2 (ss10) boot hang

(From 100884-01)

1100073 mmap() is not working correctly on 5.0.1/sun4m
1103645 sun4m l15 handler doesn't handle viking module error correctly
1106404 mmap system call fails on galaxy causing unexpected trap
1111011 kernel preempts the 2.8 non-preemptible PROM
1112756 fix module_ross.c to check for pfn, Cacheability, etype.
1113153 seg_kmem.c pass 0 for PTE_RM_MASK when the pte is being
        invalidated
1114791 Sunergy's and Classics are Watchdog Resetting with invalid
        Level 0 PTP

(From 100848-01)

1108813 security, srmmu window handler does not check %sp

(From 100829-02)

1107190 Page create can potentially return a page without acquiring the
        exclusive lock

(From 100828-01)

When asyncio calls are made from the NeWSprint handler for the
SPARCprinter to write the second page of a job, the number of context
switches skyrockets to the point that the user is no longer able to get
new input focus until one of the two threads has finished.

1105806 asyncio calls made in NeWSprint cause too many context switches

(From 100825-01)

The patch fixes various system panics in kmem_alloc/kmem_free when doing
file locking. It also fixes problems with locks being lost when
upgrading locks, and with locks being counted incorrectly, which made
the system tunable parameter for the number of locks in the system
inaccurate.

1108112 Kernel file locking can hang or crash system.
1108947 Kernel loses track of file locks
1110653 when system lock limit is reached, fcntl() never returns ENOLCK
1110373 system's counting of record locks is incorrect

Patch Installation Instructions:
--------------------------------
Generic 'installpatch' and 'backoutpatch' scripts are provided within
each patch package with instructions appended to this section. Other
specific or unique installation instructions may also be necessary and
should be described below.

Special Install Instructions:
-----------------------------
None.

Instructions to install patch using "installpatch"
--------------------------------------------------
1. Become super-user.

2. Apply the patch by typing:

       //installpatch /

   where is the directory containing the patch and is the patch number.
   must be a full path name.

   Example:

       # /tmp/123456-01/installpatch /tmp/123456-01

3. If any errors are reported, see "Patch Installation Errors" in the
   Command Descriptions section below.

Rebooting the system or restarting the application after a successful
patch installation is usually necessary to utilize the patch.

NOTE: On client server machines the patch package is NOT applied to
existing clients or to the client root template space. Therefore, when
appropriate, ALL CLIENT MACHINES WILL NEED THE PATCH APPLIED DIRECTLY
USING THIS SAME INSTALLPATCH METHOD ON THE CLIENT. See the next section
for instructions for installing a patch on a client.

Instructions for installing a patch on a diskless or dataless client
--------------------------------------------------------------------
1. Before applying the patch, the following command must be executed on
   the server to give the client read-only, root access to the exported
   /usr file system so that the client can execute the pkgadd command:

       share -F nfs -o ro,anon=0 /export/exec/Solaris_2.1_sparc.all/usr

   The command:

       share -F nfs -o ro,root= \
           /export/exec/Solaris_2.1_sparc.all/usr

   accomplishes the same goal, but only gives root access to the client
   specified in the command.

2. Log in to the client system and become super-user.

3. Continue with step 2 in the "Instructions to install patch using
   installpatch" section above.

Instructions for backing out patch using "backoutpatch"
-------------------------------------------------------
1. Become super-user.

2. Change directory to /var/sadm/patch:

       cd /var/sadm/patch

3. Backout patch by typing:

       /backoutpatch

   where is the patch number.

   Example:

       # 123456-01/backoutpatch 123456-01

4. If any errors are reported, see "Patch Backout Errors" in the Command
   Descriptions section below.

Instructions for identifying patches installed on system:
----------------------------------------------------------
Type:

    installpatch -p

This command produces a list of the patch IDs of the patches that are
currently applied to the system. When executed with the -p option, the
installpatch command does not modify the system in any way.

Command Descriptions
--------------------

NAME
    installpatch - apply patch package to Solaris 2.x system
    backoutpatch - remove patch package from Solaris 2.x system

SYNOPSIS
    installpatch [-u] [-d]
    backoutpatch

DESCRIPTION
    These installation and backout utilities apply only to Solaris 2.x
    associated patches. They do not apply to Solaris 1.x associated
    patches. These utilities are currently only provided with each patch
    package and are not included with the standard Solaris 2.x release
    software.

OPTIONS
    installpatch
        -u  unconditional install, do not verify file attributes
        -d  do not save original files being replaced
        -p  print a list of the patches currently applied on the system

DIAGNOSTICS

Patch Installation Errors:
--------------------------

Error message:

    Patch has already been applied.

Explanation and recommended action:

    This patch has already been applied to the system. If the patch has
    to be reapplied for some reason, backout the patch and then reapply
    it.

Error message:

    This patch is obsoleted by a patch which has already been applied to
    this system. Application of this patch would leave the system in an
    inconsistent state. Patch installation is aborted.

Explanation and recommended action:

    Occasionally, a patch is replaced by a new patch which incorporates
    the bug fixes in the old patch and supplies additional fixes also.
    At this time, the earlier patch is no longer made available to
    users. The second patch is said to "obsolete" the first patch.
    However, it is possible that some users may still have the earlier
    patch and try to apply it to a system on which the later patch is
    already applied. If the obsoleted patch were allowed to be applied,
    the additional fixes supplied by the later patch would no longer be
    available, and the system would be left in an inconsistent state.
    This error message indicates that the user attempted to install an
    obsoleted patch. There is no need to apply this patch because the
    later patch has already supplied the fix.

Error message:

    The packages to be patched are not installed on this system.

Explanation and recommended action:

    None of the packages to be updated by this patch are installed on
    the system. Therefore, this patch cannot be applied to the system.

Error message:

    This patch is not applicable to client systems.

Explanation and recommended action:

    The patch is only applicable to servers and standalone machines.
    Attempting to apply this patch to a client system will have no
    effect on the system.

Error message:

    The /usr/sbin/pkgadd command is not executable.

Explanation and recommended action:

    The /usr/sbin/pkgadd command cannot be executed. The most likely
    cause of this is that installpatch is being run on a diskless or
    dataless client and the /usr file system was not exported with root
    access to the client. See the section above on "Instructions for
    installing a patch on a diskless or dataless client".

Error message:

    Patch directory is not of expected format.

Explanation and recommended action:

    The patch directory supplied as an argument to installpatch did not
    contain any patch packages. Verify that the argument supplied to
    installpatch is correct.

Error message:

    The following validation errors were found:

Explanation and recommended action:

    Before applying the patch, the patch application script verifies
    that the current versions of the files to be patched have the
    expected fcs checksums and attributes. If a file to be patched has
    been modified by the user, the user is notified of this fact. The
    user then has the opportunity to save the file and make a similar
    change to the patched version. For example, if the user has modified
    /etc/inet/inetd.conf and /etc/inet/inetd.conf is to be replaced by
    the patch, the user can save the locally modified
    /etc/inet/inetd.conf file and make the same modification to the new
    file after the patch is applied. After the user has noted all
    validation errors and taken the appropriate action for each one, the
    user should re-run installpatch using the "-u" (for "unconditional")
    option. This time, the patch installation will ignore validation
    errors and install the patch anyway.

Error message:

    Insufficient space in /var/sadm to save old files.

Explanation and recommended action:

    There is insufficient space in the /var/sadm directory to save old
    files.
    The user has two options for handling this problem: (1) generate
    additional disk space by deleting unneeded files, or (2) override
    the saving of the old files by using the "-d" (do not save) option
    when running installpatch. However if the user elects not to save
    the old versions of the files to be patched, backoutpatch CANNOT be
    used.

    One way to regain space on a system is to remove the save area for
    previously applied patches. Once the user has decided that it is
    unlikely that a patch will be backed out, the user can remove the
    files that were saved by installpatch. The following commands should
    be executed to remove the saved files for patch xxxxxx-yy:

        cd /var/sadm/patch/xxxxxx-yy
        rm -r save/*
        rm .oldfilessaved

    After these commands have been executed, patch xxxxxx-yy can no
    longer be backed out.

Error message:

    Save of old files failed.

Explanation and recommended action:

    Before applying the patch, the patch installation script uses cpio
    to save the old versions of the files to be patched. This error
    message means that the cpio failed. The output of the cpio would
    have preceded this message. The user should take the appropriate
    action to correct the cpio failure. A common reason for failure will
    be insufficient disk space to save the old versions of the files.
    The user has two options for handling insufficient disk space: (1)
    generate additional disk space by deleting unneeded files, or (2)
    override the saving of the old files by using the "-d" option when
    running installpatch. However if the user elects not to save the old
    versions of the files to be patched, the patch CANNOT be backed out.

Error message:

    Pkgadd of package failed. See /tmp/log. for reason for failure.

Explanation and recommended action:

    The installation of one of the patch packages failed. Any previously
    installed packages in the patch should have been removed. See the
    log file for the reason for failure. Correct the problem and
    re-apply the patch.

Error message:

    error while adding patch to root template

Explanation and recommended action:

    The install script determined this system to be a client server. The
    attempt to apply the patch package to the appropriate root template
    space located under /export/root/templates failed unexpectedly.
    Check the log file for any failure messages. Correct the problem and
    re-apply the patch.

Patch Backout Errors:
---------------------

Error message:

    Patch has not been applied to this system.

Explanation and recommended action:

    The user has attempted to back out a patch that was never applied to
    this system. It is possible that the patch was applied, but that the
    patch directory /var/sadm/patch/ was deleted somehow. If this is the
    case, the patch cannot be backed out. The user may have to restore
    the original files from the initial installation CD.

Error message:

    Patch was installed without backing up the original files. It cannot
    be backed out.

Explanation and recommended action:

    Either the -d option of installpatch was set when the patch was
    applied, or the save area of the patch was deleted to regain space.
    As a result, the original files are not saved and backoutpatch
    cannot be used. The original files can only be recovered from the
    original installation CD.

Error message:

    Pkgrm of package failed. See /var/sadm/patch//log for reason for
    failure.

Explanation and recommended action:

    The removal of one of the patch packages failed. See the log file
    for the reason for failure. Correct the problem and run the backout
    script again.

Error message:

    Restore of old files failed.

Explanation and recommended action:

    The backout script uses the cpio command to restore the previous
    versions of the files that were patched. The output of the cpio
    command should have preceded this message. The user should take the
    appropriate action to correct the cpio failure.

KNOWN PROBLEMS:

    On client server machines the patch package is NOT applied to
    existing clients or to the client root template space.
    Therefore, when appropriate, ALL CLIENT MACHINES WILL NEED THE PATCH
    APPLIED DIRECTLY USING THIS SAME INSTALLPATCH METHOD ON THE CLIENT.
    See instructions above for applying patches to a client.

    After a patch package has been installed pkginfo(1) will not
    recognize the SUNW_PATCHID macro in the patch package pkginfo file.
    Instead, to identify patches installed on the system use the grep
    command method described in the patch README.

    The pkgadd command shipped with Solaris 2.1 fails (drops core
    without any error message) when there are more than 100 entries in
    the /etc/mnttab file. This means that installpatch can fail, because
    it uses pkgadd. Since this is very likely on any big system with
    lots of automounts, ANY patch could fail. Applying patch 100901-01
    fixes this problem (the README for patch 100901 mentions shutting
    down the automounter while applying it).

SEE ALSO
    pkgadd(1), pkgchk(1), pkgrm(1), pkginfo(1)