Patch-ID# 101946-12
Keywords: kernel libsocket sockmod procfs nfs RDBMS strrput AIX cachefs security
Synopsis: SunOS 5.4_x86: jumbo patch for kernel
Date: Mar/01/95

Solaris Release: 2.4_x86

SunOS release: 5.4_x86

Unbundled Product:

Unbundled Release:

Xref: This patch available for SPARC as patch 101945

Topic: SunOS 5.4_x86: jumbo patch for kernel

NOTE: This revision was released on the basis of its correspondence to
101945-13, and its extended "burn-in" period, as opposed to complete
confirmation for all the new elements it contains.

BugId's fixed with this patch: 1120225 1151364 1152710 1157053 1159986
1160112 1164800 1165687 1167235 1169686 1169791 1169909 1171008 1171478
1171939 1172009 1172243 1172542 1172979 1173301 1173969 1174830 1175115
1175478 1176467 1177091 1177572 1177578 1177600 1178114 1178236 1178641
1178835 1178957 1183395

Changes incorporated in this version: 1159986 1171008 1178114 1183395

Relevant Architectures: i386

Patches accumulated and obsoleted by this patch: 101903-01,101919-01,101976-01

Patches which conflict with this patch:

Patches required with this patch:

Obsoleted by:

Files included with this patch:

/etc/lib/unix_scheme.so.1
/kadb
/kernel/drv/cn
/kernel/drv/tl
/kernel/fs/cachefs
/kernel/fs/nfs
/kernel/fs/procfs
/kernel/mach/uppc
/kernel/misc/strategy
/kernel/misc/strplumb
/kernel/misc/swapgeneric
/kernel/strmod/sockmod
/kernel/sys/nfs
/kernel/unix
/sbin/init
/sbin/su
/sbin/sulogin
/usr/kvm/crash
/usr/include/sys/ddi_impldefs.h
/usr/lib/fs/nfs/inetboot
/usr/lib/libauth.a
/usr/lib/libauth.so.1
/usr/lib/libsocket.a
/usr/lib/libsocket.so.1
/usr/lib/security/unix_scheme.so.1

Problem Description:

1183395 strategy: x86 System hangs when lomemalloc fails
1178114 ioctl SIOCSPGRP/FIOSETOWN path broken for MT libsocket(linked to libthread) code
1171008 Mux hangs when expecting messages on lower stream during I_LINK/UNLINK
1159986 lckpwdf causes passwd to crash

(from 101946-11)

1177600 No way to cache the root and /usr file systems with CacheFS

(from 101946-10)

1178957 sigurg not delivered on second oob data arrival
1164800 panic: ddi_setcallback: no callback structures

(from 101946-09)

1178835 RCS operations fail on file system NFS mounted from AIX system

The problem happens when an application does an fchmod between writes
(which is very rare), which can leave the client with a different view
of the file attributes from the server's. The solution purges the
client cache before doing setattr so that the views will be the same.

(from 101946-08)

1178641 NFS client should fail to open files with the mandlock bit set
1175478 Panic in prototype inkernel logdmuxunlink() after munlink failed
1178236 System panics with data fault in free_zero_zero()
1175115 nfs write error "(file handle: xxx xxx" message cannot be redirected by syslog

The problem occurs when nfs encounters write errors. NFS will print a
write error to the console. In some cases, when the console driver is
deprived of resources, the message is printed directly on the physical
console. What has been done is to put a throttle on NFS write error
messages, enabling the administrator to type on the console and try to
figure out what is going on.

Socket interface networking programs under heavy use may panic the
machine with free_zero_zero() on the kernel call stack. This fixes the
problem in the sockmod module.

I_UNLINK or I_PUNLINK commands may time out and close the stream before
the multiplexor has processed the command.

The NFS server will deny access to mandatory lock files. This is done
for two reasons. First, mandatory locking is not supported over NFS.
Second, it is dangerous for the server to access mandatory lock files.
It would be very easy for a normal user to completely hang the NFS
server. The user could create a file and set the mode to indicate that
it is a mandatory lock file. The user could then lock the file with a
program that does nothing but pause. This user could then go to an NFS
client and try to access the file.
With each request from the client, including retries, another NFS
server daemon on the server would get blocked, until the server ran out
of NFS server daemons.

(from 101946-07)

1177091 prgetstatus can generate pagefault holding p_lock, can deadlock if freemem is 0
1177578 strmakemsg/strgeterr causes panic in strrput due to NULL mblk ptr
1176467 fcntl system call fails in process run by rcmd
1172243 Customer runs application from dumb terminal and system crashes

The system can freeze under heavy swapping pressure due to procfs
holding a critical lock when it takes a page fault.

Doing I_SETSIG on a console window through a serial line and exiting
the process could cause a system panic.

Kernel panic in putnext/ptcwrite.

A socket endpoint not created through the socket library (by dup() of a
socket endpoint, for example) may experience some failures on
fcntl()/ioctl() calls. (This bug is limited to the 2.4 release.)

(from 101946-06)

1177572 installing Solaris 2.4 ON patch 101945-05 and running OW causes machine to panic

The patch to bug ID 1151364 broke OW's consolidation. This happened
because releasef() changed to have an extra argument. OW shouldn't have
been dependent on releasef(), which is private to the ON consolidation.
Since this problem was not discovered until after the patch was made,
it made more sense for ON to produce a new patch which restores
releasef() to its old interface. The interface had been changed for
kaio. A new interface called areleasef(), which is used only by kaio,
is added.

(from 101946-05)

1174830 savecore on diskless machine didn't generate unix, vmcore is trash
1151364 asynchronous I/O in the user level hurts RDBMS performance

This is a performance improvement for applications that are using
libaio for doing async IO to raw files or devices. There are no API
changes; only a new version of libaio.so.1 is installed. One side
benefit of this fix is that async IO to tape should now work.
(This patch to bug 1151364 requires installation of libaio/kaio patch
102021-01 or later.)

Kernel crash dumps generated on diskless sun4m, sun4d or i86pc systems
are not complete.

(from 101946-04)

1172243 Customer runs application from dumb terminal and system crashes
1169686 4.1.3 system on network goes down, hangs 2.3 system

The problem shows up when a "ps" thread is running through the virtual
memory area to get the address space size for a mapped file. The
address space lock is held and a get attributes function is called.
This initiates an nfs get attribute request. If the machine that the
request is made to is not responding, the nfs request will block. The
address space lock which is held by the blocked ps thread might block
other processes on the local machine.

Typically when a server goes down all nfs file system activity is
blocked on any clients. The nfs operation resumes once the server comes
up. In this situation a server is powered down and causes a client to
hang. The hang is due to a process pile-up. The client is doing a ps
and its thread is holding the address space lock (as_lock) for a
running process; call it A. Process A maps a file from the server. The
client ps thread path has reached rm_assize(), which needs to get the
file size, so it calls VOP_GETATTR(), which goes across the wire to the
server. This operation goes nowhere because the server is not running.
The as_lock held by the ps process is blocking other processes such as
init.

The solution is not to go over the wire but to return a cached entry
for the file size. The change is to define a new attribute flag in
vnode.h called ATTR_HINT. The rm_assize() function will use this flag
when it calls VOP_GETATTR(). The nfs getattr function will see that the
size of the file is requested and that the passed in flag is ATTR_HINT.
It will return the file size from the rnode rather than make a request
to the server.
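The ATTR_HINT idea can be sketched with a small user-level model. This
is not the kernel code: the flag value, the Rnode class and the helper
names below are hypothetical stand-ins chosen for illustration, and the
"server down" case raises an exception where a real client would block.

```python
from dataclasses import dataclass

# Hypothetical flag value; the text above only says ATTR_HINT lives in vnode.h.
ATTR_HINT = 0x1


class ServerDown(Exception):
    """Raised by the toy RPC layer when the server does not answer."""


@dataclass
class Rnode:
    """Toy stand-in for the client-side NFS node that caches attributes."""
    cached_size: int
    server_up: bool = True
    server_size: int = 0

    def fetch_size_over_the_wire(self):
        # A real client would block here; the toy raises instead of hanging.
        if not self.server_up:
            raise ServerDown("NFS server not responding")
        self.cached_size = self.server_size
        return self.cached_size


def nfs_getattr_size(rnode, flags=0):
    """Return the file size, skipping the RPC when the caller passes ATTR_HINT."""
    if flags & ATTR_HINT:
        return rnode.cached_size
    return rnode.fetch_size_over_the_wire()


def rm_assize(rnode):
    """Toy caller that holds as_lock: a possibly stale size is good enough."""
    return nfs_getattr_size(rnode, ATTR_HINT)
```

In this model a caller that can tolerate a stale size never depends on
the server being reachable, which is the property the fix relies on.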
Running applications that do I_SETSIG on the console, when the console
is the serial port (i.e. not the frame buffer), causes the system to
crash when attempting to send a signal to a process.

(from 101946-03)

1169909 Running xlib code in Realtime class causes code to block in poll()
1167235 panic data fault in strioctl - apparently doing TIOCSPGRP

The testing and setting of the session and controlling terminal related
flags in the streamhead are now protected with a mutex.

Real time stream threads will block in a poll.

(from 101946-02)

1172979 spurious SIGALRM received in test program that forks child processes
1172009 recv() on sockets should return the error only once for SunOS 4.X compatibility
1165687 non-blocking reads on sockets block under Solaris 2.3
1160112 socket library accidentally closes file descriptor on error
1120225 recv() returns EPIPE when called with MSG_PEEK
1152710 socket lib in 2.3/2.2 have problems with not clearing bad connections and errno
1171478 socket recv() calls fail with EINVAL due to bad fix in 494

AF_UNIX and AF_INET sockets can sometimes get EPIPE errors for
recv(MSG_PEEK). When the socket library sees the EPIPE error it will in
some cases close the file descriptor, causing the application to get
EBADF errors for subsequent operations.

An AF_UNIX listening socket can get into a permanent error state
(returning EPIPE or ECONNRESET) for any operation until the socket is
closed.

The non-blocking attribute of a socket endpoint is not transferred from
a non-blocking listener endpoint to an accepting endpoint. This causes
some non-blocking socket programs to block. This patch fixes the
problem by setting the accepting endpoint's non-blocking attribute if
the listener was non-blocking.

In SunOS 4.X sockets, when a read() or recv*() call returns an error,
the application can do another read()/recv*() and get an EOF. This
patch applies this subtle aspect of socket semantics to SunOS 5.X.

This specification of signal actions from the signal(5) manual page was
being violated:

    Setting a signal action to SIG_IGN for a signal that is pending
    causes the pending signal to be discarded, whether or not it is
    blocked. Any queued values pending are also discarded, and the
    resources used to queue them are released and made available to
    queue other signals.

The condition under which the pending signal was not being discarded
was the specific case of SIGALRM signals generated by the
setitimer(ITIMER_REAL) interface.
The malfunction happens in a narrow race condition which is triggered
by repeatedly installing a signal handler and then setting the action
to SIG_IGN while the itimer is active.

(from 101946-01)

1173969 MT process doesn't stop on multi processor systems

dbx appears to malfunction when controlling a multithreaded process
that does many fork1()s. The bug is in the system, not dbx. Also,
stopping dbx with a jobcontrol signal from the terminal, ^Z, while it
is controlling a multithreaded process will cause the multithreaded
process to become permanently stopped.

(from 101903-01)

1172542 gettimeofday() returns negative nanosecond value on x86
1171939 Process dump core at random on loaded systems
1169791 processes often getting killed with SIGABRT and core dumped on MP IntelExpress

gettimeofday() call can return negative nsec value at times.

Processes can dump core on heavily loaded systems.

(from 101919-01)

1157053 System panics when doing a copy to NFS file system mounted across FDDI-S

The problem is caused by non-aligned transfers. The memory address
alignment trap happened in xdr_writeargs() when copying data in a loop.
The address was not on a long word boundary; it was on a word boundary.
nfs_feedback() can adjust the transfer address and size for a request,
such as for a retransmission. xdr_writeargs() can make use of bcopy().
xdr_writeargs() is in file nfs_xdr.c. There are a few other functions
in this file that do a similar copy operation and that should be
changed to use bcopy.

(from 101976-01)

1173301 Files can sometimes be missing from a cachefs mounted directory.

This can happen if the entry in question is the last one in the
directory block, but would be the first one in the cachefs front file.
If a client system runs touch on this file, it will erase the contents
of the file on the server.

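The signal(5) rule quoted under 101946-02 above (a pending signal is
discarded when its action is set to SIG_IGN, even while blocked) can be
checked from user level. The following is a minimal sketch, assuming a
POSIX system with Python 3 and a call from the main thread; the function
name is hypothetical and the sketch only observes the rule, it does not
reproduce the race that bug 1172979 describes.

```python
import signal
import time


def pending_alarm_discarded_by_ignore():
    """Return (pending_before, pending_after) around setting SIGALRM to SIG_IGN."""
    old_handler = signal.getsignal(signal.SIGALRM)
    # Block SIGALRM so the timer expiry stays pending instead of being delivered.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGALRM})
    try:
        signal.setitimer(signal.ITIMER_REAL, 0.01)  # one-shot real-time timer
        time.sleep(0.2)                             # give the timer time to expire
        pending_before = signal.SIGALRM in signal.sigpending()
        # Per signal(5), this should discard the pending, blocked SIGALRM.
        signal.signal(signal.SIGALRM, signal.SIG_IGN)
        pending_after = signal.SIGALRM in signal.sigpending()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)     # make sure the timer is off
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        if old_handler is not None:
            signal.signal(signal.SIGALRM, old_handler)
    return pending_before, pending_after
```

On a system that follows the rule, the signal is reported as pending
before the action is changed and as not pending afterwards.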
Patch Installation Instructions:
--------------------------------
Generic 'installpatch' and 'backoutpatch' scripts are provided within
each patch package with instructions appended to this section. Other
specific or unique installation instructions may also be necessary and
should be described below.

Special Install Instructions:
-----------------------------
Reboot after installing patch.

Instructions to install patch using "installpatch"
--------------------------------------------------
1. Become super-user.
2. Apply the patch by typing:
.
See /tmp/log. for reason for failure.
Explanation and recommended action: The installation of one of
the patch packages failed. Installpatch will backout the patch
to leave the system in its pre-patched state. See the log file
for the reason for failure. Correct the problem and
re-apply the patch.
Error message:
Pkgadd of package failed with error code .
Will not backout patch...patch re-installation.
Warning: The system may be in an unstable state!
See /tmp/log. for reason for failure.
Explanation and recommended action: The installation of one of
the patch packages failed. Installpatch will NOT backout the
patch. You may manually backout the patch using backoutpatch,
then re-apply the entire patch. Look in the log file for the
reason pkgadd failed. Correct the problem and re-apply the
patch.
Patch Installation Messages:
---------------------------
Note: the messages listed below are not necessarily considered errors
as indicated in the explanations given. These messages are, however,
recorded in the patch installation log for diagnostic reference.
Message:
Package not patched:
PKG=SUNxxxx
Original package not installed
Explanation: One of the components of the patch would have patched a
package that is not installed on your system. This is not
necessarily an error. A patch may fix a related bug for several
packages. Example: suppose a patch fixes a bug in both the
online-backup and fddi packages. If you had online-backup installed
but didn't have fddi installed, you would get the message
Package not patched:
PKG=SUNWbf
Original package not installed
This message only indicates an error if you thought the package
was installed on your system. If this is the case, take the
necessary action to install the package, backout the patch (if
it installed other packages) and re-install the patch.
Message:
Package not patched:
PKG=SUNxxx
ARCH=xxxxxxx
VERSION=xxxxxxx
Architecture mismatch
Explanation: One of the components of the patch would have patched a
package for an architecture different from your system. This is not
necessarily an error. Any patch to one of the architecture specific
packages may contain one element for each of the possible
architectures. For example, assume you are running on a sun4m. If
you were to install a patch to package SUNWcar, you would see the
following (or similar) messages:
Package not patched:
PKG=SUNWcar
ARCH=sparc.sun4c
VERSION=11.5.0,REV=2.0.18
Architecture mismatch
Package not patched:
PKG=SUNWcar
ARCH=sparc.sun4d
VERSION=11.5.0,REV=2.0.18
Architecture mismatch
Package not patched:
PKG=SUNWcar
ARCH=sparc.sun4e
VERSION=11.5.0,REV=2.0.18
Architecture mismatch
Package not patched:
PKG=SUNWcar
ARCH=sparc.sun4
VERSION=11.5.0,REV=2.0.18
Architecture mismatch
The only time these messages indicate an error condition
is if installpatch does not correctly recognize your architecture.
Message:
Package not patched:
PKG=SUNxxxx
ARCH=xxxx
VERSION=xxxxxxx
Version mismatch
Explanation: The version of software to which the patch is applied is
not installed on your system. For example, if you were running SunOS
5.3, and you tried to install a patch against SunOS 5.2, you would
see the following (or similar) message:
Package not patched:
PKG=SUNWcsu
ARCH=sparc
VERSION=10.0.2
Version mismatch
This message does not necessarily indicate an error. If
the version mismatch was for a package you needed patched, either
get the correct patch version or install the correct package version.
Then backout the patch (if necessary) and re-apply.
Message:
Re-installing Patch.
Explanation: The patch has already been applied, but there is
at least one package in the patch that could be added. For
example, if you applied a patch that had both Openwindows and
Answerbook components, but your system did not have Answerbook
installed, the Answerbook parts of the patch would not have
been applied. If, at a later time, you pkgadd Answerbook, you
could re-apply the patch, and the Answerbook components of the
patch would be applied to the system.
Message:
Installpatch Interrupted.
Installpatch is terminating.
Explanation: Installpatch was interrupted during execution
(usually through pressing ^C). Installpatch will clean up
its working files and exit.
Message:
Installpatch Interrupted.
Backing out Patch...
Explanation: Installpatch was interrupted during execution
(usually through pressing ^C). Installpatch will clean up
its working files, backout the patch, and exit.
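Because the "Package not patched:" messages above share one layout (a
header line, optional PKG=, ARCH= and VERSION= lines, then a reason
line), they can be collected from a saved log for review. The following
is a minimal sketch that assumes the log text has already been read into
a string; the function name parse_not_patched is hypothetical and is not
part of installpatch.

```python
import re

# Field lines as they appear in the messages above, e.g. "PKG=SUNWcar".
_FIELD = re.compile(r"^(PKG|ARCH|VERSION)=(.*)$")


def parse_not_patched(log_text):
    """Collect each 'Package not patched:' block as a dict of fields plus 'reason'."""
    records = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line == "Package not patched:":
            current = {}          # start a new block
            continue
        if current is None or not line:
            continue              # outside a block, or a blank line
        match = _FIELD.match(line)
        if match:
            current[match.group(1)] = match.group(2)
        else:
            # The first non-field line after the header is the reason text.
            current["reason"] = line
            records.append(current)
            current = None
    return records
```

Filtering the result on the reason text (for example "Version
mismatch") separates the messages that may need action from the
architecture mismatches that are normally expected.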
Patch Backout Errors:
---------------------
Error message:
prebackout patch exited with return code .
Backoutpatch exiting.
Explanation and corrective action: the prebackout script
supplied with the patch exited with a return code other
than 0. Generate a script trace of backoutpatch to determine
why the prebackout script failed. Correct the reason for
failure, and re-execute backoutpatch.
Error message:
postbackout patch exited with return code .
Backoutpatch exiting.
Explanation and corrective action: the postbackout script
supplied with the patch exited with a return code other than
0. Look at the postbackout script to determine why it failed.
Correct the failure and, if necessary, RE-EXECUTE THE
POSTBACKOUT SCRIPT ONLY.
Error message:
Only one service may be defined.
Explanation and corrective action: You have attempted to specify
more than one service from which to backout a patch. Different
services must have their patches backed out with different
invocations of backoutpatch.
Error message:
The -S and -R arguments are mutually exclusive.
Explanation and recommended action: You have specified both a
non-native service to backout, and a package installation root.
These two arguments are mutually exclusive. If backing out a
patch from a non-native usr partition, the -S option should be
used. If backing out a patch from a client's root
partition (either native or non-native), the -R option
should be used.
Error message:
The service cannot be found on this system.
Explanation and recommended action: You have specified a non-
native service from which to backout a patch, but the
specified service is not installed on your system. Correctly
specify the service when backing out the patch.
Error message:
Only one rootdir may be defined.
Explanation and recommended action: You have specified more than
one package install root using the -R option. The -R option
may be used only once per invocation of backoutpatch.
Error message:
The directory cannot be found on this system.
Explanation and recommended action: You have specified a
directory using the -R option which is either not mounted,
or does not exist on your system. Verify the directory name
and re-backout the patch.
Error message:
Patch has not been successfully applied to this system.
Explanation and recommended action: You have attempted to backout
a patch that is not applied to this system. If you must
restore previous versions of patched files, you may have to
restore the original files from the initial installation CD.
Error message:
Patch has not been successfully applied to this system.
Will remove directory
Explanation and recommended action: You have attempted to back
out a patch that is not applied to this system. While the
patch has not been applied, a residual
/var/sadm/patch/ (perhaps from an unsuccessful
installpatch) directory still exists. The patch cannot be
backed out. If you must restore old versions of the patched
files, you may have to restore them from the initial
installation CD.
Error message:
This patch was obsoleted by patch .
Patches must be backed out in the order in
which they were installed. Patch backout aborted.
Explanation and recommended action: You are attempting to backout
patches out of order. Patches should never be backed-out out
of sequence. This could undermine the integrity of the more
current patch.
Error message:
Patch was installed without backing up the original
files. It cannot be backed out.
Explanation and recommended action: Either the -d option of
installpatch was set when the patch was applied, or the save
area of the patch was deleted to regain space. As a result, the
original files are not saved and backoutpatch cannot be used.
The original files can only be recovered from the original
installation CD.
Error message:
pkgrm of package failed return code .
See /var/sadm/patch//log for reason for failure.
Explanation and recommended action: The removal of one of
the patch packages failed. See the log file for the reason for
failure. Correct the problem and run the backout script again.
Error message:
Restore of old files failed.
Explanation and recommended action: The backout script uses the
cpio command to restore the previous versions of the files
that were patched. The output of the cpio command should
have preceded this message. The user should take the
appropriate action to correct the cpio failure.
KNOWN PROBLEMS:
On client server machines the patch package is NOT applied
to existing clients or to the client root template space.
Therefore, when appropriate, ALL CLIENT MACHINES WILL NEED
THE PATCH APPLIED DIRECTLY USING THIS SAME INSTALLPATCH
METHOD ON THE CLIENT. See instructions above for
applying patches to a client.
A bug affecting a package utility (e.g. pkgadd, pkgrm, pkgchk)
could affect the reliability of installpatch or backoutpatch
which uses package utilities to install and backout the patch
package. It is recommended that any patch that fixes package
utility problems be reviewed and, if necessary, applied before
other patches are applied. Such existing patches are:
100901 Solaris 2.1
101122 Solaris 2.2
101331 Solaris 2.3
SEE ALSO
pkgadd, pkgchk, pkgrm, pkginfo, showrev, cpio