Patch-ID# 101318-21
Keywords: kernel patch C2+ sx cgfourteen syslog libc lockd tcp ip sockmod timod
Synopsis: SunOS 5.3: Jumbo patch for kernel, C2+, sx, cgfourteen, syslog, libc, lockd, tcp, ip, sockmod, timod
Date: Jan/04/94

Solaris Release: 2.3

SunOS release: 5.3

Unbundled Product:

Unbundled Release:

Topic: SunOS 5.3: Jumbo patch for kernel, C2+, sx, cgfourteen, syslog, libc, lockd, tcp, ip, sockmod, timod

BugId's fixed with this patch: 1139493 1108615 1139124 1130721 1144765 1146912 1146985 1130721 1143439 1140209 1137581 1144922 1145401 1145746 1150058 1142365 1140047 1123788 1132554 1147226 1139753 1147620 1147165 1150306 1149105 1149088 1123140 1146534 1149928

Changes incorporated in this version: 1151619

Relevant Architectures: sparc

Patches accumulated and obsoleted by this patch: 101267-01,101326-01,101349-01,101319-02,101319-02,101346-03

Patches which conflict with this patch:

Patches required with this patch:

Obsoleted by:

Files included with this patch:


	kernel/unix
		all of {sun4,sun4c,sun4d,sun4e,sun4m} versions
	kernel/fs/procfs
		all of {sun4,sun4c,sun4d,sun4e,sun4m} versions
	kernel/drv/log
	kernel/drv/sx_cmem
	kernel/drv/sx
	kernel/drv/cgfourteen
	kernel/misc/seg_drv
	kernel/sched/TS
	usr/kernel/sched/R
	usr/sbin/syslogd
	postinstall script to edit etc/syslog.conf 
	postremove script to remove edits from etc/syslog.conf
	usr/lib/libc.a
	usr/lib/libc.so.1
	usr/lib/nfs/lockd
	usr/lib/pics/libc_pic.a
	kernel/drv/tcp
	kernel/drv/ip
	kernel/strmod/sockmod
	kernel/strmod/timod


Problem Description:


1151619: sockmodwput data fault panic due to socklog problem

socklog() was being passed a NULL pointer while calculating
the size of the message block. This resulted in a kernel
panic with a data fault.


(from 101318-20)

1149928: TCP/IP scalability problems

This patch reduces the time spent locking and unlocking the outer perimeters
used by TCP and IP.

1149929: STREAMS outer perimeter scalability problems

This patch reduces the time spent locking and unlocking the outer perimeters
used by TCP and IP. It also reduces the lock contention on the strmsglock
(used by the STREAMS allocator) and reduces the time spent running at
high IPL from the Ethernet driver.

(from 101318-19)

1146534 swift_mmu_writeptp code in wrong order causing watchdog reset.


Under heavy load, a SPARCstation 5 can take a watchdog reset.  This
has been seen while running kenbus, LST, and svvs.


(from 101318-18)

1149088 tcp and sockmod do not protect against QUEUE_ptr in T_CONN_RES going away
1123140 transport providers can crash if accessing T_CONN_RES QUEUE_ptr field


If a TLI application closes the accepting file descriptor (passed
to t_accept) while the t_accept is in progress, the kernel
can panic in tcp_accept, in sockmod, or in timod.
(The sockmod panic occurs only if the file descriptor that is opened
by the accept() in the socket library is closed.)


(from 101318-17)

1149105 Lost entries in wtmpx and wtmp

The wtmp/wtmpx and utmp/utmpx files were corrupted during synchronization (update).

(from 101346-03)
 
 
1145617: NFS/NIS+ servers + clients hang in tcp_lookup
 
If a Solaris machine receives a tcp packet sent to the all-zeros IP address
(an old broadcast address that should no longer be used), the kernel
might go into an infinite loop.
 
The loop is in drain_syncq calling tcp_rput calling tcp_lookup_listeners and
then calling put.
 
 
(from 101346-02)
 
1145661 accept() fails with EPROTO, attempts to reconnect on socket fail
 
 
Applications can see the socket accept() call fail with errno being EPROTO.
This error indicates that the TCP 3-way open handshake failed to complete
and should be handled by just retrying the poll/select/accept call.
 
This patch prevents the EPROTO errors from being returned by accept().
 
 
(from 101346-01)
 
1144308 Solaris crashes with urgent data RFC 1122
 
 
The machine can get a watchdog reset or alternatively hang when receiving
urgent data. If it hangs, it hangs "hard", i.e. L1-A does not work, and
unplugging and replugging the keyboard does not work either.

A snoop trace of the last packet received should show the Urgent flag bit
set and an Urgent pointer of 0. (Note: the 2.2 version of snoop does not
print the Urgent pointer field - the 2.3 version does.)
 
 
(from 101319-02)
 
1144228 SPARCcenter 2000 running Solaris 2.2 panics with data fault in do_urg_outofline
 
 
System panics in various places in do_urg_outofline() routine.  Typical
stack trace would look like:
 
do_urg_outofline()
sockmodrsrv()
runservice()
 
with a NULL message block (bp).
 
 
(from 101319-01)
 
1137978 telnet returning "protocol error" when attempting to telnet to netbuilder router
 
From either a Solaris 2.1 or 2.2 system, telnet returns "protocol error"
when connecting to the 3Com router.


(from 101318-16)


1147165: Streams resources depleted suddenly (due to no syncq flow control)

A machine can rapidly run out of kernel memory under heavy load.
This is signified by netstat -m (on the core dump) reporting
tens of thousands of allocated messages.

1150306: data fault in background - streams close race

The kernel can crash with a data fault. The stack trace shows
background calling mutex_enter, which takes a data fault.


(from 101318-15)

1147620 system hangs in deadflck

Under certain circumstances, the kernel may hang due to an error in file
and record locking.  In this case, a kernel thread will be found to be
looping infinitely in deadflck().


(from 101318-14)

1139753 locking hangs under heavy load; disturbing ICMP messages


Under heavy loads, NFS locking clients may be unable to provide replies
to their servers' occasional portmap GETPORT requests within the default RPC
timeout.  This in turn prevents the server from responding to outstanding
locking requests from that client (and others), causing the server lockd to
appear to be hung or dead.


(from 101318-13)

1132554 fcntl: error No record locks available, lockd: out of lock
1147226 NFS locking broken when byte order is different


1132554: NFS file servers can leak record locks.  Eventually all lock
requests (including local locks) fail with ENOLCK.  Another symptom is
syslog messages from lockd (on the server) complaining that it is out
of locks.  This bug can also cause the server to incorrectly grant
lock requests, which can lead to corruption of user data files.

1147226: Patch 101267-01 introduced a bug in NFS clients that could
cause locking operations to fail if the server is not running SunOS or
if the server is not a SPARC system.  The symptom is syslog messages
from lockd (on the client), complaining about malformed filehandles.


(from 101267-01)

1142365: lockd incorrectly examines export information when comparing
filehandles.  Consider a scenario where a PC application, running under
WABI or SunPC, uses File Sharing to synchronize instances of itself.
If one instance is running on an NFS server and another instance is
running on an NFS client, the NFS server will allow access to both
instances at the same time, when it should really only allow access to
one at a time.  This can cause data corruption.

1140047: Suppose a 3-byte (or bigger) region of an NFS file is locked.
Now suppose that one or more bytes in the middle of the region are
unlocked, leaving two locked regions on either side of the "hole".  The
client does not properly manage these two regions when they are
unlocked.  The problem does not appear until the server reboots and the
client attempts to reclaim (relock) at least one of the regions. This
can lead to situations where the server thinks a region is locked, but
nobody owns the lock.  The server console may display
 
    _nfssys: error Stale NFS file handle
 
if the file was deleted before the server rebooted.
 
1123788: lockd on an NFS client detects and filters out retransmitted
requests from the client kernel.  The code to detect retransmissions
does not look at the filehandle in the request.  Although this does not
seem to have been a problem in practice, it could conceivably lead to
cases where the application gets the wrong return code from a lock
request.

(from 101318-12)

1150058 SPARCstation-10 SX Vid SIMM Cursor RAM Write Enable is weak and corrupts writes

This fix is to the Video SIMM operating system driver (cg14 driver). It
provides a software workaround for the broken cursor image observed
when the cursor is written to.


(from 101318-11)

1146924 SS10-51 SS600-51 will fail "watchdog reset" or hard hang under load


(from 101318-10)

1140209 Cannot exit login sessions simultaneously from Alphanumeric terminals properly

The zombie processes were not being removed by the parent process
when the handler for SIGCHLD was being reset.


1142882: panic on exit

The u.u_ttyp field was being set incorrectly when a pre-SVR4 module
was being pushed. The old value of u.u_ttyp was not saved, so it could
not be checked later to see whether the field needed to be reset to NULL.

(from 101318-09)

1143439 using fork() and libaio together leads to system panics


When using libaio to do asynchronous I/O in a process
and also doing a fork() in the same process, there is
a window in which the system will panic.  The same
phenomenon occurs with multi-threaded processes that use
fork1() (this has been observed with SunPC and the volume
manager).  Finally, using a /proc tool that reads the
address space of a running process, like /usr/ucb/ps -ww,
can lead to a panic of a similar (though not identical) sort.


(from 101349-01)

1137581 C2+ gets watchdog reset with Sundiag
1144922 cgfourteen driver could still get remap panic
1145401 sx driver memory leak
1145746 C2+ panics when creating an X Window
 
 
        The reliability lab typically runs Sundiag on machines continuously
        for extended periods of time (more than a week). When doing such
        reliability testing on SPARCstation 10 SX machines, we discovered
        the following problems:

                a) Machines randomly get a watchdog reset (bug ids
                   1137581 and 1144922).
                b) After running for 72 hours or more, the machines
                   seem to hang or behave sluggishly after exiting
                   from Sundiag (bug id 1145401).
                c) In some very rare situations, when unmapping a range of
                   virtual addresses cloned for SX, the machine panics
                   because the thread unmapping the address range holds the
                   writer's lock on the address space and then tries to
                   acquire a reader's lock on the same address space
                   (bug id 1145746).

(from 101318-08)

1130721 panic messages are not logged in /var/adm/messages

The previous putback for this bug caused the system to panic if more than
one syslogd was started.


(from 101318-07)

1146985 data fault panic in lock_try due to interval timer signal


There is a race condition in exit() and lwp_exit()
where they are cancelling outstanding itimer()
callouts.  If the race is lost, a callout remains
that eventually fires and attempts to access a
non-existent lwp or process, leading to the
system panic reported by the customer.


(from 101318-06)

1130721 panic messages are not logged in /var/adm/messages

Added postinstall script to edit etc/syslog.conf and postremove
script to remove the edits. This should have been done as part of
101318-03.

(from 101318-05)

1146912 panic: deadlock - cycle in blocking chain when using /proc to read a process


When using tools that read the address space of other
processes via /proc, there is a window of vulnerability in
the operating system that can cause a panic with the message:

	Deadlock condition detected: cycle in blocking chain.

Tools that read the address space of other processes include:
	/usr/bin/truss
	/usr/ucb/ps
	/usr/bin/adb
	/opt/SUNWspro/bin/dbx
	3rd party debuggers (e.g., gdb)

The window of vulnerability is extremely small, but the
problem has been seen on heavily-loaded multiprocessors.


(from 101318-04)

1144765 SunPC fails on sun4m systems running Solaris 2.3


	The SunPC card does not work on sun4m platforms.


(from 101318-03)

1130721 panic messages are not logged in /var/adm/messages


        The mechanism implemented in SunOS 5.0 to save log messages
        produced before syslogd is started does not allow messages
        recorded in the message buffer before the reboot to be logged.
        This patch returns to the original method of saving log messages
        and corrects the problems that prompted the incorrect fix in 5.0.


(from 101318-02)

1108615 I_LOOK etc tests for end of stream by walking mid point qnext


Kernel crash (data fault).
The PC is in the SAMESTR macro, either in the build_sqlist function or in the
getendq function.


(from 101318-01)

1139493 fcntl(2) => ENOLCK and "klm_lockctl: bad nonblk LOCK error 3"


If there are problems communicating with the lock manager on an NFS
server and a blocking lock request (e.g., fcntl(..., F_SETLKW, ...))
receives a signal, the lock request might not get cancelled.  This
would leave the file locked with no way to unlock it, short of
rebooting the client or server.


(from 101326-01)

1139124 syslog does not output more than approx 100 characters, no errors reported


syslog messages longer than 100 characters result in an empty syslogd
posting.  Only the header of the message is printed.  The message part
is empty.
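
One way to check for this symptom is to log a message well over 100
characters and then look for its body in the syslogd output.  The
following is a minimal sketch, not part of the patch.  The helper names
make_msg and send_long_msg are hypothetical, a POSIX-style shell such
as ksh is assumed, and the sketch assumes that logger(1) is available
and that user.notice messages are routed somewhere you can inspect.

```shell
# Hypothetical helper: print a line of exactly $1 'x' characters.
make_msg() {
    n=$1
    msg=""
    while [ "$n" -gt 0 ]; do
        msg="${msg}x"
        n=$((n - 1))
    done
    echo "$msg"
}

# Hypothetical helper: send a message of $1 characters through syslog.
# Assumes logger(1) is installed and that syslog.conf routes
# user.notice to a file or console you can read.
send_long_msg() {
    logger -p user.notice "$(make_msg "$1")"
}

# Example (not run automatically):
#   send_long_msg 150
```

If the logged entry shows only the header and no run of x characters,
the system is still exhibiting the problem described above.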


Patch Installation Instructions:
--------------------------------
Generic 'installpatch' and 'backoutpatch' scripts are provided
within each patch package with instructions appended to this section.
Other specific or unique installation instructions may also be
necessary and should be described below.

Special Install Instructions:
-----------------------------

none


Instructions to install patch using "installpatch"
--------------------------------------------------

1.  Become super-user.

2.  Apply the patch by typing:

	<dir>/<patch-id>/installpatch <dir>/<patch-id>

    where <dir> is the directory containing the patch and <patch-id>
    is the patch number.  <dir> must be a full path name.

    Example:

	# /tmp/123456-01/installpatch /tmp/123456-01

3.  If any errors are reported, see "Patch Installation Errors" in
    the Command Descriptions section below.

    Rebooting the system or restarting the application after a successful
    patch installation is usually necessary to use the patch.

    NOTE: On client server machines the patch package is NOT applied
	  to existing clients or to the client root template space.  
	  Therefore, when appropriate, ALL CLIENT MACHINES WILL NEED 
	  THE PATCH APPLIED DIRECTLY USING THIS SAME INSTALLPATCH 
	  METHOD ON THE CLIENT.  See the next section for instructions
	  for installing a patch on a client.
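
Because <dir> must be a full path name, a small wrapper can reject
relative paths before installpatch is run.  This is a minimal sketch
under stated assumptions: the function name install_cmd is hypothetical,
a POSIX-style shell such as ksh is assumed, and the function only
prints the command line instead of running it.

```shell
# Hypothetical helper: print the installpatch command line for patch
# directory $1, or fail if $1 is not a full (absolute) path name.
install_cmd() {
    case "$1" in
        /*) echo "$1/installpatch $1" ;;
        *)  echo "error: $1 is not a full path name" >&2
            return 1 ;;
    esac
}

# Example:
#   install_cmd /tmp/123456-01
```

Printing the command first makes it easy to review what will be run as
super-user before committing to it.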


Instructions for installing a patch on a diskless or dataless client
--------------------------------------------------------------------

1.  Before applying the patch, the following command must be executed
    on the server to give the client read-only, root access to the
    exported /usr file system so that the client can execute the
    pkgadd command:

    share -F nfs -o ro,anon=0 /export/exec/Solaris_2.1_sparc.all/usr

    The command:

    share -F nfs -o ro,root=<client_name> \
		/export/exec/Solaris_2.1_sparc.all/usr

    accomplishes the same goal, but only gives root access to the
    client specified in the command.

2.  Log in to the client system and become super-user.

3.  Continue with step 2 in the "Instructions to install patch using
    installpatch" section above.


Instructions for backing out patch using "backoutpatch"
-------------------------------------------------------

1.  Become super-user. 

2.  Change directory to /var/sadm/patch:
 
        cd /var/sadm/patch
 
3.  Back out the patch by typing:
 
        <patch-id>/backoutpatch <patch-id>
 
    where <patch-id> is the patch number.

    Example:

	# 123456-01/backoutpatch 123456-01

4.  If any errors are reported, see "Patch Backout Errors" in 
    the Command Descriptions section below.


Instructions for identifying patches installed on system:
----------------------------------------------------------

Type:

    installpatch -p

This command produces a list of the patch IDs of the patches that
are currently applied to the system.  When executed with the -p
option, the installpatch command does not modify the system in
any way.
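
To test for one particular patch, the list can be filtered.  The sketch
below is illustrative only: has_patch is a hypothetical helper, and it
assumes that the list contains patch IDs written as a patch number, a
dash, and a two-digit revision.  The exact layout of the installpatch -p
output is not specified here, so the helper simply searches its
standard input.

```shell
# Hypothetical helper: succeed if the text on standard input mentions
# any revision of the base patch number $1 (for example 101318).
# Assumes IDs look like <number>-<two-digit revision>.
has_patch() {
    grep "$1-[0-9][0-9]" >/dev/null
}

# Example (not run automatically):
#   installpatch -p | has_patch 101318 && echo "101318 is applied"
```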


Command Descriptions
--------------------

NAME

     installpatch - apply patch package to Solaris 2.x system
     backoutpatch - remove patch package from Solaris 2.x system

SYNOPSIS

     installpatch [-u] [-d] <patch directory>
     installpatch -p
     backoutpatch <patch-id>

DESCRIPTION

     These installation and backout utilities apply only to
     Solaris 2.x associated patches. They do not apply to Solaris
     1.x associated patches. These utilities are currently only
     provided with each patch package and are not included with
     the standard Solaris 2.x release software.

OPTIONS

    installpatch

	-u  unconditional install, do not verify file attributes

	-d  do not save original files being replaced

	-p  print a list of the patches currently applied on the system

DIAGNOSTICS

    Patch Installation Errors:
    --------------------------

    Error message: Patch has already been applied.

      Explanation and recommended action: This patch has already been
	applied to the system.  If the patch has to be reapplied
	for some reason, back out the patch and then reapply it.

    Error message: This patch is obsoleted by a patch which has
	already been applied to this system.  Application of this
	patch would leave the system in an inconsistent state.
	Patch installation is aborted.

      Explanation and recommended action: Occasionally, a patch
	is replaced by a new patch which incorporates the bug fixes
	in the old patch and supplies additional fixes also.  At
	this time, the earlier patch is no longer made available
	to users.  The second patch is said to "obsolete" the
	first patch.  However, it is possible that some users
	may still have the earlier patch and try to apply it to
	a system on which the later patch is already applied.
	If the obsoleted patch were allowed to be applied, the
	additional fixes supplied by the later patch would no
	longer be available, and the system would be left in an
	inconsistent state.  This error message indicates that
	the user attempted to install an obsoleted patch.  There
	is no need to apply this patch because the later patch
	has already supplied the fix.
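
For two revisions of the same patch number, the comparison can be
sketched as below.  This is an illustration, not the check that
installpatch performs.  The helper names patch_rev and is_older_rev are
hypothetical, a POSIX-style shell such as ksh is assumed, and the sketch
does not cover a patch obsoleted by a patch with a different number.

```shell
# Hypothetical helper: print the revision of patch ID $1 as a plain
# decimal number (101318-08 -> 8).
patch_rev() {
    rev=${1#*-}     # drop everything up to and including the dash
    rev=${rev#0}    # drop one leading zero (08 -> 8)
    echo "$rev"
}

# Hypothetical helper: succeed if patch ID $1 is an earlier revision
# of the same base patch number as $2.
is_older_rev() {
    [ "${1%-*}" = "${2%-*}" ] &&
        [ "$(patch_rev "$1")" -lt "$(patch_rev "$2")" ]
}

# Example:
#   is_older_rev 101318-20 101318-21 && echo "101318-20 is older"
```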

    Error message: The packages to be patched are not installed on
	this system.

      Explanation and recommended action:  None of the packages
	to be updated by this patch are installed on the system.
	Therefore, this patch cannot be applied to the system.

    Error message: This patch is not applicable to client systems.

      Explanation and recommended action: The patch is only
	applicable to servers and standalone machines.  Attempting
	to apply this patch to a client system will have no effect on
	the system.

    Error message: The /usr/sbin/pkgadd command is not executable.

      Explanation and recommended action:   The /usr/sbin/pkgadd
	command cannot be executed.  The most likely cause of this
	is that installpatch is being run on a diskless or dataless
	client and the /usr file system was not exported with
	root access to the client.  See the section above on
	"Instructions for installing a patch on a diskless or
	dataless client".

    Error message: Patch directory is not of expected format.

      Explanation and recommended action: The patch directory
	supplied as an argument to installpatch did not contain
	any patch packages.  Verify that the argument supplied
	to installpatch is correct. 

    Error message: The following validation errors were found:
	           <validation error(s)>

      Explanation and recommended action: Before applying the patch,
	the patch application script verifies that the current
	versions of the files to be patched have the expected
	fcs checksums and attributes.  If a file to be patched has
	been modified by the user, the user is notified of this
	fact.  The user then has the opportunity to save the
	file and make a similar change to the patched version.
	For example, if the user has modified /etc/inet/inetd.conf
	and /etc/inet/inetd.conf is to be replaced by the patch,
	the user can save the locally modified /etc/inet/inetd.conf
	file and make the same modification to the new file
	after the patch is applied.  After the user has noted all
	validation errors and taken the appropriate action for
	each one, the user should re-run installpatch using
	the "-u" (for "unconditional") option. This time, the
	patch installation will ignore validation errors and
	install the patch anyway.

    Error message:  Insufficient space in /var/sadm to save old files.

      Explanation and recommended action:  There is insufficient
        space in the /var/sadm directory to save old files. 
	The user has two options for handling this problem: 
	(1) generate additional disk space by deleting unneeded
	files, or (2) override the saving of the old files by
	using the "-d" (do not save) option when running installpatch.
	However if the user elects not to save the old versions of
	the files to be patched, backoutpatch CANNOT be used.

	One way to regain space on a system is to remove the
	save area for previously applied patches.  Once the user
	has decided that it is unlikely that a patch will be
	backed out, the user can remove the files that were saved
	by installpatch.  The following commands should be executed
	to remove the saved files for patch xxxxxx-yy:

	cd /var/sadm/patch/xxxxxx-yy
	rm -r save/*
	rm .oldfilessaved

	After these commands have been executed, patch xxxxxx-yy can
	no longer be backed out.
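
Before removing a save area, it can help to see how much space it
would free.  A minimal sketch, assuming du(1) accepts the -k option and
a POSIX-style shell such as ksh is in use.  The helper name
save_area_kb and its optional second argument are hypothetical.

```shell
# Hypothetical helper: print the size, in kilobytes, of the save area
# for patch ID $1.  The patch root defaults to /var/sadm/patch; an
# optional $2 overrides it (an assumption made here for testing).
save_area_kb() {
    root=${2:-/var/sadm/patch}
    du -sk "$root/$1/save" | awk '{ print $1 }'
}

# Example (not run automatically):
#   save_area_kb xxxxxx-yy
```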

    Error message:  Save of old files failed.

      Explanation and recommended action:  Before applying the patch,
	the patch installation script uses cpio to save the old
	versions of the files to be patched.  This error message
	means that the cpio failed.  The output of the cpio
	would have preceded this message.  The user should
	take the appropriate action to correct the cpio failure.
	A common reason for failure will be insufficient disk
	space to save the old versions of the files.  The user
	has two options for handling insufficient disk space:
        (1) generate additional disk space by deleting unneeded
        files, or (2) override the saving of the old files by
        using the "-d" option when running installpatch. However
        if the user elects not to save the old versions of the
        files to be patched, the patch CANNOT be backed out.

    Error message: Pkgadd of <pkgname> package failed.  See
	       /tmp/log.<patchnum> for reason for failure.

      Explanation and recommended action:  The installation of one of
	the patch packages failed.  Any previously installed packages
	in the patch should have been removed.  See the log file
	for the reason for failure.  Correct the problem and
	re-apply the patch.

    Error message: error while adding patch to root template

      Explanation and recommended action:  The install script
	determined this system to be a client server.  The attempt 
	to apply the patch package to the appropriate root
	template space located under /export/root/templates
	failed unexpectedly.  Check the log file for any failure
	messages.  Correct the problem and re-apply the patch.


    Patch Backout Errors:
    ---------------------

    Error message:  Patch <patchnum> has not been applied to this system.

      Explanation and recommended action:  The user has attempted to back
	out a patch that was never applied to this system.  It is
	possible that the patch was applied, but that the patch
	directory /var/sadm/patch/<patchnum> was deleted somehow.
	If this is the case, the patch cannot be backed out.  The
	user may have to restore the original files from the
	initial installation CD.

    Error message:  Patch <patchnum> was installed without backing up the 
		original files.  It cannot be backed out.

      Explanation and recommended action:  Either the -d option of
	installpatch was set when the patch was applied, or the save
	area of the patch was deleted to regain space.  As a result, the
	original files are not saved and backoutpatch cannot be used.  The 
	original files can only be recovered from the original 
	installation CD.

    Error message: Pkgrm of <pkgname> package failed.  See
	       /var/sadm/patch/<patchnum>/log for reason for failure.

      Explanation and recommended action:  The removal of one of
	the patch packages failed.  See the log file
	for the reason for failure.  Correct the problem and
	run the backout script again.

    Error message:  Restore of old files failed.

      Explanation and recommended action:  The backout script uses the
	cpio command to restore the previous versions of the files
	that were patched.  The output of the cpio command should
	have preceded this message.  The user should take the
	appropriate action to correct the cpio failure.

KNOWN PROBLEMS:

     On client server machines the patch package is NOT applied
     to existing clients or to the client root template space.
     Therefore, when appropriate, ALL CLIENT MACHINES WILL NEED
     THE PATCH APPLIED DIRECTLY USING THIS SAME INSTALLPATCH
     METHOD ON THE CLIENT.  See instructions above for
     applying patches to a client.
 
     After a patch package has been installed pkginfo(1) will
     not recognize the SUNW_PATCHID macro in the patch package
     pkginfo file.  Instead, to identify patches installed on
     the system use the grep command method described in the
     patch README.

     The pkgadd command shipped with Solaris 2.1 fails (drops core
     without any error message) when there are more than 100
     entries in the /etc/mnttab file.  This means that installpatch
     can fail, because it uses pkgadd.  Since this is very likely on
     any big system with lots of automounts, ANY patch could fail.
     Applying patch 100901-01 fixes this problem (the README for 
     patch 100901 mentions shutting down the automounter while 
     applying it).
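
A quick way to see whether a system is at risk is to count the entries
in the mount table.  The sketch below is illustrative: the helper name
mnttab_over_limit is hypothetical, it assumes one entry per line in the
file, and it assumes a POSIX-style shell such as ksh.

```shell
# Hypothetical helper: succeed if the mount table has more than 100
# entries.  The file defaults to /etc/mnttab; $1 overrides it.
# Assumes one mount entry per line.
mnttab_over_limit() {
    file=${1:-/etc/mnttab}
    count=$(awk 'END { print NR }' "$file")
    [ "$count" -gt 100 ]
}

# Example (not run automatically):
#   mnttab_over_limit && echo "more than 100 mnttab entries"
```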

SEE ALSO
     pkgadd(1), pkgchk(1), pkgrm(1), pkginfo(1)