Patch-ID# 101318-21
Keywords: kernel patch C2+ sx cgfourteen syslog libc lockd tcp ip sockmod timod
Synopsis: SunOS 5.3: Jumbo patch for kernel, C2+, sx, cgfourteen, syslog, libc, lockd, tcp, ip, sockmod, timod
Date: Jan/04/94

Solaris Release: 2.3

SunOS release: 5.3

Unbundled Product:

Unbundled Release:

Topic: SunOS 5.3: Jumbo patch for kernel, C2+, sx, cgfourteen, syslog, libc, lockd, tcp, ip, sockmod, timod

BugId's fixed with this patch: 1139493 1108615 1139124 1130721 1144765 1146912 1146985 1130721 1143439 1140209 1137581 1144922 1145401 1145746 1150058 1142365 1140047 1123788 1132554 1147226 1139753 1147620 1147165 1150306 1149105 1149088 1123140 1146534 1149928

Changes incorporated in this version: 1151619

Relevant Architectures: sparc

Patches accumulated and obsoleted by this patch: 101267-01,101326-01,101349-01,101319-02,101319-02,101346-03

Patches which conflict with this patch:

Patches required with this patch:

Obsoleted by:

Files included with this patch:

kernel/unix all of {sun4,sun4c,sun4d,sun4e,sun4m} versions
kernel/fs/procfs all of {sun4,sun4c,sun4d,sun4e,sun4m} versions
kernel/drv/log
kernel/drv/sx_cmem
kernel/drv/sx
kernel/drv/cgfourteen
kernel/misc/seg_drv
kernel/sched/TS
usr/kernel/sched/R
usr/sbin/syslogd
postinstall script to edit etc/syslog.conf
postremove script to remove edits from etc/syslog.conf
usr/lib/libc.a
usr/lib/libc.so.1
usr/lib/nfs/lockd
usr/lib/pics/libc_pic.a
kernel/drv/tcp
kernel/drv/ip
kernel/strmod/sockmod
kernel/strmod/timod

Problem Description:

1151619: sockmodwput data fault panic due to socklog problem

socklog() was being passed a NULL pointer while calculating the size of the message block. This resulted in a kernel panic with a data fault.

(from 101318-20)

1149928: TCP/IP scalability problems

This patch reduces the time spent locking and unlocking the outer perimeters used by TCP and IP.
1149929: STREAMS outer perimeter scalability problems

This patch reduces the time spent locking and unlocking the outer perimeters used by TCP and IP. It also reduces the lock contention on the strmsglock (used by the STREAMS allocator) and reduces the time spent running at high IPL from the Ethernet driver.

(from 101318-19)

1146534 swift_mmu_writeptp code in wrong order causing watchdog reset.

Under heavy load, a SPARCstation 5 will watchdog reset. This has been seen running kenbus, LST, and svvs.

(from 101318-18)

1149088: tcp and sockmod does not protect against QUEUE_ptr in T_CONN_RES going away
1123140: transport providers can crash if accessing T_CONN_RES QUEUE_ptr field

If a TLI application closes the accepting file descriptor (passed to t_accept) while the t_accept is in progress, the kernel can panic in tcp_accept, in sockmod, or in timod. (The sockmod panic occurs only if the file descriptor that is opened by the accept() in the socket library is closed.)

(from 101318-17)

1149105 Lost entries in wtmpx and wtmp

wtmp/wtmpx and utmp/utmpx were corrupted during synchronization (update).

(from 101346-03)

1145617: NFS/NIS+ servers + clients hang in tcp_lookup

If a Solaris machine receives a TCP packet sent to the all-zeros IP address (an old broadcast address that should no longer be used), the kernel might go into an infinite loop. The loop is in drain_syncq calling tcp_rput, which calls tcp_lookup_listeners and then put.

(from 101346-02)

1145661 accept() fails with EPROTO, attempts to reconnect on socket fail

Applications can see the socket accept() call fail with errno set to EPROTO. This error indicates that the TCP 3-way open handshake failed to complete, and it should be handled by just retrying the poll/select/accept call. This patch prevents the EPROTO errors from being returned by accept().
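
For systems that do not yet have this patch, the retry that the 1145661 description recommends could look roughly like the following. This is a minimal sketch, not code taken from the patch. The names accept_retry, accept_fn, flaky_accept and failures_left, and the max_retries limit, are hypothetical and exist only for illustration. The sketch assumes a blocking listening socket, so it retries the accept step alone; a program built around poll or select would return to its event loop instead.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Signature shared by accept() and any stand-in used for testing. */
typedef int (*accept_fn)(int, struct sockaddr *, socklen_t *);

/*
 * Hypothetical helper: call accept_impl until it succeeds, fails with an
 * error other than EPROTO, or max_retries EPROTO failures have been retried.
 * Returns the new descriptor, or -1 with errno left as set by the last call.
 */
int accept_retry(accept_fn accept_impl, int listen_fd,
                 struct sockaddr *addr, socklen_t *addrlen, int max_retries)
{
    int retries = 0;

    for (;;) {
        int fd = accept_impl(listen_fd, addr, addrlen);
        if (fd >= 0)
            return fd;
        if (errno != EPROTO || retries++ >= max_retries)
            return -1;
        /* EPROTO: the handshake did not complete, so try again. */
    }
}

/* Demonstration stand-in (hypothetical): fails with EPROTO while
 * failures_left is positive, then returns an arbitrary descriptor. */
int failures_left;

int flaky_accept(int fd, struct sockaddr *addr, socklen_t *len)
{
    (void)fd; (void)addr; (void)len;
    if (failures_left > 0) {
        failures_left--;
        errno = EPROTO;
        return -1;
    }
    return 7;
}
```

In real use, accept_impl would be accept() itself or a thin wrapper around it. The function-pointer parameter is only there so the retry logic can be exercised without a network connection. Bounding the number of retries is a design choice of this sketch; it keeps a persistent failure from spinning forever.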
(from 101346-01)

1144308 Solaris crashes with urgent data RFC 1122

The machine can get a watchdog reset or alternatively hang when receiving urgent data. If it hangs, it hangs "hard", i.e. L1-A does not work, and unplugging and replugging the keyboard does not work either. A snoop trace of the last packet received should show the Urgent flag bit set and an Urgent pointer of 0. (Note: the 2.2 version of snoop does not print the Urgent pointer field; the 2.3 version does.)

(from 101319-02)

1144228 SPARCcenter 2000 running Solaris 2.2 panics with data fault in do_urg_outofline

The system panics in various places in the do_urg_outofline() routine. A typical stack trace would look like:

    do_urg_outofline()
    sockmodrsrv()
    runservice()

with a NULL message block (bp).

(from 101319-01)

1137978 telnet returning "protocol error" when attempting to telnet to netbuilder router

From either a Solaris 2.1 or 2.2 system, telnet returns "protocol error" when telnetting into the 3Com router.

(from 101318-16)

1147165: Streams resources depleted suddenly (due to no syncq flow control)

A machine can rapidly run out of kernel memory under heavy load. This is signified by netstat -m (on the core dump) reporting tens of thousands of allocated messages.

1150306: data fault in background - streams close race

The kernel can crash with a data fault. The stack trace shows background calling mutex_enter, which takes a data fault.

(from 101318-15)

1147620 system hangs in deadflck

Under certain circumstances, the kernel may hang due to an error in file and record locking. In this case, a kernel thread will be found to be looping infinitely in deadflck().

(from 101318-14)

1139753 locking hangs under heavy load; disturbing ICMP messages

Under heavy loads, NFS locking clients may be unable to provide replies to their servers' occasional portmap GETPORT requests within the default RPC timeout.
This in turn prevents the server from responding to outstanding locking requests from that client (and others), causing the server lockd to appear to be hung or dead.

(from 101318-13)

1132554 fcntl: error No record locks available, lockd: out of lock
1147226 NFS locking broken when byte order is different

1132554: NFS file servers can leak record locks. Eventually all lock requests (including local locks) fail with ENOLCK. Another symptom is syslog messages from lockd (on the server) complaining that it is out of locks. This bug can also cause the server to incorrectly grant lock requests, which can lead to corruption of user data files.

1147226: Patch 101267-01 introduced a bug in NFS clients that could cause locking operations to fail if the server is not running SunOS or if the server is not a SPARC system. The symptom is syslog messages from lockd (on the client) complaining about malformed filehandles.

(from 101267-01)

1142365: lockd incorrectly examines export information when comparing filehandles. Consider a scenario where a PC application, running under WABI or SunPC, uses File Sharing to synchronize instances of itself. If one instance is running on an NFS server and another instance is running on an NFS client, the NFS server will allow access to both instances at the same time, when it should really only allow access to one at a time. This can cause data corruption.

1140047: Suppose a 3-byte (or bigger) region of an NFS file is locked. Now suppose that one or more bytes in the middle of the region are unlocked, leaving two locked regions on either side of the "hole". The client does not properly manage these two regions when they are unlocked. The problem does not appear until the server reboots and the client attempts to reclaim (relock) at least one of the regions. This can lead to situations where the server thinks a region is locked, but nobody owns the lock.
The server console may display "_nfssys: error Stale NFS file handle" if the file was deleted before the server rebooted.

1123788: lockd on an NFS client detects and filters out retransmitted requests from the client kernel. The code to detect retransmissions does not look at the filehandle in the request. Although this does not seem to have been a problem in practice, it could conceivably lead to cases where the application gets the wrong return code from a lock request.

(from 101318-12)

1150058 SPARCstation-10 SX Vid SIMM Cursor RAM Write Enable is weak and corrupts writes

This fix is to the Video SIMM operating system driver (cg14 driver). It provides a software workaround for the broken cursor image observed when the cursor is written to.

(from 101318-11)

Bug id 1146924: SS10-51 SS600-51 will fail "watchdog reset" or hard hang under load

(from 101318-10)

1140209 Cannot exit login sessions simultaneously from Alphanumeric terminals properly

The zombie processes were not being removed by the parent process when the handler for SIGCHLD was being reset.

1142882: panic on exit

The u.u_ttyp field was being set incorrectly when a pre-SVR4 module was being pushed. The old value of u.u_ttyp was not saved and later checked to see whether it needed to be reset to NULL.

(from 101318-09)

1143439 using fork() and libaio together leads to system panics

When using libaio to do asynchronous I/O in a process and also doing a fork() in the same process, there is a window in which the system will panic. The same phenomenon occurs with multi-threaded processes that use fork1() (this has been observed with SunPC and the volume manager). Finally, using a /proc tool that reads the address space of a running process, like /usr/ucb/ps -ww, can lead to a panic of a similar (though not identical) sort.
(from 101349-01)

1137581 C2+ gets watch dog reset with Sundiag
1144922 cgfourteen driver could still get remap panic
1145401 sx driver memory leak
1145746 C2+ panics when creating an X Window

The reliability lab typically runs Sundiag on machines continuously for extended periods of time (more than a week). When doing such reliability testing on SPARCstation 10SX machines, we discovered the following problems:

a) Machines randomly get a watchdog reset (bug ids 1137581 and 1144922).

b) After running for 72 hours or more, the machines seem to hang or behave sluggishly after exiting from Sundiag (bug id 1145401).

c) In some very rare situations, when unmapping a range of virtual addresses cloned for SX, the machine panics, because the thread unmapping the address range holds the writer's lock on the address space and then tries to acquire a reader's lock on the same address space (bug id 1145746).

(from 101318-08)

1130721 panic messages are not logged in /var/adm/messages

The previous putback for this bug caused the system to panic if more than one syslogd was started.

(from 101318-07)

1146985 data fault panic in lock_try due to interval timer signal

There is a race condition in exit() and lwp_exit() where they cancel outstanding itimer() callouts. If the race is lost, a callout remains that eventually fires and attempts to access a non-existent lwp or process, leading to the system panic reported by the customer.

(from 101318-06)

1130721 panic messages are not logged in /var/adm/messages

Added a postinstall script to edit etc/syslog.conf and a postremove script to remove the edits. This should have been done as part of 101318-03.

(from 101318-05)

1146912 panic: deadlock - cycle in blocking chain when using /proc to read a process

When using tools that read the address space of other processes via /proc, there is a window of vulnerability in the operating system that can cause a panic with the message:

    Deadlock condition detected: cycle in blocking chain.
Tools that read the address space of other processes include:

    /usr/bin/truss
    /usr/ucb/ps
    /usr/bin/adb
    /opt/SUNWspro/bin/dbx
    3rd party debuggers (e.g., gdb)

The window of vulnerability is extremely small, but the problem has been seen on heavily-loaded multiprocessors.

(from 101318-04)

1144765 SunPC fails on sun4m systems running Solaris 2.3

The SunPC card doesn't work on sun4m platforms.

(from 101318-03)

1130721 panic messages are not logged in /var/adm/messages

The mechanism implemented in SunOS 5.0 to save log messages produced before syslogd is started doesn't allow messages recorded in the message buffer before the reboot to be logged. This patch returns to the original method of saving log messages and corrects the problems which prompted the incorrect fix in 5.0.

(from 101318-02)

1108615 I_LOOK etc tests for end of stream by walking mid point qnext

Kernel crash (data fault). The pc is in the SAMESTR macro, either in the build_sqlist function or in the getendq function.

(from 101318-01)

1139493 fcntl(2) => ENOLCK and "klm_lockctl: bad nonblk LOCK error 3"

If there are problems communicating with the lock manager on an NFS server and a blocking lock request (e.g., fcntl(..., F_SETLKW, ...)) receives a signal, the lock request might not get cancelled. This would leave the file locked with no way to unlock it, short of rebooting the client or server.

(from 101326-01)

1139124 syslog does not output more than approx 100 characters, no errors reported

syslog messages longer than 100 characters result in an empty syslogd posting. Only the header of the message is printed; the message part is empty.

Patch Installation Instructions:
--------------------------------
Generic 'installpatch' and 'backoutpatch' scripts are provided within each patch package with instructions appended to this section. Other specific or unique installation instructions may also be necessary and should be described below.
Special Install Instructions:
-----------------------------
none

Instructions to install patch using "installpatch"
--------------------------------------------------
1. Become super-user.
2. Apply the patch by typing: