
<!-- "@(#)failfast	1.7 97/10/03 Sun Microsystems, Inc." -->

<title>Failfast Device Driver</title>
<h1>Failfast Device Driver</h1>

Failfast is a mechanism used by the 
    <a href="sc/sc_system">SC</a>
software to enforce its timing requirements. <p>

The SC system uses timeouts to determine if certain components (nodes, for
example) have failed.  To use timeouts effectively there must be a
guarantee that the timeout mechanism functions correctly.  The failfast
mechanism guarantees this by causing a system panic if an event has not
occurred by a certain timeout.  It is better for a badly behaved system
to panic than to have it interfere with the operation of other nodes in
the 
    <a href="sc/cluster">cluster</a>.
<p>
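The deadline check described above can be pictured with a small user-space
model.  This is only a sketch under stated assumptions, not the failfast
driver's interface: the <tt>Failfast</tt> class, its <tt>arm</tt>,
<tt>disarm</tt> and <tt>check</tt> methods, and the injected clock and panic
callback are hypothetical names chosen for illustration, and the panic is
simulated by recording a message. <p>

<pre>
```python
class Failfast:
    """Toy model of a failfast (deadman) timer; NOT the real driver interface."""

    def __init__(self, clock, panic):
        self._clock = clock      # callable returning the current time in seconds
        self._panic = panic      # callable standing in for a real system panic
        self._deadline = None    # None means the timer is disarmed

    def arm(self, timeout):
        """Require the awaited event to occur within `timeout` seconds."""
        self._deadline = self._clock() + timeout

    def disarm(self):
        """Record that the awaited event occurred in time."""
        self._deadline = None

    def check(self):
        """Run periodically by an agent independent of the monitored code."""
        if self._deadline is not None and self._clock() >= self._deadline:
            self._deadline = None
            self._panic("failfast timeout expired")


# Demonstration with a fake clock so the outcome is deterministic.
now = [0.0]
panics = []
ff = Failfast(clock=lambda: now[0], panic=panics.append)

ff.arm(5.0)      # deadline is 5.0
now[0] = 3.0
ff.check()       # deadline not reached: nothing happens
ff.disarm()      # the event occurred in time

ff.arm(5.0)      # deadline is 8.0
now[0] = 9.0
ff.check()       # deadline passed while armed: simulated panic
```
</pre>

The point of the model is that the check runs independently of the code
being monitored, so a component that stalls cannot also suppress its own
timeout. <p>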

Failfast panics generally occur when a system is so overloaded that even
the highest-priority processes are left unscheduled, even for a short
time, or when the reconfiguration process hangs, preventing recovery from
completing reliably. <p>

Note that the SC software uses standard techniques provided by the
operating system to implement timeouts, and that failfast is used only
as a backup to detect a failure of these techniques. <p>

Failfast can also be used to panic the system if a process that is
required for correct system behavior is killed.  For example,
killing the 
    <tt><a href="sc/cmm">clustd</a></tt>
processes will panic the system. <p>
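One way to picture this second use is a guard that fires if its holder goes
away while the guard is still armed.  The source does not say how the kill
is detected, so the sketch below simply assumes some hook runs when the
guarded process exits; <tt>ProcessGuard</tt>, <tt>holder_exited</tt> and
the panic callback are hypothetical names, and the panic is again simulated
by recording a message. <p>

<pre>
```python
class ProcessGuard:
    """Toy model: simulate a panic if the guarded holder exits while armed."""

    def __init__(self, panic):
        self._panic = panic      # callable standing in for a real system panic
        self._armed = False

    def arm(self):
        """Declare that the holder must stay alive from now on."""
        self._armed = True

    def disarm(self):
        """Declare that the holder may exit safely (orderly shutdown)."""
        self._armed = False

    def holder_exited(self):
        """Hypothetical hook, assumed to run when the guarded process goes away."""
        if self._armed:
            self._armed = False
            self._panic("guarded process exited while armed")


events = []
guard = ProcessGuard(panic=events.append)

guard.arm()
guard.disarm()
guard.holder_exited()    # orderly shutdown: nothing is recorded

guard.arm()
guard.holder_exited()    # killed while armed: simulated panic is recorded
```
</pre>

In this model an orderly shutdown disarms the guard first, so only an
unexpected exit reaches the panic path. <p>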

The failfast mechanism is implemented using a pseudo-driver,
<tt>/dev/ff</tt>, and is available in the <tt>SUNWff</tt> package. <p>

