#! /bin/sh
# File: hafmconfig
#
#ident "@(#)hafmconfig   1.47     97/07/14 SMI"
#
# Copyright 07/14/97 Sun Microsystems, Inc.  All Rights Reserved.
#
# High-Availability Fault Monitoring Configuration file
# This file is dot-included into sh shell scripts for fault monitoring.
# It also (for convenience) defines some configuration variables
# that are for the mainstream of HA (not just for fault monitoring).

# Disclaimer and Caution: This file is purely for the internal use
# of Sun's High Availability product, and is not to be modified by
# customers or customers applications.  The format and meaning of this
# file will definitely change in future releases of the product.
# It is especially not feasible to decrease values in this file in order to
# acheive faster fault detection and takeover.  The values are as large as
# they are based on our experience in development and testing.  Attempts to
# set them to lower values will merely cause instability of the HA
# configuration, e.g., a server taking over from its sibling server 
# merely because the sibling server is moderately loaded.


# All the environment variables defined for fault monitoring start with the
# prefix "HA_FM_".  Other more general HA variables do not start with
# "HA_FM_", rather, they just start with "HA_".


# Path of file containing "suspend" boolean.  If the file contents
# are the character "T" we don't run any fault probes.  
# The file is locked with an exclusive lock while it is being updated.
HA_FM_SUSPEND_FILE=$HA_FILES/hafmsuspend
export HA_FM_SUSPEND_FILE


# Config variable HA_FM_MIRROR_SECS says how much extra time a Data
# Service fault monitor must allow, over and above its other time-outs,
# for mirroring to mask a disk fault.  The motivation is that 
# Solaris disk drivers take some time to notice that a disk is bad.
# Mirroring will mask a single disk fault, but because of the time
# consumed by the disk driver, it takes some elapsed time to mask
# the fault.  If a Data Service is too eager to time-out, then
# there won't be enough time for the mirroring recovery code to
# come into play.  Thus, a Data Service fault monitor must *add* 
# HA_FM_MIRROR_SECS onto whatever time-out would otherwise be using.
HA_FM_MIRROR_SECS=180
export HA_FM_MIRROR_SECS


# Config variable HA_FM_NET_PUBBADTAKEOVER says whether the failure of a 
# host's public network interface(s) will cause its brother host to takeover.
# Legal values are the strings:  NO  ALL  ANY
# "NO" means don't takeover.  "ALL" and "ANY" are designed for multi-homed
# hosts:  "ALL" means all of a host's public network interfaces must be bad
# in order for a takeover by its brother host to occur.  "ANY" is more 
# aggressive about takeovers -- if any of the interfaces are bad, a takeover 
# by the brother host will occur.  
# For single-home hosts, the distinction between ANY and ALL is vacuous -- 
# they are equivalent.
HA_FM_NET_PUBBADTAKEOVER=ANY
export HA_FM_NET_PUBBADTAKEOVER


# The config variable HA_FM_NET_BADREPORTMINS controls how often some fault 
# probes will report a problem with a bad network interface.  The idea is
# is that we don't want to generate identical error reports over and over
# again at a high rate.
HA_FM_NET_BADREPORTMINS=60
export HA_FM_NET_BADREPORTMINS


# The config variable HA_FM_NET_PROBEBROTHERSECS controls how often the
# net_probe_brother fault probe attempts to RPC to the brother host, for
# both public and private subnetwork interfaces.  More precisely,
# it is how long to sleep between probes.
HA_FM_NET_PROBEBROTHERSECS=30
export HA_FM_NET_PROBEBROTHERSECS


# Config variable HA_FM_NSTOOLONGSECS says how quickly the name service must
# respond in order for the fault probing scripts to consider it to be up.  
# If it does not respond within that many seconds, we consider it to be
# unavailable, which causes (some) takeovers to be inhibited.
# Thus, this time is relatively short.  
# Alas, it is problematic to make it too short.  The probe of the
# name service looks for a non-existent hostname, because a real existing name
# may be cached in the client, or may be found in the local /etc/hosts
# file if the site's /etc/nsswitch.conf file lists "files" first.
# Now, the reason the timeout cannot be too short is as follows:
# Suppose the /etc/nsswitch.conf file lists "dns", for example,
#    hosts: nisplus dns files
# The fact that we are searching for a non-existent hostname will cause
# dns to be consulted.  
# Suppose the /etc/resolv.conf file for dns happens to give an ordered list of
# domain name service servers to try where the first one in the ordered list
# happens to be down.  Then the time to respond to a request can be large, for 
# example, 18 seconds.  But we really don't want a single dns server being 
# down to cause takeovers to be inhibited, mainly because dns is not the 
# preferred normal name service in the Solaris environment, rather, nisplus 
# or nis (formerly "yp") are.
# It is also problematic to make HA_FM_NSTOOLONGSECS too large.  Here's
# why.  In probing the demons related to nfs service, if a demon does not
# respond within a timeout, the probe then executes some logic to try to
# determine if the demon's non-response was because it was hung up waiting
# for the name service.  With a short value of HA_FM_NSTOOLONGSECS, we
# detect the fact that the name service wasn't responding quickly, and
# we avoid blaming the nfs related demon -- thus, we inhibit takeovers.
# The point here is that if HA_FM_NSTOOLONGSECS is too large relative
# to the timeout for non-responsiveness of an nfs related demon, then
# we will end up concluding that the name service is NOT to blame,
# even though the real reason why the nfs related demon did not respond
# may be because it was hung up waiting for the name service.
# Here's a further delimna.  Nothing checks that for the two HA servers
# that their /etc/nsswitch.conf files are the same, or that their
# /etc/resolv.conf files are the same.  So the two HA servers can have
# different behavior with respect to using the name service -- one may be 
# sluggish due to going down long timeout chains, and one may be quick. 
# 15-Jun-1994: was 15 seconds, that was too short, see comments above.
HA_FM_NSTOOLONGSECS=30
export HA_FM_NSTOOLONGSECS


# Config variable HA_FM_NFS_LOCALRESTARTGRACESECS is for NFS fault probe.
# It defines a grace period, which is used when one host, HostA, has
# detected an NFS problem with its brother host (HostB).  The grace period
# period says how much extra time HostA will give to HostB before doing a takeover.
# During this period, HostA keeps retrying the NFS probe.
# The grace period serves two purposes, and therefore must be large:
# (1) HostB is monitoring his own service locally, and can restart demons
# that are in trouble.  We want to give this local restart a chance to occur.
# (2) HostB may simply be temporarily overloaded.  The grace period gives an
# opportunity for HostB to respond if his load eases up enough for him to do so.
HA_FM_NFS_LOCALRESTARTGRACESECS=60
export HA_FM_NFS_LOCALRESTARTGRACESECS


# The config variable HA_FM_RPCBIND_CONNREFUSED is for a scenario
# where the prober discovers that the probee host is refusing
# tcp connections for rpcbind.  Typically, that means that rpcbind
# is not running.  This config variables says how long in seconds
# the prober should wait waiting for the rpcbind daemon to exist
# again.  Recall that rpcbind has a dump and resume feature, such
# that an administrator could (in principle) be restarting it.
HA_FM_RPCBIND_CONNREFUSED=360
export HA_FM_RPCBIND_CONNREFUSED


# The config variables HA_FM_NFS_SUPPRESS_TAKEOVER_ZZZ control whether 
# unresponsiveness of an individual service, ZZZ, causes a takeover.
# 0 means unresponsive does cause takeover, but 1 means takeover is inhibited.
HA_FM_NFS_SUPPRESSTAKEOVER_RPCBIND=0
export HA_FM_NFS_SUPPRESSTAKEOVER_RPCBIND
HA_FM_NFS_SUPPRESSTAKEOVER_MOUNTD=0
export HA_FM_NFS_SUPPRESSTAKEOVER_MOUNTD
HA_FM_NFS_SUPPRESSTAKEOVER_NFSD=0
export HA_FM_NFS_SUPPRESSTAKEOVER_NFSD
# Statd and lockd are lumped together as "locking".  The default and
# extremely recommended value is 1, i.e., statd and lockd problems do
# not cause a takeover, for the following reason.  A down client, or
# a client which itself has a hung lockd/statd (!) can end up causing
# the lockd/statd on one of our HA servers to hang, when the server
# lockd/statd tries to do callbacks to the down/hung client.  
# And takeover will not fix the problem; the new server will also get 
# stuck with the bad client.  
HA_FM_NFS_SUPPRESSTAKEOVER_LOCKING=1
export HA_FM_NFS_SUPPRESSTAKEOVER_LOCKING


# The config variable HA_FM_SUPPRESSTAKEOVER_NAMESERVICE is similar
# in defintion to the other SUPPRESSTAKEOVER_ZZZ variables, but its
# extremely recommended value is "1".  Basicly, if name service is not
# available, then after a takeover, the new primary host is unlikely to
# get very far in starting up services.  
HA_FM_SUPPRESSTAKEOVER_NAMESERVICE=1
export HA_FM_SUPPRESSTAKEOVER_NAMESERVICE


# Config variable HA_FM_NFS_TOUCHFILE controls whether we try to access
# a file in the root directory of each monitored file system.  Values
# are as follows (these are all constants), with the following meanings:
# NONE: don't access the file at all, and probed host does not create the file
# RDONLY: access the file read-only.  Create it read-only.
# RDWR: access the file for both reading and writing
# Choosing RDWR will cause the nfs fault probes to create the file as world
# read-writable.  Even though it is world read-writable, the opportunity
# that that presents to a malicious user is limited, because the
# nfs fault probes continually truncate the file to have size 1 byte.
# The config variable itself:
HA_FM_NFS_TOUCHFILE=RDWR
export HA_FM_NFS_TOUCHFILE


# Config variable HA_FM_NFS_LOCKFILE controls whether or not we the
# nfs fault probes will try to get a lock on a file.  The motivation
# for having this configurable is as an escape hatch so that we can
# turn it off easily.
HA_FM_NFS_LOCKFILE=1
export HA_FM_NFS_LOCKFILE


# The following config variables say what timeout, in seconds, to use when 
# probing the various demons for nfs service.  Most of them
# are set to 2 minutes, based on experiments with a production file
# server; do not try to set them lower.
# The timeout value for lockd is especially high because lockd does
# callbacks on client machines, and a client may be down when the
# callback is attempted.  If the client is down, then the lockd will
# will do two rpc client_create function calls, in sequence, attempting
# to contact it.  Each client_create takes 2 minutes and 30 seconds
# to timeout (with Solaris 2.4).  Note that the timeout value we
# currently set here for lockd of 4 minutes 30 seconds is NOT as
# big as 2*(2m30s) because we assume that the trycommand function in 
# nfs_probe_one_common will itself try twice before doing a takeover.  
# We may be able to decrease this timeout when lockd improvements
# are made.
# Beware: default lockd recovery takes 45 seconds, so don't try to
# make the timeout less than that.
# The 4m30s is simply the sum of the 2m30s bottleneck plus the 2m that
# we give to every other demon.
HA_FM_NFS_TIMEOUT_RPCBIND=120
export HA_FM_NFS_TIMEOUT_RPCBIND
HA_FM_NFS_TIMEOUT_MOUNTD=120
export HA_FM_NFS_TIMEOUT_MOUNTD
HA_FM_NFS_TIMEOUT_NFSD=120
export HA_FM_NFS_TIMEOUT_NFSD
HA_FM_NFS_TIMEOUT_LOCKD=270
export HA_FM_NFS_TIMEOUT_LOCKD
HA_FM_NFS_TIMEOUT_STATD=120
export HA_FM_NFS_TIMEOUT_STATD

# XXX Old name for HA_FM_NFS_TIMEOUT_MOUNTD, remove this eventually:
HA_MOUNTTIMEOUT=$HA_FM_NFS_TIMEOUT_MOUNTD
export HA_MOUNTTIMEOUT

# XXX Old name for HA_FM_NFS_TIMEOUT_LOCKD, remove this eventually:
HA_LOCKDTIMEOUT=$HA_FM_NFS_TIMEOUT_LOCKD
export HA_LOCKDTIMEOUT


# Config vars that control the behavior of the nfs_mon.c nfs fault
# probing daemon.  
# The config var HA_FM_NFS_MON_POLLSECS says how often the nfs_mon.c daemon
# wakes up.  This is the basic polling interval of nfs_mon.c.  All the
# probing actions of nfs_mon.c are driven off of this basic polling interval.
# The frequency of those actions is specified in units of the basic
# polling interval, that is, if another action has frequency K, then
# it means that said action will get done only on every K'th iteration
# of the basic polling interval.
HA_FM_NFS_MON_POLLSECS=5
export HA_FM_NFS_MON_POLLSECS
# The config var HA_FM_NFS_MON_FREQ_DAEMON_NULL_RPC gives the frequency
# of doing NULL RPC fault probes, in units of the basic polling interval.
# Also, each time this action runs, it probes only one of the nfs-related
# rpc daemons, but in a round-robin fashion, such that successive runs
# of this action will probe all of the nfs-related rpc daemons.
# Thus, the elapsed time to probe *all* of the daemons will be:
#   (# daemons) * HA_FM_NFS_MON_FREQ_DAEMON_NULL_RPC * HA_FM_NFS_MON_POLL_SECS
HA_FM_NFS_MON_FREQ_DAEMON_NULL_RPC=1
export HA_FM_NFS_MON_FREQ_DAEMON_NULL_RPC
# The config var HA_FM_NFS_MON_FREQ_NFS gives the frequency of making
# client side-nfs fault probes, that is, of doing an end-to-end test of
# nfs.  The frequency is in units of the basic polling interval.
# Also, if we have many different exported file systems to probe,
# this action will probe only one of them each time it runs, but in
# a round-robin fashion, such that successive runs of this action will
# probe all of the exported file systems.
# Thus, the elapsed time to probe all of the exported file systems will
# be:
#   (# file systems) * HA_FM_NFS_MON_FREQ_NFS * HA_FM_NFS_MON_POLLSECS
HA_FM_NFS_MON_FREQ_NFS=1
export HA_FM_NFS_MON_FREQ_NFS
# The remaining two config vars for nfs_mon.c are:
#   HA_FM_NFS_MON_FREQ_NFS_MOUNT and
#   HA_FM_NFS_MON_FREQ_NFS_LOCK
# They are given in units not of HA_FM_NFS_MON_POLLSECS, but of
# HA_FM_NFS_MON_FREQ_NFS.  That is, for example, if 
# HA_FM_NFS_MON_FREQ_NFS_MOUNT is 4, then it means that only on every
# fourth time that a file system is probed by HA_FM_NFS_MON_FREQ_NFS will
# it also be probed for a mount working okay.
# Similarly for HA_FM_NFS_MON_FREQ_NFS_LOCK.
# Should set HA_FM_NFS_MON_FREQ_NFS_MOUNT big, as a mount request is 
# very expensive on the server side, e.g., it must look the client up 
# in the name service.
HA_FM_NFS_MON_FREQ_NFS_MOUNT=4
export HA_FM_NFS_MON_FREQ_NFS_MOUNT
# Likewise, should set HA_FM_NFS_MON_FREQ_NFS_LOCK big, as a locking 
# operation is very expensive on both the server and the client side.
HA_FM_NFS_MON_FREQ_NFS_LOCK=4
export HA_FM_NFS_MON_FREQ_NFS_LOCK




# nfs local restart:
# The following three config variables control whether, and how often,
# a server will attempt to restart nfs demons on itself ("local restart"),
# without waiting for a takeover by the sibling server to happen.
# The local restart is done by simply doing a cluster reconfiguration.
# A local restart is done only if one or more of the nfs related demons no 
# longer exists according to ps(1).  (We do NOT kill a demon that exists but
# is hung/slow.)  
# 1. Config variable HA_FM_NFS_LOCALRESTART says whether or not to
# do local restarts.  1 means do them, 0 do not.
# 2. Config variable HA_FM_NFS_LOCALRESTART_UPMINS controls how often
# local restart can happen.
# We must avoid doing the local restart too often, because a
# server that is really sick and unable to restart its demons
# successfully should basicly let its brother takeover from it, 
# but reconfiguration keeps the brother from taking over.
# Furthermore, local restart is not intended to mask repititive software
# bugs in the demons -- it is beyond the scope of HA-NFS to do that.
# We use the following heuristic.  The demons are required to
# have stayed up okay (according to ps) for some amount of time,
# HA_FM_NFS_LOCALRESTART_UPMINS, since the last local restart.
# If they haven't, then we don't try to restart them.
#   Do not decrease HA_FM_NFS_LOCALRESTART_UPMINS below, say, 30 minutes,
# or the system may be unstable.
# 3. The config variable HA_FM_NFS_LOCALRESTART_PIDCHECKSECS controls how
# frequently the code checks that a each demon's pid still exists.
# Since this "inner loop" is implemented in C, it is feasible to check
# fairly often.  Current default value is 10 seconds.
HA_FM_NFS_LOCALRESTART=1
export HA_FM_NFS_LOCALRESTART
HA_FM_NFS_LOCALRESTART_UPMINS=60
export HA_FM_NFS_LOCALRESTART_UPMINS
HA_FM_NFS_LOCALRESTART_PIDCHECKSECS=10
export HA_FM_NFS_LOCALRESTART_PIDCHECKSECS


# Halt vs reboot:
# Config var HA_FM_REBOOT_UPMINS controls whether a server that
# is a victim of a takeover does a halt or reboot, specifically,
# if it has been up for less than HA_FM_REBOOT_UPMINS, the
# victim does a halt, otherwise, he does a reboot.
# This variable should be set big enough that the server is
# required to provide some useful service in order to be
# eligible for a reboot rather than a halt. 
# Caveat: this variable is effective on only one code path
# that does halt or reboot, namely, the cltrans_stop_all
# code path.  It doesn't affect whether the cluster membership
# monitor does a halt or reboot, nor does it affect the scsi disk
# driver failfast probes (which always do a reboot).
#   Do not decrease HA_FM_REBOOT_UPMINS below, say, 60 minutes,
# or the system may be unstable, in particular, in the event of
# a problem in the multi-hosted data, the two servers may
# each repeatedly do takeovers from the other, with each one
# rebooting, finding a problem with its sibling, doing a takeover,
# and then later becoming a victim itself when its sibling does
# the same thing.  Whereas a proper value of HA_FM_REBOOT_UPMINS
# will cause a server to halt rather than reboot, which at
# least breaks the loop even though it does not repair the
# problem with the data.
HA_FM_REBOOT_UPMINS=60
export HA_FM_REBOOT_UPMINS


# Config var HA_RSHTIMEOUTSHORT says how many seconds to wait for
# a trivial rsh command to the brother host, e.g., to run /bin/true
# to see if he's working.  This is only done when cluster membership
# is both.
HA_RSHTIMEOUTSHORT=120
export HA_RSHTIMEOUTSHORT


# Config var HA_FM_LOADPROBE says whether to automatically probe the load
# on the HA server hosts, and to then alert the administrator about overloads.
# A downside of automatically running this probe is that it doesn't allow
# for servers that experience high load at routine planned times (e.g.
# during backup).  A site that gets to many alerts may wish to disable
# HA_FM_LOADPROBE and instead run the program "haload" out
# of crontab at the times the site thinks are worth checking.
# The config var HA_FM_CHECKLOADINTERVALMINS says over what interval
# to measure the load.  It should be long enough so that temporary
# peaks are ignored, but short enough to capture conditions of sustained
# overload.
# The config var HA_FM_CHECKLOADTHRESHHOLD is a percentage utilization
# of the server.  It says what percentage is considered to be too high.
# The exact rule is: if the sum of the utilization of both servers
# is greater than twice this threshhold, a message alerting the
# administrator is generated (in syslog).  
HA_FM_LOADPROBE=1
export HA_FM_LOADPROBE
HA_FM_CHECKLOADINTERVALMINS=30
export HA_FM_CHECKLOADINTERVALMINS
HA_FM_CHECKLOADTHRESHHOLD=90
export HA_FM_CHECKLOADTHRESHHOLD



# Config vars to control whether netstat -i -I statistics generate
# any error log reports.  Briefly, the fault probe compares two netstat -i -I
# outputs, taken at different points in time and checks the 
# number of error packets and the error rate during the interval
# between the two snapshots.
# The config var HA_FM_NETSTAT_ERR_ABS says how many error packets
# must have occurred in order to generate an error report, and
# the config var HA_FM_NETSTAT_ERR_PERCENT says what percentage
# of error packets out of total packets must have occurred.
# Both conditions must be true in order to generate the error report.
# Note that the netstat output is never used as the basis for
# a takeover decision (it is too undependable for that), so setting
# these config variables too low will not cause any real harm, just
# possibly alot of excess error logging.
HA_FM_NETSTAT_ERR_ABS=25
export HA_FM_NETSTAT_ERR_ABS
HA_FM_NETSTAT_ERR_PERCENT=20
export HA_FM_NETSTAT_ERR_PERCENT


# Config var HA_FM_TAKEOVER_ABORT_TIMEOUT is used by the fault probing
# logic when one host is forcibly taking over from another.  Briefly, 
# in doing the forcible takeover, the prober first tries 
# "clustm abort <sibling>" and if the sibling doesn't die after a
# time-out period, the prober then seizes control of the disksets.
# The time-out period is given by this config var.  It should be
# long enough to give "abort" on the victim some opportunity to
# execute last wishes, however, because we are doing this at a
# point where the victim has already been diagnosed as sick, it
# would not be appropriate to give the victim too long.
# This is a compromise value.  It comes from our basic two minute
# rule for each abort method plus some fudge.
# The old name of this variable was HA_FM_TAKEOVER_STOP_TIMEOUT;
# keep it around in case some code uses that name.
HA_FM_TAKEOVER_ABORT_TIMEOUT=260
export HA_FM_TAKEOVER_ABORT_TIMEOUT
HA_FM_TAKEOVER_STOP_TIMEOUT=$HA_FM_TAKEOVER_ABORT_TIMEOUT
export HA_FM_TAKEOVER_STOP_TIMEOUT


# XXX This config var is obsolete; remove it eventually.
# Config var HA_FM_TAKEOVER_STOP_GRACE_SECS is used during a takeover
# that is invoked by fault monitoring probes, in the routine
# fdl_consider_takeover_crit.  This routine first tries to do a 'clustm stop'
# of the sibling node.  If after some number of seconds a reconfiguration
# has not occurred, it assumes that the sibling got hung up doing the
# 'stop' and therefore does a 'clustm abort' of the sibling node.
# The time that the routine waits is equal to the timeout on the
# 'stop' transition (as given in the cmm_transitions file) plus
# a fudge factor to account for the overhead of our scripts.
# The fudge factor is given by this config variable.
HA_FM_TAKEOVER_STOP_GRACE_SECS=30
export HA_FM_TAKEOVER_STOP_GRACE_SECS


# Config var HA_CLOCKSYNC says whether HA software should be doing
# primitive clock synchronization between the two HA hosts.
# 1 means yes, 0 means no.
HA_CLOCKSYNC=1
export HA_CLOCKSYNC


# Config var HA_ARP_RESPONSE says whether we should broadcast an
# ARP response packet to announce that this host is servicing one
# of the virtual (aka logical aka relocatable) ip addresses for
# a diskset.  1 means yes, 0 means no.  Note that we always broadcast
# the ARP request packet (there is no switch for disabling that).
HA_ARP_RESPONSE=1
export HA_ARP_RESPONSE


# Config var HA_JUSTME_BCAST_PING says whether or not to do a broadcast
# on all the public network(s) to test whether this machine's networking
# (hardware and software) is working before allowing it to procede with
# takeover, in step1 of cluster reconfiguration.  Motivation for doing
# the ping is that if this machine cannot use the public network(s) then
# we shouldn't allow it to takeover the disks, and, furthermore, if
# its networking software is wedged such that it cannot use the public
# or private networks, it should really defer to letting sibling its node 
# be the master.  (That is, suppose this nodes networking software isn't
# letting through any packets--the node will look like it is the only
# node in the cluster.)  Please note that we've never actually seen
# the networking software wedge up in that manner.
# Motivation for NOT doing the ping would be that it requires that
# some other machine(s) (a router, a client, another server, or whatever)
# be up and running on some of the public network(s): requiring that
# could complicate booting up HA after a site-wide power-failure, since
# a single HA server won't come up on its own until some other machine is
# also up.  Consider the case of a network of diskless clients.
HA_JUSTME_BCAST_PING=1
export HA_JUSTME_BCAST_PING


# Configuration variable HA_METHOD_DEFAULT_TIMEOUT_SECS declares the
# default method timeout for registered data service methods that did not 
# specify a timeout.
HA_METHOD_DEFAULT_TIMEOUT_SECS=3600
export HA_METHOD_DEFAULT_TIMEOUT_SECS


# Configuration variable HA_ABORT_TIMEOUT_SECS is the imposed timeout
# on the aborting methods, ABORT_NET and ABORT.
HA_ABORT_TIMEOUT_SECS=120
export HA_ABORT_TIMEOUT_SECS


# Configuration variable HA_SCSIRSTD allows turning off the scsirstd daemon
# as an emergency work-around, should it be required.  The default value
# of 1 means to employ the scsirstd, whereas 0 would mean don't use it.
# If this variable is not defined, the code treats that as being
# equivalent to the default value of 1.
HA_SCSIRSTD=1
export HA_SCSIRSTD


# Configuration variable HA_HALTINTERACTIVE says whether in order to
# acheive the effect of a halt of a node, Solstice HA should do an
# interactive reboot.  That could be useful in an multi-hosted scsi
# environment, because the interactive reboot does a scsi bus reset.
# The value 1 means to do the interactive reboot, 0 means to do a vanilla halt.
# If this variable is undefined, the code treats that as being like 0.
HA_HALTINTERACTIVE=0
export HA_HALTINTERACTIVE


# Whether scripts are faked up to monitor jurassic:
HA_FM_FAKEJURASSIC=0
export HA_FM_FAKEJURASSIC


# XXX To Do:
# XXX --  Every timeout or retry count should be configurable in this file
# XXX Many scripts have the timeout hardwired, or default it to whatever
# XXX the programs they use happen to use, eg., mount or nfs soft mounted
# XXX write.  Consider writing looping driver.
# XXX --  For "escalations" use lockedrun to minimize load on host, but be
# XXX careful of recursion and of timeouts.
