<!-- @(#)haapi 1.3 97/10/16 Sun Microsystems Inc -->
 
<title>HA Data Services API</title>
<h1>HA Data Services API</h1>

The Sun Cluster 2.0 HA Data Services API uses command-line utilities and
a C-callable library to register arbitrary application programs with the
cluster framework, thereby rendering the applications highly
available. Sun provides several HA applications as part of the Sun
Cluster 2.0 product. These include <a href="sc/hanfs">NFS</a>,
<a href="sc/haipro">Netscape Web services</a>, and
<a href="sc/hadbms">standard Oracle, Sybase, and Informix
RDBMS</a>.<p>

A  data  service  is  registered   with the  cluster   framework using
hareg(1m). Registration is  persistent  in  that it  survives   across
takeovers,  switchovers, and  reboots.  Registration  with the cluster
framework is   usually  done  as the    last step of   installing  and
configuring a  data service. Registration is a  one-time event. A data
service can also be unregistered using hareg(1m). <p>

In addition to   the distinction between   registered and unregistered
states, Sun Cluster  2.0 also has the  concept of a data service being
in the "on" or "off" state. The purpose of the "on"  or "off" state is
to provide the system  administrator with a mechanism  for temporarily
shutting  down a data service without  having to take the more drastic
step of unregistering it. <p>
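For illustration, both registration and the on/off state are driven
through hareg(1m). The flag spellings below follow our reading of the
hareg(1m) man page and should be verified against it; "mydb" is an
invented service name used throughout these sketches.<p>

```shell
# Register a data service (a one-time step; the registration survives
# takeovers, switchovers, and reboots).  For a user-written service,
# hareg(1m) also takes options naming the callback methods described
# below -- see the man page for the exact syntax.
hareg -r mydb

# Turn the service off and back on without unregistering it.
hareg -n mydb        # off
hareg -y mydb        # on

# Permanently remove the registration.
hareg -u mydb
```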

When a  data service  first registers  with the  cluster framework, it
registers a   set   of callback   programs, or methods.   The  cluster
framework makes callbacks to  the data service's methods when  certain
key events in the cluster occur.<p>

After the  failure  of a  host,  the cluster framework  takes  care of
moving the <a href="sc/loghost">logical hosts</a> (both disk groups and
logical network IP addresses) mastered  on the faulty physical host to
the surviving physical  hosts of the cluster. At  this point, the data
service's software   must be restarted on  the  new masters of  the <a
href="sc/loghost">logical  hosts</a>.  The  cluster framework    itself
cannot restart a  data service. Instead, it  makes a call to the  data
service requesting  it  to restart itself. This  call  is to  the data
service's start method.<p>
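A start method is simply an executable program. The sketch below shows
the shape such a method might take for the hypothetical "mydb" service;
the <code>MYDB_BASE</code> variable, the <code>mydb_server</code>
daemon, the pid-file convention, and passing the mastered logical hosts
as arguments are all assumptions made for illustration, not part of the
documented API.<p>

```shell
#!/bin/sh
# Sketch of a START method for a hypothetical "mydb" data service.
# start_mydb: bring the service up for each logical host named in "$@".
start_mydb() {
    for lhost in "$@"; do
        # Assumed layout: the logical host's disk group is mounted under
        # $MYDB_BASE/<lhost> (MYDB_BASE is an invented knob).
        datadir="${MYDB_BASE:-}/$lhost/mydb"
        if [ ! -d "$datadir" ]; then
            echo "start_mydb: $datadir missing, skipping $lhost" >&2
            continue
        fi
        # Launch the (hypothetical) server daemon against that data,
        # recording its pid so a stop method can find it later.
        mydb_server -d "$datadir" &
        echo $! > "$datadir/mydb.pid"
    done
    return 0
}
```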

The     haswitch(1m)  command    smoothly       shuts  down   a     <a
href="sc/loghost">logical host</a>     on  one  physical     server  in
preparation  for moving the   <a href="sc/loghost">logical host</a>  to
another physical server.  For the cluster framework to coordinate this
shutdown work with layered data services, each data service also
registers a  stop  method.  The   cluster   framework calls the   data
service's stop method  during haswitch(1m) operations, and  whenever a
physical host is    removed  from the cluster   membership   using the
scadmin(1m) stopnode command. The stop method  performs a smooth, safe
shutdown of the data service.  This occurs without waiting for clients
on the network to completely finish their work,  because waiting for a
client could introduce an unbounded delay.<p>
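A companion stop-method sketch for the same hypothetical "mydb" service
follows; the pid-file convention and daemon name remain assumptions. It
simply signals the daemon and moves on, honoring the contract that the
shutdown must not wait on network clients.<p>

```shell
#!/bin/sh
# Sketch of a STOP method for the hypothetical "mydb" data service.
# stop_mydb: shut the service down for each logical host named in "$@".
stop_mydb() {
    for lhost in "$@"; do
        pidfile="${MYDB_BASE:-}/$lhost/mydb/mydb.pid"
        [ -r "$pidfile" ] || continue
        # Smooth, safe shutdown: signal the daemon and continue without
        # waiting for its network clients to finish their work.
        kill -TERM "$(cat "$pidfile")" 2>/dev/null
        rm -f "$pidfile"
    done
    return 0
}
```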

Whenever a physical server is forced to leave the cluster membership by
aborting itself, the cluster framework gives each data service on that
physical host an opportunity to clean up before the node itself is
aborted: it calls the abort method of each data service. A data service
that does not need this clean-up opportunity can choose not to register
an abort method.<p>

The HA Data Services API also  allows fault monitoring agents specific
to    the   data  service to    be      registered with the    cluster
framework. Methods to  start and stop the  fault probes are registered
as callback programs,  and these programs  are invoked at  appropriate
points during cluster configuration. Each of the Sun-supported HA data
services has its own data-service-specific fault monitor.<p>
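The probe itself is the hard part, as discussed next; the control loop
around it, however, is usually a simple consecutive-failure counter. A
minimal sketch, assuming a hypothetical <code>mydb_ping</code> command
that exercises the service as a client would (the cycle count and
failure threshold are likewise invented for illustration):<p>

```shell
#!/bin/sh
# Sketch of a fault-probe loop for the hypothetical "mydb" service.
# probe_mydb CYCLES: run up to CYCLES probe rounds; return 1 as soon as
# MAX_FAILURES consecutive probes fail (the point at which a real
# monitor would ask the framework for a takeover), otherwise return 0.
probe_mydb() {
    cycles=$1
    failures=0
    while [ "$cycles" -gt 0 ]; do
        if mydb_ping; then
            # A healthy response resets the count: a transiently slow
            # service should not trigger a takeover.
            failures=0
        else
            failures=$((failures + 1))
            if [ "$failures" -ge "${MAX_FAILURES:-3}" ]; then
                return 1
            fi
        fi
        cycles=$((cycles - 1))
    done
    return 0
}
```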

Writing a data-service-specific fault monitor is, in general,
difficult. It requires advanced knowledge of the client-server
protocol  used between the  data service and  its clients, familiarity
with  the error codes  returned by  the data  service, and familiarity
with  the  conditions that can   cause the  data   service to take  an
unusually long time  to process a query or  an update, even though the
data service is not permanently hung.<p>

Writing  a data service  fault monitor  is  optional, and even  if the
implementor of the  data service chooses  not  to write one,  the data
service  will still benefit from the  basic monitoring provided by the
Sun  Cluster 2.0 framework.   The basic monitoring facilities  include
detection of failures of the host's hardware, failures of the host's
operating system, and failure of the host to communicate on its public
network.<p>

