		Parallel Programming with CDL
		=============================

  This disk contains a number of programs which are designed to execute
as parallel task forces under Helios. Some of the programs have been listed
in Helios documentation, the Parallel Programming Tutorial and the
CDL guide, while others are new. It is assumed that the reader has copies
of these manuals, as the inner workings of the programs are not described
here.

PI
==
   This subdirectory contains various versions of the pi program described
in the CDL guide. The programs may not be exactly the same as those listed
in the guide, mainly because of minor changes needed to make them look
pretty in an A5 manual. All programs are supplied in source form with
suitable makefiles. The makefiles may have to be changed depending on
the processor network, they contain versions for either T800 networks or
T414 networks, and you will have to comment out one or the other.

   pi_ring contains the first form of the pi example, using a controller
that does no calculation of its own and a task force topology in the form
of a ring. There are two programs, control and worker, and a CDL script
pi.cdl. This script can be compiled or executed as a shell script. It
takes a single argument, the number of workers.

   pi_farm contains the equivalent task force but using a farm topology
without a load balancing component. The CDL script is slightly more
complicated, it takes a compile-time argument for the number of workers
and a run-time argument for the number of intervals per workers. The
makefile will compile the CDL script to produce a task force with
5 workers, and you may want to change this number.

pi_fort contains a Fortran version of the ring topology using the Meiko
Fortran compiler, and pi_pasc contains a Pascal version using Prospero
Pascal. Please note that Prospero Pascal only supports the T800, so there
is no T414 option in the makefile.

  pi_mix contains a CDL script which runs a mixed-language task force. It
takes the C version of the control, a number of C workers, a number of
Fortran workers, and a number of Pascal workers.

  pi_fast is a modified version of the ring topology, written in C. First,
the controller does the same amount of arithmetic as every worker which
avoids certain mapping problems. On a network with n Helios processors the
CDL script should be invoked with (n-1) workers. Second, AccelerateCode()
and Accelerate() are used to move the evaluation routine's code and the
stack into fast on-chip memory. This gives approximately a 70 percent
speed-up, but its use is limited by the amount of on-chip memory that is
available. Apart from the X-windows Server no standard Helios program makes
use of fast on-chip memory because such a valuable resource is best left
to the user's application. Note that the eval() routine is moved to a
separate module, because in the 1.1 release of Helios AccelerateCode() needs
to copy an entire module into fast memory, and hence the module should be
as small as possible.


LB
==
  This directory contains the source of the load balancer program
as described in the CDL guide.

Factor
======
      The Factor subdirectory contains the program listed in the CDL guide,
for determining the number of factors within a range of very large numbers.
