
NTFLOP version 1.01           October 1996

written by
Sharon L. Smith
(sharon@pa.dec.com)

This tool is intended to aid in the identification of floating point
exceptions in programs developed in Visual C/C++ for Windows NT based
Alpha systems.

Changes since 1.0
----------------- 
Version 1.01 fixes the problem of the debugger not attaching in time
to catch floating point exceptions in some console applications.  In addition,
two new options are available: attach to a running process (-A) and create
a separate console (-C).  See below for a description.

Outline:
--------
Programs with significant amounts of floating point computation suffer
considerable performance degradation when compiled and run with the
/QAieee switch on Alpha systems.   In certain cases, the /QAieee switch
can be turned off and the programs run faster if the possibility for
floating point exceptions is eliminated.  This tool provides a means
for identifying areas in a program where floating point exceptions occur.

The tool works as follows.  NTFLOP attaches to your program as a
debugger process.  Floating point exceptions which would normally
cause your program to halt are intercepted by NTFLOP. NTFLOP notes
the type of exception and where it occurred in your program and
then attempts to continue execution.  In particular, it tries
to "repair" the floating point problem in accordance with the
Visual C/C++ interpretation of the IEEE standard and then continue
execution at the next statement in your program.
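
The bookkeeping half of that loop ("notes the type of exception and
where it occurred") can be pictured with the following minimal sketch.
It is not NTFLOP's actual source: the names fp_site, fp_log,
fp_log_record and MAX_SITES are hypothetical, and the fixed-size table
is an assumption made to keep the example short.  It keeps one entry
per (exception code, address) pair and counts repeats, which is the
kind of information shown in the Frequency column of the report.

```c
#include <stddef.h>

#define MAX_SITES 256   /* hypothetical capacity chosen for this sketch */

/* One entry per distinct (exception code, faulting address) pair. */
struct fp_site {
    unsigned long code;      /* exception code reported to the debugger */
    unsigned long address;   /* address of the faulting instruction */
    unsigned long frequency; /* number of times this site has trapped */
};

struct fp_log {
    struct fp_site sites[MAX_SITES];
    size_t count;            /* number of entries in use */
};

/* Record one exception.  Returns the updated frequency for that site,
   or 0 if the table is full and the event could not be stored. */
unsigned long fp_log_record(struct fp_log *log,
                            unsigned long code, unsigned long address)
{
    size_t i;

    /* Same code at the same address: bump the existing counter. */
    for (i = 0; i < log->count; i++) {
        if (log->sites[i].code == code && log->sites[i].address == address)
            return ++log->sites[i].frequency;
    }
    if (log->count == MAX_SITES)
        return 0;

    /* First time this site is seen: append a new entry. */
    log->sites[log->count].code = code;
    log->sites[log->count].address = address;
    log->sites[log->count].frequency = 1;
    log->count++;
    return 1;
}
```

A zero-initialized struct fp_log is an empty table.  An exception
inside a loop would call fp_log_record repeatedly with the same
arguments and produce a single entry with a growing frequency.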

Requirements:
-------------
For NTFLOP to work reliably, all of the modules in your
program should be compiled using the /QApe switch.  This switch inserts
trap barriers after every floating point instruction and thereby allows
precise floating point exceptions to occur. These exceptions can then
be intercepted by the debugger process.   If your program is compiled
without floating point switches, it is not guaranteed that NTFLOP can
repair floating point problems or that your program will run as expected.
If your program is compiled with the /QAieee switch, then floating point
exceptions are never exposed to the debugger process, so NTFLOP cannot
identify floating point problems that may exist.

NTFLOP tries to repair the following types of floating exceptions on
any instruction that can generate a floating point exception:

EXCEPTION_FLT_DIVIDE_BY_ZERO    - floating point divide by zero.
EXCEPTION_FLT_INVALID_OPERATION - floating point invalid operation.
EXCEPTION_FLT_OVERFLOW          - floating point overflow.
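
As an illustration only, the sketch below maps these three exception
codes to the labels that appear in the NTFLOP report later in this
document.  The function name fp_exception_label is hypothetical and
this is not NTFLOP's own code.  The numeric values are the ones defined
by the Win32 headers; they are repeated under #ifndef only so that the
fragment also compiles where <windows.h> is not available.

```c
#include <stddef.h>

/* Values as defined in the Win32 headers; repeated here only so that
   this sketch compiles without <windows.h>. */
#ifndef EXCEPTION_FLT_DIVIDE_BY_ZERO
#define EXCEPTION_FLT_DIVIDE_BY_ZERO    0xC000008EUL
#define EXCEPTION_FLT_INVALID_OPERATION 0xC0000090UL
#define EXCEPTION_FLT_OVERFLOW          0xC0000091UL
#endif

/* Returns the report label for one of the three exception codes listed
   above, or NULL for any other code (for example underflow or inexact
   result, which are not covered). */
const char *fp_exception_label(unsigned long code)
{
    switch (code) {
    case EXCEPTION_FLT_DIVIDE_BY_ZERO:    return "FLT DIVIDE_BY_ZERO";
    case EXCEPTION_FLT_INVALID_OPERATION: return "INVALID OPERATION";
    case EXCEPTION_FLT_OVERFLOW:          return "FLT OVERFLOW";
    default:                              return NULL;
    }
}
```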

In order to get information about the image that generated a floating
point exception and the line number of the exception in the source
code, it is necessary to generate COFF format symbol information.
The program and its images must therefore be compiled with the /Zi 
option and linked with /debug and /debugtype:coff (or /debugtype:both).
(Note that if you do not compile with symbol information, NTFLOP
will report only the memory address at which a floating point exception
took place.)

Limitations:
------------
This tool cannot be used to identify floating point underflow exceptions.
The default behavior with the /QApe switch is to flush underflow results to
zero without raising an exception.  Similarly, floating point exceptions
for inexact results are not raised under the /QApe switch, so neither
behavior can be intercepted or recorded using NTFLOP.

Denormal operands are intercepted by NTFLOP if EXCEPTION_FLT_INVALID_OPERATION
is raised and NTFLOP can ascertain that one of the operands was a denormal.
The denormal is treated as a zero for the purposes of repairing the
instruction that caused the floating point exception and continuing
execution.
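
The "treat a denormal as zero" rule can be sketched in portable C as
follows.  This is a hedged illustration, not NTFLOP's repair code: the
function name flush_denormal_to_zero is hypothetical, and keeping the
sign of the operand on the resulting zero is an assumption that this
document does not confirm.

```c
#include <float.h>

/* Returns x unchanged unless it is a single precision denormal, that
   is, nonzero but smaller in magnitude than FLT_MIN.  A denormal is
   replaced by a zero of the same sign (the sign handling is an
   assumption of this sketch).  NaN and infinity pass through. */
float flush_denormal_to_zero(float x)
{
    float magnitude = (x < 0.0f) ? -x : x;

    if (magnitude > 0.0f && magnitude < FLT_MIN)
        return (x < 0.0f) ? -0.0f : 0.0f;
    return x;
}
```

For example, 1.0e-40F is below FLT_MIN (about 1.18e-38) and is replaced
by zero, while 1.0e-37F is a normal number and is returned unchanged.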

Integer overflow and underflow exceptions are not raised under the /QApe
switch, so they are not handled by NTFLOP.  In addition, the default
behavior of both the /QAieee and the /QApe switches is to report integer
division by zero as an exception.  Consistent with the behavior of the
/QAieee switch, no attempt is made to repair this type of exception, so an
integer divide-by-zero will cause the user's program to abort.

Usage:
------
ntflop [options] -P "program-name"

[Options]

     -W:xxx           Number of milliseconds to wait until debug process has started.
     -D:x             Turns on debugging messages (0-9).
     -C               Creates a separate console window for debug process.
     -A:xxx           Attaches to a running process with pid xxx (decimal).
     -t:xxx           Number of milliseconds to wait for a DebugEvent.
     -v               The version of the program.
     -h               This message.


The -W option is useful for applications that perform initializations
and loading of DLLs before beginning execution or waiting for user
input.  In this mode, the debugger process spawns "program-name"
and attaches to the spawned process only after the specified time
interval has elapsed or the spawned process has indicated that it
is ready to receive input (as in the case of GUIs).

The -C option specifies that a console process is to be created with a separate
console.  If this option is omitted, all output of the process associated with
"program-name" goes to the console window where ntflop is initiated. 

The -A option allows the user to attach to an existing process with decimal
id xxx.  Note that the -P "program-name" must still be specified. This command
will fail if the process id is not correct or if the specified process has not
already been started.
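
Because the process id must be given in decimal, a value copied from a
tool that prints hexadecimal has to be converted first.  The sketch
below shows one way an argument of the form -X:number could be
validated.  It is an assumption about how such parsing might look, not
NTFLOP's actual code, and the name parse_numeric_option is
hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Parses an argument of the form "-<letter>:<decimal number>".
   Returns 1 and stores the number in *value on success.  Returns 0,
   leaving *value untouched, if the letter does not match or the text
   after the colon is not a plain decimal number. */
int parse_numeric_option(const char *arg, char letter, unsigned long *value)
{
    char *end;
    unsigned long parsed;

    if (arg == NULL || arg[0] != '-' || arg[1] != letter || arg[2] != ':')
        return 0;
    if (arg[3] < '0' || arg[3] > '9')   /* rejects empty, signed or blank text */
        return 0;

    errno = 0;
    parsed = strtoul(arg + 3, &end, 10);
    if (errno != 0 || *end != '\0')     /* out of range or trailing characters */
        return 0;

    *value = parsed;
    return 1;
}
```

With this sketch, "-A:128" yields 128, while "-A:0x80" is rejected
because the text after the colon is not purely decimal.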

Examples:
---------
ntflop -P "myFPprogram"

myFPprogram is started and output from ntflop appears on the screen.



ntflop -P "myFPprogram" -A:128

This command will cause ntflop to attach to the process "myFPprogram". The
process id is 128 (80 hexadecimal).  The process id can be obtained using
pstat, which is available with the WIN32 SDK.


ntflop -W:10000 -P "guiFP" > ntflop.out

This starts the program "guiFP" and waits until guiFP is ready to receive
input (or 10 seconds have elapsed). The output from ntflop is redirected
to ntflop.out and is available after the user exits guiFP.


Sample program and NTFLOP output:
---------------------------------
The following program shows how the NTFLOP tool works.

//---------------------------------------------------------------------
//
// fp_example.c: Illustrates the use of NTFLOP.
//
//---------------------------------------------------------------------
#include <windows.h>
#include <stdio.h>
#include <float.h>
#include <math.h>

void main(argc,argv)
DWORD	argc;
LPTSTR	argv[];
{
	float a, aneg, b, bneg, c1, c2, c3, c4, d1, d2, d3, d4;
	char x;

   // This program illustrates the following floating point exceptions:
   // 1. Overflow.
   // 2. Divide by zero
   // 3. Invalid operands
   
   // ------------- Single precision overflow -------------------------
   a = 3.2e38F;
   b = 1.0e38F;
   aneg = -a;
   bneg = -b;
   c1 = a + b;
   c2 = aneg + bneg;
   c3 = aneg - b;
   c4 = a - bneg; 
   printf ("a = %g; b = %g \n", a,b);
   printf("Addition overflow: \n a + b    = %.6f \n -a + -b  = %.6f \n\n", 
   		c1, c2);
   printf("Subtraction overflow: \n -a - b   = %.6f \n a - (-b) = %.6f \n\n", 
   		c3, c4);

   // ------------ Single Precision divide-by-zero cases --------------
   a = 10.0F;
   b = 2.0F;
   aneg = -a;
   c1 =  a/(b-b);
   c2 =  aneg/(b-b);
   printf ("a = %g; b = %g \n", a,b);
   printf ("Divide by zero: \n a/(b - b)    = %.6f \n (-a)/(b - b) = %.6f \n\n", 
   	c1, c2);

   // ------------ Single Precision invalid op cases -------------- 
   // both operands infinity:
   a = c1;
   aneg = c2;
   b = c1;
   bneg = aneg;
   d1 = a + b;
   d2 = aneg + b;
   d3 = a + bneg;
   d4 = aneg + bneg;
   printf ("a = %g; b = %g \n", a,b);
   printf("addition 2 invalid ops: \n a + b   = %.6f \n -a + b  = %.6f \n",
   	d1, d2);
   printf(" a +(-b) = %.6f \n -a + -b = %.6f \n", d3,d4);
}


fp_example output:
==================
a = 3.2e+038; b = 1e+038
Addition overflow:
 a + b    = 1.#INF00
 -a + -b  = -1.#INF00

Subtraction overflow:
 -a - b   = -1.#INF00
 a - (-b) = 1.#INF00

a = 10; b = 2
Divide by zero:
 a/(b - b)    = 1.#INF00
 (-a)/(b - b) = -1.#INF00

a = 1.#INF; b = 1.#INF
addition 2 invalid ops:
 a + b   = 1.#INF00
 -a + b  = -1.#IND00
 a +(-b) = -1.#IND00
 -a + -b = -1.#INF00

NTFLOP output:
==============
cProgram = fp_example.exe
The following floating point exceptions were detected :
----------------------------------------------------------------------------
Exception Type      Memory Address   Frequency             Image      Source
----------------------------------------------------------------------------
INVALID OPERATION    0x004021D0       00000001      fp_example.exe    main: 52
INVALID OPERATION    0x004021BC       00000001      fp_example.exe    main: 51
INVALID OPERATION    0x004021A8       00000001      fp_example.exe    main: 50
INVALID OPERATION    0x00402194       00000001      fp_example.exe    main: 49
FLT DIVIDE_BY_ZERO   0x00402138       00000001      fp_example.exe    main: 39
FLT DIVIDE_BY_ZERO   0x00402118       00000001      fp_example.exe    main: 38
FLT OVERFLOW         0x00402098       00000001      fp_example.exe    main: 29
FLT OVERFLOW         0x00402084       00000001      fp_example.exe    main: 28
FLT OVERFLOW         0x00402070       00000001      fp_example.exe    main: 27
FLT OVERFLOW         0x0040205C       00000001      fp_example.exe    main: 26


Known Problems/Bugs:
--------------------
In GUI programs the following error may occur:

	Unexpected Failure in DebugActiveProcess
        An unexpected failure occured while processing a DebugActiveProcess 
        API request. You may choose OK to terminate the process or Cancel 
        to ignore the error.

Usually choosing Cancel allows NTFLOP to proceed without problems.  To
eliminate this message in GUI programs, try adding the -W option to allow 
the program to load and signal that it is ready for input before the 
debugger process attaches to it. Typically this error only occurs if the 
program is not loaded when the debugger is asked to attach to the process.
  
The performance of the tool can be significantly slowed in the presence of 
multiple floating point exceptions (on the order of hundreds, as might 
occur in a tight loop). For GUI applications, these effects are noticeable.   
In such situations, scripted test suites might be best used for uncovering 
floating point problems.

Disclaimer:
-----------
This tool may not catch all of the floating point problems that exist in 
a program, and, given a set of different inputs, may catch different 
problems.  In addition, there are certain numerical codes which, for
reasons of numerical stability, should always be run with /QAieee 
and possibly additional settings to enable gradual underflow.  Extreme
care should therefore be exercised in turning off the /QAieee
switch.  For further information about possible problems
of running software that is not IEEE compliant, see the following:

[1]  W. Kahan, Lecture Notes on the Status of IEEE Standard 754 for Binary
Floating-Point Arithmetic,
http://www.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps

[2]  X. Li and J. Demmel, Faster Numerical Algorithms via Exception Handling. IEEE 
Transactions on Computers, vol.43, no.8, August 1994, 983-992. 

Please report all bugs, problems, suggestions or comments to 
sharon@pa.dec.com
