Article ID: 125056
Article Last Modified on 2/24/2005
x = 1.100000000000000 y = 1.100000023841858
The result of multiplying a single precision value by an accurate double precision value is nearly as bad as multiplying two single precision values. Both calculations have on the order of a hundred million times as much error as multiplying two double precision values (roughly 1E-8 versus 1E-16).
true = 1.320000000000000 (multiplying 2 double precision values)
y    = 1.320000052452087 (multiplying a double and a single)
z    = 1.320000081062318 (multiplying 2 single precision values)
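C Compile options: none
C x holds the double precision constant 1.1D0. y is assigned the
C single precision constant 1.1, which is widened to REAL*8 afterward.
      real*8 x,y,z
      x = 1.1D0
      y = 1.1
      print *, 'x =',x, 'y =', y
C Single times double, then single times single.
      y = 1.2 * x
      z = 1.2 * 1.1
      print *, x, y, z
      end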
Root = -1.1500000000
Instead, it generates the following error:
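C Compile options: none
C With c = 1.3225, b**2 and 4*a*c are both 5.29 in exact arithmetic,
C which is the case that should give Root = -1.15. In double precision
C x-y comes out slightly negative, so dsqrt fails.
      real*8 a,b,c,x,y
      a=1.0D0
      b=2.3D0
      c=1.3225D0
      x = b**2
      y = 4*a*c
      print *,x,y,x-y
      print "(' Root =',F16.10)",(-b+dsqrt(x-y))/(2*a)
      end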
C Compile options: none
C x and z are implicitly typed REAL*4. Only y is REAL*8.
      real*8 y
      y=27.1024D0
      x=27.1024
      z=y
      if (x.ne.z) then
        print *,'X does not equal Z'
      end if
      if (x.eq.z) then
        print *,'X equals Z'
      end if
      end
x   = 1.00000000000000000 (one bit more than 1.0)
y   = 1.00000000000000000 (exactly 1.0)
x-y = .00000000000000022 (smallest possible difference)
Some versions of FORTRAN round the numbers when displaying them so that the inherent numerical imprecision is not so obvious. This is why x and y look the same when displayed.
x   = 10.00000000000000000 (one bit more than 10.0)
y   = 10.00000000000000000 (exactly 10.0)
x-y = .00000000000000178
The binary representation of these numbers is also displayed to show that they do differ by only one bit.
x = 4024000000000001 Hex
y = 4024000000000000 Hex
The last part of sample code 4 shows that simple nonrepeating decimal values often can be represented in binary only by a repeating fraction. In this case x=1.05, which requires a repeating factor CCCCCCCC....(Hex) in the mantissa. In FORTRAN, the last digit "C" is rounded up to "D" in order to maintain the highest possible accuracy:
x = 3FF0CCCCCCCCCCCD (Hex representation of 1.05D0)
Even after rounding, the result is not perfectly accurate. There is some error after the least significant digit, which we can see by removing the first digit.
x-1 = .05000000000000004
C Compile options: none
C i(1) overlays the low-order 32 bits of x (this assumes little-endian
C storage), so adding 1 to i(1) sets the lowest mantissa bit of x.
      IMPLICIT real*8 (A-Z)
      integer*4 i(2)
      real*8 x,y
      equivalence (i(1),x)
      x=1.
      y=x
      i(1)=i(1)+1
      print "(1x,'x =',F20.17,' y=',f20.17)", x,y
      print "(1x,'x-y=',F20.17)", x-y
      print *
      x=10.
      y=x
      i(1)=i(1)+1
      print "(1x,'x =',F20.17,' y=',f20.17)", x,y
      print "(1x,'x-y=',F20.17)", x-y
      print *
      print "(1x,'x =',Z16,' Hex y=',Z16,' Hex')", x,y
      print *
      x=1.05D0
      print "(1x,'x =',F20.17)", x
      print "(1x,'x =',Z16,' Hex')", x
      x=x-1
      print "(1x,'x-1=',F20.17)", x
      print *
      end
/* Compile options needed: none
*/
#include <stdio.h>

int main(void)
{
    float floatvar;
    double doublevar;

    /* Print double constant. */
    printf("89.95 = %f\n", 89.95);       // 89.95 = 89.950000

    /* Print float constant (promoted to double when passed to printf). */
    printf("89.95 = %f\n", 89.95F);      // 89.95 = 89.949997

    /*** Use double constant. ***/
    floatvar = 89.95;
    doublevar = 89.95;
    printf("89.95 = %f\n", floatvar);    // 89.95 = 89.949997
    printf("89.95 = %lf\n", doublevar);  // 89.95 = 89.950000

    /*** Use float constant. ***/
    floatvar = 89.95f;
    doublevar = 89.95f;
    printf("89.95 = %f\n", floatvar);    // 89.95 = 89.949997
    printf("89.95 = %lf\n", doublevar);  // 89.95 = 89.949997

    return 0;
}