Opened 3 years ago

Closed 3 years ago

#9564 closed bug (invalid)

Floating point subnormals overrounded on output

Reported by: jrp Owned by:
Priority: normal Milestone:
Component: Runtime System Version: 7.8.3
Keywords: Cc: simonmar
Operating System: MacOS X Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

The following test suggests that GHC is overrounding Float subnormals, when printing them out

Prelude> 2**(-149)
1.401298464324817e-45
Prelude> 2**(-149) :: Float
1.0e-45
Prelude> isDenormalized it
True
Prelude>
Prelude> 2**(-149) * 2**(100) * 2**(49) :: Float
1.0

By comparison

#include <stdio.h>
#include <stdlib.h>

int main (void)
{
 float clang_dec, clang_hex, libc_dec, libc_hex;

 clang_dec = 1.401298464324817e-45;
 clang_hex = 0x0.000002P-126;
 printf("clang (from decimal)   = %a\n",clang_dec);
 printf("clang (from decimal)   = %g\n",clang_dec);
 printf("clang (from hex)       = %a\n",clang_hex);
 printf("clang (from hex)       = %g\n",clang_hex);

 libc_dec = strtod("1.401298464324817e-45",NULL);
 libc_hex = strtod("0x0.000002P-126",NULL);
 printf("libc (from decimal) = %a\n",libc_dec);
 printf("libc (from decimal) = %g\n",libc_dec);
 printf("libc (from hex)     = %a\n",libc_hex);
 printf("libc (from hex)     = %g\n",libc_hex);
}

produces

clang (from decimal)   = 0x1p-149
clang (from decimal)   = 1.4013e-45
clang (from hex)       = 0x1p-149
clang (from hex)       = 1.4013e-45
libc (from decimal) = 0x1p-149
libc (from decimal) = 1.4013e-45
libc (from hex)     = 0x1p-149
libc (from hex)     = 1.4013e-45

Change History (8)

comment:1 Changed 3 years ago by rwbarton

Overrounding in what sense? Do we have a spec for the Show Float, Show Double instances? (Honest question.)

I thought that we tried to show the fewest number of significant figures that would round unambiguously to the true value of the floating-point number (which is 1.40129846432481707092372958328991613128026194187651577175706828388979108268586060148663818836212158203125e-45 in this case). In that case 1.0e-45 would be correct since 1.401...e-45 is the smallest positive Float and 1.0e-45 is more than half of it. However, I don't recall where I got that impression.

comment:2 Changed 3 years ago by jrp

Overrounding in the sense that I would expect something like 1.4013e-45 rather than 1.0e-45.

I have not delved into the specs for Show Float / Double.

comment:3 Changed 3 years ago by carter

the standard defines the read/show for Float/Double as

instance  Show Float  where  
    showsPrec p         = showFloat
instance  Read Float  where  
    readsPrec p         = readSigned readFloat
instance  Show Double  where  
    showsPrec p         = showFloat
instance  Read Double  where  
    readsPrec p         = readSigned readFloat

at the bottom of https://www.haskell.org/onlinereport/haskell2010/haskellch9.html#x16-1710009

then per https://www.haskell.org/onlinereport/haskell2010/haskellch38.html#x46-31400038

showFloat :: RealFloat a => a -> ShowS
Show a signed RealFloat value to full precision using standard decimal notation for arguments 
whose absolute value lies between 0.1 and 9,999,999, and scientific notation otherwise.

then looking at the stuff in 4.7.0.1 base http://hackage.haskell.org/package/base-4.7.0.1/docs/src/GHC-Float.html#showFloat

-- | Show a signed 'RealFloat' value to full precision
-- using standard decimal notation for arguments whose absolute value lies
-- between @0.1@ and @9,999,999@, and scientific notation otherwise.
showFloat :: (RealFloat a) => a -> ShowS
showFloat x  =  showString (formatRealFloat FFGeneric Nothing x)

I'll try to dig into this a teeny bit more, but I think thank as long as (read . show) :: Float -> Float acts as the identity function on all floating point values when we roundtrip them. (and as long as they get correctly parsed to that same internal value when read/showed between haskell and another language)

phrased differently, if we dont have roughly that read_hs . show_clang == read_clang . show_hs == read_clang . show_clang == read_hs . show_hs == id

then yes we have a problem, but we dont quite show that problem with theses tests as above, right? We just demonstrate that the particular choice in default representation for show differs from C, right? Its important to remember that parsing a floating point number itself will do rounding to the nearest floating point value too.

i'm trying to focus on some work work this week, but if someone could test that these roundtripining identities work out ok, that'd be awesome (I may try to do it soon myself, but who knows how long that will take to happen)

comment:4 Changed 3 years ago by jrp

Are we not at risk of overcomplicating this?

Raskell on my iPad (hugs, I think) shows

2**(-149)
1.40129846432483e-45


2**(-149)::Float
1.401298e-45

which is more what I would expect.

I don't know whether GHC generates floating point representations natively or whether it uses a C library, but what I would want would be compatibility with many other language implementations such as C, java, PHP, Python, etc, and use the library that has a pedigree that is nearly as long as Haskell's -- David Gay's http://www.netlib.org/fp/dtoa.c for floating point conversions. As well as greater chance of interoperability, this would have the additional side-benefit of providing hex floating point literals without resorting to Template Haskell when you ned to be able to i/o exact floating point representations.

Last edited 3 years ago by jrp (previous) (diff)

comment:5 Changed 3 years ago by rwbarton

These extra digits past the decimal point suggest a degree of precision that doesn't exist in a Float: for instance 1.4013e-45 is the same Float as 1.4014e-45. In fact all the numbers 1.0e-45 through 2.0e-45 (and a bit farther in each direction) round to the same Float, so why not output the simplest one, 1.0e-45? This is the same logic by which we display the Float 1/10 as 0.1 and not, say, 0.10000000149 even though the latter is closer to the true value.

printf's %g is not a format you can round-trip through: by default it always prints 6 digits (in this case the last digit would be 0 and is dropped). So it's not a good analogue of show. (Also, %g is printing a double, not a float, but it has the same behavior with 2-1074, the smallest positive double.)

Python has similar behavior to GHC: it doesn't have single-precision floating point AFAIK, but for doubles, it displays 2-1074 as 5e-324, just like GHC displays 5.0e-324. Arguably we could drop the ".0" part, but I don't think that's worth changing.

comment:6 Changed 3 years ago by simonmar

Owner: simonmar deleted

comment:7 Changed 3 years ago by jrp

For the record, looking through the code, libraries/base/GHC/Float.lhs seems to use FloatToDigits to generate the decimal representation. The algorithm used is "based on"

"Printing Floating-Point Numbers Quickly and Accurately"-- by R.G. Burger and R.K. Dybvig in PLDI 96. http://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf This version uses a much slower logarithm estimator. It should be improved.

In turn, that algorithm was based on "How to Print Floating-Point Numbers Accurately" by Steele and White http://kurtstephens.com/files/p372-steele.pdf but it does not distinguish between significand and insignifican trailing zeros.

The latest approach to conversion appears to be "Printing Floating-Point Numbers Quickly and Accurately with Integers" by Florian Loitsch http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf (Bryan O'Sullivan http://www.serpentine.com/blog/2011/06/29/here-be-dragons-advances-in-problems-you-didnt-even-know-you-had/ has provided a library for the Double version http://hackage.haskell.org/package/double-conversion.)

For a full set of references, including to source code, see http://www.ryanjuckett.com/programming/printing-floating-point-numbers/

I'll have a further think about this, including rwbarton's comments.

comment:8 Changed 3 years ago by rwbarton

Resolution: invalid
Status: newclosed

Revisiting this ticket, I still think that GHC's output 1.0e-45 is correct, so closing.

Note: See TracTickets for help on using tickets.