Opened 2 years ago

Closed 2 years ago

#10521 closed bug (fixed)

Wrong results in strict Word8 storage on x64

Reported by: VincentBerthoux2 Owned by: rwbarton
Priority: highest Milestone: 7.10.2
Component: Compiler Version: 7.10.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s): Phab:D993
Wiki Page:

Description (last modified by VincentBerthoux2)

The following snippet produce two different results in function of the compiler platform used:

import Data.Word( Word8 )

-- removing the bang patterns on V definition makes
-- the problem go away.
data V = V !Word8 !Word8 deriving Show

toV :: Float -> V
toV d = V (truncate $ d * coeff) (fromIntegral $ exponent d + 128) where
  coeff = significand d *  255.9999 / d

main :: IO ()
main =
  print $ map toV [ 3.56158e-2, 0.7415215, 0.5383201, 0.1289829, 0.45520145 ]

On GHC 7.10.1 x86 (under windows and Linux) the output is:

[V 145 124,V 189 128,V 137 128,V 132 126,V 233 127]

On GHC 7.10.1 x64 (under windows and Linux), the (invalid) output is:

[V 0 124,V 0 128,V 0 128,V 0 126,V 0 127]

The bug appear at the following optimisation levels:

  • -O1
  • -O2
  • -O3

the results are the same at -O0

This bug was discovered in a bug report in the library JuicyPixels https://github.com/Twinside/Juicy.Pixels/issues/98.

The same problem has been seen with GHC 7.10.2 RC1

Change History (13)

comment:1 Changed 2 years ago by VincentBerthoux2

Description: modified (diff)
Milestone: 7.10.2
Priority: normalhigh

comment:2 Changed 2 years ago by VincentBerthoux2

Milestone: 7.10.2
Priority: highnormal

Reverting previous change, to act accordingly to the FAQ

Last edited 2 years ago by VincentBerthoux2 (previous) (diff)

comment:3 Changed 2 years ago by rwbarton

Milestone: 7.10.2
Priority: normalhighest

Regression relative to 7.8, investigating...

comment:4 Changed 2 years ago by simonpj

I suggest generating removing just one bang (which apparently is enough to provoke the difference) and compare output of -ddump-simpl

comment:5 Changed 2 years ago by rwbarton

I reduced the test case further to

import Data.Word( Word8 )

toV :: Float -> Word8
toV d = let coeff = significand d *  255.9999 / d
            a = truncate $ d * coeff
            b = exponent d
        in a `seq` (b `seq` a)  -- just   b `seq` a   is not enough to reproduce the bug

main :: IO ()
main =
  print $ map toV [ 3.56158e-2, 0.7415215, 0.5383201, 0.1289829, 0.45520145 ]

It's looking like an issue in the backend where it doesn't understand that F1 and D1 are the same physical register (%xmm1) on x86_64. Possibly an old issue that was uncovered by 7.10's new implementation of significand :: Float -> Float going through Double.

Specifically for the (correct) Core

Main.main14 =
  \ (d_aC5 [OS=ProbOneShot] :: Float) ->
    case d_aC5 of _ [Occ=Dead] { GHC.Types.F# x_a22N ->
    case GHC.Prim.decodeFloat_Int# x_a22N
    of _ [Occ=Dead] { (# ipv_a1Gg, ipv1_a1Gh #) ->
    case integer-gmp-1.0.0.0:GHC.Integer.Type.encodeDoubleInteger
           (integer-gmp-1.0.0.0:GHC.Integer.Type.smallInteger ipv_a1Gg) (-24)
    of wild1_a1Gj { __DEFAULT ->
    case GHC.Float.$w$cexponent1 x_a22N of _ [Occ=Dead] { __DEFAULT ->
    case GHC.Prim.divideFloat#
           (GHC.Prim.timesFloat#
              (GHC.Prim.double2Float# wild1_a1Gj) (__float 255.9999))
           x_a22N
    of wild2_a234 { __DEFAULT ->
    GHC.Word.W8#
      (GHC.Prim.narrow8Word#
         (GHC.Prim.int2Word#
            (GHC.Prim.float2Int# (GHC.Prim.timesFloat# x_a22N wild2_a234))))
    }
    }
    }
    }
    }

we generate Cmm like

...
       c3MY:
           I64[Sp] = c3N2;
           R3 = (-24);
           R2 = R1;
           call GHC.Integer.Type.encodeDoubleInteger_info(R3,
                                                          R2) returns to c3N2, args: 8, res: 8, upd: 8;
       c3N2:
           I64[Sp - 8] = c3N6;
           F1 = F32[Sp + 8];
           F64[Sp] = D1;
           Sp = Sp - 8;
           call GHC.Float.$w$cexponent1_info(F1) returns to c3N6, args: 8, res: 8, upd: 8;
...

The assignment to F1 is to set up the arguments for $w$cexponent1 from x_a22N which has been saved on the stack, and the read from D1 is to save the return value wild1_a1Gj from encodeDoubleInteger. Technically the program became incorrect in the "Sink assignments" pass when the read from D1 was moved past the store to F1, but maybe it just doesn't make sense to cater to the possibility of multiple STG global registers being aliases of one another.

comment:6 Changed 2 years ago by rwbarton

The commit that merged F1 and D1 was e2f6bbd3a27685bc667655fdb093734cb565b4cf which is in 7.8 but not 7.6. Here is a reproducer for 7.8 too:

{-# LANGUAGE MagicHash #-}

import GHC.Exts

f :: Float# -> Float#
f x = x
{-# NOINLINE f #-}

g :: Double# -> Double#
g x = x
{-# NOINLINE g #-}

h :: Float -> Float
h (F# x) = let a = F# (f x)
               b = D# (g (2.0##))
           in a `seq` (b `seq` a)

main = print (h 1.0)   -- with ghc -O, prints 0.0

Not sure yet whether it is better to revert (parts of) that commit or to try to account for STG global registers overlapping in the cmmSink pass and wherever else it might be necessary.

comment:7 Changed 2 years ago by simonpj

Reid you are a marvel, thank you.

My instinct is to "account for registers" overlapping. After all, if Cmm optimisations don't know that two registers are the same, all manner of bad things can happen, perhaps not limited to CmmSink. Is this just the Eq instance for CmmReg?

comment:8 Changed 2 years ago by rwbarton

Owner: set to rwbarton

In fact I did try just changing GlobalReg's Eq instance (and Ord instance) to treat FloatReg i and DoubleReg i as the same, and that did fix this issue. Doing this properly will require a bit more refactoring than that, since whether these registers overlap depends on the DynFlags, but it looks like it will be nothing too serious.

comment:9 Changed 2 years ago by rwbarton

Differential Rev(s): Phab:D993

comment:10 Changed 2 years ago by thoughtpolice

Status: newpatch

comment:11 Changed 2 years ago by Reid Barton <rwbarton@…>

In a2f828a370b220839ad9b31a274c0198ef91b7fe/ghc:

Be aware of overlapping global STG registers in CmmSink (#10521)

Summary:
On x86_64, commit e2f6bbd3a27685bc667655fdb093734cb565b4cf assigned
the STG registers F1 and D1 the same hardware register (xmm1), and
the same for the registers F2 and D2, etc. When mixing calls to
functions involving Float#s and Double#s, this can cause wrong Cmm
optimizations that assume the F1 and D1 registers are independent.

Reviewers: simonpj, austin

Reviewed By: austin

Subscribers: simonpj, thomie, bgamari

Differential Revision: https://phabricator.haskell.org/D993

GHC Trac Issues: #10521

comment:12 Changed 2 years ago by rwbarton

Status: patchmerge

comment:13 Changed 2 years ago by thoughtpolice

Resolution: fixed
Status: mergeclosed

Merged to ghc-7.10.

Note: See TracTickets for help on using tickets.