Opened 5 years ago

Last modified 4 days ago

#4012 patch bug

Compilation results are not deterministic

Reported by: kili Owned by: niteria
Priority: high Milestone: 7.12.1
Component: Compiler Version: 6.12.2
Keywords: Cc: lelf, mail@…, the.dead.shall.rise@…, id@…, shacka@…, mail@…, pumpkingod@…, fuuzetsu@…, tkn.akio@…, juhpetersen, cheecheeo@…, bill@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Other Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions: Phab:D910, Phab:D1073, Phab:D1133, Phab:D1192

Description (last modified by simonpj)

There are some issues with non-determinism in the output of GHC, which means that compilations are not repeatable. This affects some users (e.g. Debian packagers) who need to be able to get repeatable hashes for the packages of a GHC build.

The cases we know about that lead to non-deterministic results are:

  • The spec_ids (specialised Ids) attached to an Id have a non-deterministic ordering
  • CSE can give different results depending on the order in which the bindings are considered, and since the ordering is non-deterministic, the result of CSE is also non-deterministic. e.g. in x = z; y = z; z = 3, where y and x are exported, we can end up with either x = y; y = 3 or y = x; x = 3.
  • There seems to be something unpredictable about the order of arguments to SpecConstr-generated specialisations, see http://www.haskell.org/pipermail/glasgow-haskell-users/2011-April/020287.html
  • The wrappers generated by the CApiFFI extension have non-deterministic names. (see comment:15 below).
  • See comment:76 for another, different, example

Old ticket description follows

Short story: if you use ghc-6.12.1.20100318 (or similar, probably ghc-6.12.1 release will produce the same results) to bootstrap ghc-6.12, and then use that ghc-6.12 to bootstrap another ghc-6.12, those two instances of ghc-6.12 will have different ABI hashes and interfaces in the ghc package. If you use ghc-6.10 for the bootstrapping, you'll even get differences in the ghc, base and Cabal packages.

Long story: see logfiles and descriptions at http://darcs.volkswurst.de/boot-tests/ (note that the logfiles are quite large, I really don't want to attach 150 MB of logs to this ticket).

Attachments (13)

zap-cbooterversion_-in-an-attempt-to-fix-_4012.dpatch (3.6 KB) - added by kili 5 years ago.
warp-abi-diff (2.8 KB) - added by nomeata 3 years ago.
Changed ABI only due to alpha renaming
warp-no-exposed-change.patch (3.0 KB) - added by nomeata 3 years ago.
This change changes the ABI of warp-1.2.1.1 unexpectedy
Warp-before.hi (20.5 KB) - added by nomeata 3 years ago.
Interface file before the patch
Warp-after.hi (20.5 KB) - added by nomeata 3 years ago.
Interface file after the patch
Warp.hs (4.0 KB) - added by nomeata 3 years ago.
Attempt to minimize the problem (still needs conduit) (before patch)
Warp.2.hs (4.1 KB) - added by nomeata 3 years ago.
Attempt to minimize the problem (still needs conduit) (after patch)
Test.hs (175 bytes) - added by nomeata 3 years ago.
Small testcase
recomp.sh (487 bytes) - added by akio 11 months ago.
Test script
T.hs (17.8 KB) - added by Fuuzetsu 9 months ago.
S.hs (684 bytes) - added by Fuuzetsu 9 months ago.
Internal.hi (80.2 KB) - added by Fuuzetsu 3 months ago.
Differing Constr names 1
Internal.2.hi (80.2 KB) - added by Fuuzetsu 3 months ago.
Differing Constr names 2

Download all attachments as: .zip

Change History (129)

comment:1 Changed 5 years ago by kili

This is much fun... after running diff(1) with some sanitizing -I options on two *.ghc.ifaces lists, the most interesting difference was in the interface of the module `Panic, and there in the unfolding: section of showGhcException. Here's the hunk with that difference:

@@ -187032,7 +187029,7 @@
                                 Panic.showGhcException5
                                 (GHC.Base.++
                                    @ GHC.Types.Char
-                                   Config.cProjectVersion
+                                   Config.cBooterVersion
                                    (GHC.Base.++
                                       @ GHC.Types.Char
                                       Panic.showGhcException4

So what happens here seems to be some optimization replacing cProjectVersion by cBooterVersion iff those two strings are equal (i.e. when building 6.12 with 6.12 as the booter).

All other interface and ABI hash changes are just triggered by this change; indeed I could `fix' the problem by removing all occurrences of cBooterVersion. See attached patch (but don't apply, because it's *not* a proper fix but just a workaround).

I'm not sure wether the real bug is this optimization (substitute cBooterVersion for cProjectVersion) or the fact that the value of those strings make it into the ABI hash.

comment:2 follow-up: Changed 5 years ago by simonmar

  • Resolution set to invalid
  • Status changed from new to closed

See Commentary/Compiler/RecompilationAvoidance, in particular I think the point right at the bottom about CSE is probably what's causing the differences you see.

In a sense it's not a bug, in that we know interfaces are not stable. Of course we'd prefer it if interfaces were stable, and moving towards stability has been a goal of mine over the past year or so. I don't think it'll help to have a bug open on this right now, though.

comment:3 Changed 5 years ago by simonmar

Oops, I forgot: thanks for the report, and you were right to be suspicious! Interface stability is a reasonable thing to expect. The fact that you got different results when changing the booting compiler is really a red herring: you can get different results just by saying 'make clean' and recompiling.

comment:4 in reply to: ↑ 2 ; follow-up: Changed 5 years ago by kili

Replying to simonmar:

In a sense it's not a bug, in that we know interfaces are not stable. Of course we'd prefer it if interfaces were stable, and moving towards stability has been a goal of mine over the past year or so. I don't think it'll help to have a bug open on this right now, though.

But this means if two people build packages, where one used a different bootstrapper for ghc than the other, they can't use each other's packages. I'd consider that a bug. (Actually, the OpenBSD port has a knob for using an already installed ghc-6.12.2 for bootstrapping instead of the prebuilt bindist I provide, but I'll better remove that knob for now)

comment:5 in reply to: ↑ 4 ; follow-up: Changed 5 years ago by simonmar

  • Component changed from Build System to Compiler
  • difficulty set to Difficult (2-5 days)
  • Operating System changed from OpenBSD to Unknown/Multiple
  • Resolution invalid deleted
  • Status changed from closed to new
  • Summary changed from Different stage2 results depending on the version of the bootstrapping compiler to Compilation results are not deterministic

Replying to kili:

But this means if two people build packages, where one used a different bootstrapper for ghc than the other, they can't use each other's packages. I'd consider that a bug.

The bootstrapping GHC is a red herring, as I mentioned - the fact is that compilation results aren't deterministic. They never have been, in fact.

Even if compilation results were deterministic, under what circumstances would you want to have two systems "use each others packages", when they're both using a GHC independently built from source? If they independently build GHC from source, why wouldn't they build packages from source too?

Note that packages are registered with an MD5 hash of the interface, so the package system will spot if you try to register an incompatible package.

Anyway, I do agree that having deterministic compilation results is a desirable thing, so on second thoughts I've re-opened the bug and retitled it.

comment:6 in reply to: ↑ 5 ; follow-ups: Changed 5 years ago by kili

Replying to simonmar:

The bootstrapping GHC is a red herring, as I mentioned - the fact is that compilation results aren't deterministic. They never have been, in fact.

And it's good that the problems are detected now with the hashes.

Even if compilation results were deterministic, under what circumstances would you want to have two systems "use each others packages", when they're both using a GHC independently built from source? If they independently build GHC from source, why wouldn't they build packages from source too?

For example, if two persons are working on a ghc package for their operating system, and also on updates for all the stuff that needs ghc (xmonad, alex, happy, monadius, to mention the most important ones), it's a pity if they're not able to exchange packages.

You mentioned that even a `make clean; make' may change the interfaces. Indeed, I remember at least one case where ghc segfailted during the build (this doesn't happen often, only every 20th or 30th build) and I just restarted the build -- and got interface changes.

Note that packages are registered with an MD5 hash of the interface, so the package system will spot if you try to register an incompatible package.

And that's good, but it's just a workaround, UMHO.

Anyway, I do agree that having deterministic compilation results is a desirable thing, so on second thoughts I've re-opened the bug and retitled it.

Thanks. But is this really *one* Bug? There are the problems with non-determinisms, but I think that the CSE on inlined constants (like cBooterVersion and cProjectVersion) is a separate problem.

comment:7 in reply to: ↑ 6 Changed 5 years ago by igloo

Replying to kili:

Thanks. But is this really *one* Bug? There are the problems with non-determinisms, but I think that the CSE on inlined constants (like cBooterVersion and cProjectVersion) is a separate problem.

I don't think that's a bug at all. You are compiling different source code, so it's entirely reasonable that the ABI should be different.

If the ABI was otherwise stable then it might be worth trying to work around cBooterVersion's value affecting the ABI, though.

comment:8 in reply to: ↑ 6 Changed 5 years ago by simonmar

Replying to kili:

For example, if two persons are working on a ghc package for their operating system, and also on updates for all the stuff that needs ghc (xmonad, alex, happy, monadius, to mention the most important ones), it's a pity if they're not able to exchange packages.

For this to work I think you'd really need stable ABIs, not just deterministic compilation. If you only had deterministic compilation, it would be hard to guarantee that all the inputs to the compilation were identical between the two systems: there are a lot of ways that differences can accidentally creep in (different optimisation flags, system configuration settings, etc.).

See Commentary/Packages for more on this.

You mentioned that even a `make clean; make' may change the interfaces. Indeed, I remember at least one case where ghc segfailted during the build (this doesn't happen often, only every 20th or 30th build)

FWIW, we never see random segfaults in GHC here, so I expect that really is a bug specific to your OS or system that needs investigating.

and I just restarted the build -- and got interface changes.

Yes - unfortunate, but not unexpected.

Anyway, I do agree that having deterministic compilation results is a desirable thing, so on second thoughts I've re-opened the bug and retitled it.

Thanks. But is this really *one* Bug? There are the problems with non-determinisms, but I think that the CSE on inlined constants (like cBooterVersion and cProjectVersion) is a separate problem.

Ian pointed out that you really have source differences here, which I hadn't spotted. (Actually, I'm not sure why cBooterVersion isn't set to the version of the stage1 compiler when building stage2, it doesn't seem useful to record the stage0 version in the stage2 compiler.)

So the underlying problem leading to the CSE issue is that the compiler doesn't use a deterministic ordering for bindings internally. It uses the Unique ordering, which is semi-random. The results of CSE may depend on the order of bindings, but it's entirely possible that there are other non-deterministic consequences of the same kind elsewhere. Making the order deterministic would fix all of them.

comment:9 Changed 5 years ago by igloo

  • Milestone set to 6.14.1

comment:10 Changed 5 years ago by igloo

  • Milestone changed from 7.0.1 to 7.0.2

comment:11 Changed 5 years ago by igloo

  • Milestone changed from 7.0.2 to 7.2.1

comment:12 Changed 4 years ago by nomeata

  • Cc mail@… added

Just a minor comment: This is hurting distributions quite a bit. Is there a chance to at least avoid this particular problem (cBooterVersion vs cProjectVersion) in the next ghc release?

Thanks, Joachim

comment:13 Changed 4 years ago by igloo

  • Milestone changed from 7.2.1 to 7.4.1

comment:14 Changed 4 years ago by igloo

  • Milestone changed from 7.4.1 to 7.6.1
  • Priority changed from normal to low

comment:15 Changed 4 years ago by nomeata

This has bit us again with 7.4.1. I forgot about this issue and we built lots of libraries against the first upload of GHC 7.4.1 (built with 7.0.4), all of which will have to be rebuild after the next minor GHC upload, because that will be built with 7.4.1 and the base API will likely change again.

This is the interface diff causing the ABI change in base, in case anybody wonders:

/usr/lib/ghc/base-4.5.0.0/System/Posix/Internals.hi
--- /dev/fd/63  2012-02-10 20:33:16.717639938 +0000
+++ /dev/fd/62  2012-02-10 20:33:16.717639938 +0000
@@ -5,11 +5,11 @@
 Way: Wanted [],
      got    []
 interface base:System.Posix.Internals 7041
-  interface hash: fef49c410428b674b72ebd8b1a93bd62
-  ABI hash: 33b2adf3f92d97c87fbcbd52d7f22781
+  interface hash: 13159d537315369ddfe00efa59167f8a
+  ABI hash: fe56115a605d2758561d089b972bb8bb
   export-list hash: 83b224804aef34838bb7767a01e8aaa7
   orphan hash: 693e9af84d3dfcc71e640e005bdc5e2e
-  flag hash: 28bf87c2d7df91e45dad874fc0a5931f
+  flag hash: 865402a98b08183763ca20b5e9837ae0
   used TH splices: False
   where
 exports:
@@ -241,7 +241,7 @@
 import  -/  integer-gmp:GHC.Integer.Type 254721fe3c053c778036ed1a1fa4248d
 addDependentFile "libraries/base/include/HsBaseConfig.h"
 addDependentFile "libraries/base/dist-install/build/autogen/cabal_macros.h"
-3e0a4be3b609e3ac62226d26fd8dbfa2
+9e62726eeeec5155bee7f03712b80c79
   $wa :: GHC.Prim.Int#
          -> GHC.Prim.State# GHC.Prim.RealWorld
          -> (# GHC.Prim.State# GHC.Prim.RealWorld, GHC.IO.IOMode.IOMode #)
@@ -258,7 +258,7 @@
                           System.Posix.Internals.fdGetMode3
                           System.Posix.Internals.fdGetMode2
                           (\ ds2 :: GHC.Prim.State# GHC.Prim.RealWorld ->
-                           case {__pkg_ccall base ghc_wrapper_d2ju_fcntl GHC.Prim.Int#
+                           case {__pkg_ccall base ghc_wrapper_d2jn_fcntl GHC.Prim.Int#
                                                                          -> GHC.Prim.Int#
                                                                          -> GHC.Prim.State#
                                                                                 GHC.Prim.RealWorld
@@ -386,7 +386,7 @@
                                               0
                                               -> System.Posix.Internals.fileType2
                                                    w } } } } } } } } } } } }) -}
-178f30c6a50e12552309d6f35e96cfa3
+1e8eed47078d0056cc7209ae4be4cd83
   $wa2 :: GHC.Prim.Int#
           -> GHC.Prim.State# GHC.Prim.RealWorld
           -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
@@ -404,7 +404,7 @@
                                                                         GHC.Prim.RealWorld,
                                                                     GHC.Prim.Int# #)}
                           GHC.Prim.realWorld# of wild1 { (#,#) ds2 ds3 ->
-                   case {__pkg_ccall base ghc_wrapper_d2ji_fcntl GHC.Prim.Int#
+                   case {__pkg_ccall base ghc_wrapper_d2jb_fcntl GHC.Prim.Int#
                                                                  -> GHC.Prim.Int#
                                                                  -> GHC.Prim.Int#
                                                                  -> GHC.Prim.State#
@@ -440,7 +440,7 @@
                              (GHC.Types.NTCo:IO (Refl Foreign.C.Types.CInt))
                                ds6 of wild5 { (#,#) new_s1 a76 ->
                         (# new_s1, GHC.Tuple.() #) } } } } } }) -}
-20c72f17c3f40d20ed52caaaa7a2c116
+ce1394a8e53df017e302d2cc305d0231
   $wa3 :: GHC.Prim.Int#
           -> GHC.Types.Bool
           -> GHC.Prim.State# GHC.Prim.RealWorld
@@ -459,7 +459,7 @@
                           System.Posix.Internals.fdGetMode3
                           System.Posix.Internals.setNonBlockingFD3
                           (\ ds2 :: GHC.Prim.State# GHC.Prim.RealWorld ->
-                           case {__pkg_ccall base ghc_wrapper_d2ju_fcntl GHC.Prim.Int#
+                           case {__pkg_ccall base ghc_wrapper_d2jn_fcntl GHC.Prim.Int#
                                                                          -> GHC.Prim.Int#
                                                                          -> GHC.Prim.State#
                                                                                 GHC.Prim.RealWorld
@@ -494,7 +494,7 @@
                                                                                GHC.Prim.RealWorld,
                                                                            GHC.Prim.Int# #)}
                                     GHC.Prim.realWorld# of wild6 { (#,#) ds2 ds3 ->
-                             case {__pkg_ccall base ghc_wrapper_d2ji_fcntl GHC.Prim.Int#
+                             case {__pkg_ccall base ghc_wrapper_d2jb_fcntl GHC.Prim.Int#
                                                                            -> GHC.Prim.Int#
                                                                            -> GHC.Prim.Int#
                                                                            -> GHC.Prim.State#
@@ -530,7 +530,7 @@
                                                                                GHC.Prim.RealWorld,
                                                                            GHC.Prim.Int# #)}
                                     GHC.Prim.realWorld# of wild6 { (#,#) ds4 ds5 ->
-                             case {__pkg_ccall base ghc_wrapper_d2ji_fcntl GHC.Prim.Int#
+                             case {__pkg_ccall base ghc_wrapper_d2jb_fcntl GHC.Prim.Int#
                                                                            -> GHC.Prim.Int#
                                                                            -> GHC.Prim.Int#
                                                                            -> GHC.Prim.State#
@@ -751,7 +751,7 @@
                        ((->)
                             (Sym (Foreign.C.Types.NTCo:CInt))
                             (GHC.Types.IO (Sym (Foreign.C.Types.NTCo:CInt))))) -}
-285fd75770912c1d8f7839e3a5f7913c
+5f043b97e154508a95f4d6814442f737
   c_fcntl_lock :: Foreign.C.Types.CInt
                   -> Foreign.C.Types.CInt
                   -> GHC.Ptr.Ptr System.Posix.Internals.CFLock
@@ -765,7 +765,7 @@
                    case ds1 of ds5 { GHC.Int.I32# ds6 ->
                    case ds2 of ds7 { GHC.Ptr.Ptr ds8 ->
                    (\ ds9 :: GHC.Prim.State# GHC.Prim.RealWorld ->
-                    case {__pkg_ccall base ghc_wrapper_d2j3_fcntl GHC.Prim.Int#
+                    case {__pkg_ccall base ghc_wrapper_d2iW_fcntl GHC.Prim.Int#
                                                                   -> GHC.Prim.Int#
                                                                   -> GHC.Prim.Addr#
                                                                   -> GHC.Prim.State#
@@ -788,7 +788,7 @@
                             ((->)
                                  (Refl (GHC.Ptr.Ptr System.Posix.Internals.CFLock))
                                  (GHC.Types.IO (Sym (Foreign.C.Types.NTCo:CInt)))))) -}
-344994480e335140ee30d5da0c864863
+f98068efddbcdfb5e3350b8ad3fa7be6
   c_fcntl_read :: Foreign.C.Types.CInt
                   -> Foreign.C.Types.CInt
                   -> GHC.Types.IO Foreign.C.Types.CInt
@@ -798,7 +798,7 @@
                    case ds of ds2 { GHC.Int.I32# ds3 ->
                    case ds1 of ds4 { GHC.Int.I32# ds5 ->
                    (\ ds6 :: GHC.Prim.State# GHC.Prim.RealWorld ->
-                    case {__pkg_ccall base ghc_wrapper_d2ju_fcntl GHC.Prim.Int#
+                    case {__pkg_ccall base ghc_wrapper_d2jn_fcntl GHC.Prim.Int#
                                                                   -> GHC.Prim.Int#
                                                                   -> GHC.Prim.State#
                                                                          GHC.Prim.RealWorld
@@ -817,7 +817,7 @@
                        ((->)
                             (Sym (Foreign.C.Types.NTCo:CInt))
                             (GHC.Types.IO (Sym (Foreign.C.Types.NTCo:CInt))))) -}
-6387e106fea1f6634ffe8fe759bd7748
+c8ec806642726c5ab7f4c951173d88ac
   c_fcntl_write :: Foreign.C.Types.CInt
                    -> Foreign.C.Types.CInt
                    -> Foreign.C.Types.CLong
@@ -829,7 +829,7 @@
                    case ds1 of ds5 { GHC.Int.I32# ds6 ->
                    case ds2 of ds7 { GHC.Int.I64# ds8 ->
                    (\ ds9 :: GHC.Prim.State# GHC.Prim.RealWorld ->
-                    case {__pkg_ccall base ghc_wrapper_d2ji_fcntl GHC.Prim.Int#
+                    case {__pkg_ccall base ghc_wrapper_d2jb_fcntl GHC.Prim.Int#
                                                                   -> GHC.Prim.Int#
                                                                   -> GHC.Prim.Int#
                                                                   -> GHC.Prim.State#
@@ -1584,7 +1584,7 @@
                   ((->)
                        (Refl (GHC.Ptr.Ptr Foreign.C.Types.CChar))
                        (GHC.Types.IO (Sym (Foreign.C.Types.NTCo:CInt)))) -}
-b6e75b90c0bc4bc5d2e2ae1b63b9846f
+236fe7b0bd958dd442a1f699d1345485
   c_utime :: Foreign.C.String.CString
              -> GHC.Ptr.Ptr System.Posix.Internals.CUtimbuf
              -> GHC.Types.IO Foreign.C.Types.CInt
@@ -1595,7 +1595,7 @@
                    case ds of ds2 { GHC.Ptr.Ptr ds3 ->
                    case ds1 of ds4 { GHC.Ptr.Ptr ds5 ->
                    (\ ds6 :: GHC.Prim.State# GHC.Prim.RealWorld ->
-                    case {__pkg_ccall base ghc_wrapper_d2hj_utime GHC.Prim.Addr#
+                    case {__pkg_ccall base ghc_wrapper_d2hc_utime GHC.Prim.Addr#
                                                                   -> GHC.Prim.Addr#
                                                                   -> GHC.Prim.State#
                                                                          GHC.Prim.RealWorld
@@ -1931,7 +1931,7 @@
                         (Foreign.C.Types.NTCo:CInt) of wild { GHC.Int.I32# a76 ->
                    case a76 of wild1 {
                      DEFAULT -> GHC.Types.False (-1) -> GHC.Types.True } }) -}
/usr/lib/ghc/base-4.5.0.0/Control/Concurrent.hi
--- /dev/fd/63  2012-02-10 20:33:24.365677867 +0000
+++ /dev/fd/62  2012-02-10 20:33:24.365677867 +0000
@@ -5,11 +5,11 @@
 Way: Wanted [],
      got    []
 interface base:Control.Concurrent 7041
-  interface hash: 859a60ba0f58963d0b32951e150a9384
-  ABI hash: ae249f19a927403afe6459ced1ed82bd
+  interface hash: bbbd6db6b81349eeb98d6de1d9533e0e
+  ABI hash: a2da1fbff0edb4b1499650aa0180adeb
   export-list hash: 43716dbfa877e75f4e9112114a6a1572
   orphan hash: 693e9af84d3dfcc71e640e005bdc5e2e
-  flag hash: 44b3db9df460d6cd34b4094fa836f3ef
+  flag hash: 9d38727ed6a9e7ba8d387df844cc1d1e
   used TH splices: False
   where
 exports:
@@ -174,7 +174,7 @@
   otherwise 82850e7a148d17e005ebe89173cad100
 import  -/  GHC.Conc 62a874d9e31bc9eb426ba181adbd36a1
   exports: 7f1de5b156bcfc21b036b8be089d0862
-import  -/  GHC.Conc.IO 9c0a53bd8477f330b48bd95ab54465be
+import  -/  GHC.Conc.IO a46105f75fe740b6d84dba2a24195977
   threadDelay 3f2ea80f962aa726a69f0a26c2cd72e8
   threadWaitRead ee2b193ffe18e7cc4e8ed3b078e64f0c
   threadWaitWrite 892414941939665c2ba67b0eb12ffb9b
@@ -238,6 +238,16 @@
 import  -/  ghc-prim:GHC.Classes 9f526208b19b2511259880485f6b7413
 import  -/  ghc-prim:GHC.Types d58bb266a5f6fd38ade7006bcfc6ede5
 addDependentFile "libraries/base/dist-install/build/autogen/cabal_macros.h"
+0909aa1928600224faf9e000c5f6b0e0
+  $fforkOS_entry_a1T0 :: GHC.Stable.StablePtr (GHC.Types.IO ())
+                         -> GHC.Types.IO ()
+    {- Arity: 2, HasNoCafRefs, Strictness: U(L)L,
+       Unfolding: InlineRule (0, True, True)
+                  Control.Concurrent.$fforkOS_entry_a1T1
+                    `cast`
+                  ((->)
+                       (Refl (GHC.Stable.StablePtr (GHC.Types.IO ())))
+                       (Sym (GHC.Types.NTCo:IO (Refl ())))) -}
 3ec1a60cf71f47d982cf274d264b867d
   $fforkOS_entry_a1T1 :: GHC.Stable.StablePtr (GHC.Types.IO ())
                          -> GHC.Prim.State# GHC.Prim.RealWorld
@@ -252,16 +262,6 @@
                           sp
                           eta of wild1 { (#,#) new_s a ->
                    a `cast` (GHC.Types.NTCo:IO (Refl ())) new_s } }) -}
-cf9f1b73372486eb748c4c2a9120bf5b
-  $fforkOS_entry_a1T7 :: GHC.Stable.StablePtr (GHC.Types.IO ())
-                         -> GHC.Types.IO ()
-    {- Arity: 2, HasNoCafRefs, Strictness: U(L)L,
-       Unfolding: InlineRule (0, True, True)
-                  Control.Concurrent.$fforkOS_entry_a1T1
-                    `cast`
-                  ((->)
-                       (Refl (GHC.Stable.StablePtr (GHC.Types.IO ())))
-                       (Sym (GHC.Types.NTCo:IO (Refl ())))) -}
 fcb1cef8d2a42a26479f83226cc08633
   type Buffer a
       = (GHC.MVar.MVar (GHC.MVar.MVar [a]), Control.Concurrent.QSem.QSem)

And similarly in /usr/lib/ghc/base-4.5.0.0/GHC/IO/Handle/Internals.hi:

@@ -2181,7 +2181,7 @@
     {- Arity: 3, HasNoCafRefs, Strictness: SU(LLLLLL)L,
        Inline: INLINE[0],
        Unfolding: Worker(ext0: GHC.IO.Handle.Internals.$wa3 (arity 3) -}
-"SC:a_s31q0" [ALWAYS] forall @ dev
+"SC:a_s31j0" [ALWAYS] forall @ dev
                              @ enc_state
                              @ dec_state
                              sc :: GHC.IO.Device.IODevice dev

Maybe the hashes over the inlining expressions could be made independent of the names, by applying some normalization before hashing the expression?

comment:16 Changed 4 years ago by simonmar

  • Description modified (diff)
  • Priority changed from low to high

I've updated the description following the latest evidence. Let's try to do something about this for the next major release.

comment:17 Changed 3 years ago by nomeata

Let me illustrate one consequence of this problem. In Debian, the freeze for wheezy is near. We have just managed to migrate ghc-7.4.1 to wheezy, but there are a few remaining bugs of various severity (#5991, #6156) and having GHCi on ARM would be nice as well. Alas, we cannot upgrade to ghc-7.4.2 which provides all this, as it would require rebuilding everything, which is not possible at this stage any more (http://lists.debian.org/debian-haskell/2012/06/msg00038.html).

What we will probably do is to backport fixes from 7.4.2 onto 7.4.1, but this is not nice either, as it might introduce new bugs and makes our ghc deviate more and more from upstream.

Ideally, if 7.4.2 does not change anything about the actual ABI, it would not force it to change completely. Currently, it does, by encoding the version number in every hash (#5328; granted, I filed that, but that does not mean its the best solution :-)). It should do something like haddock: Keep an internal counter, separate from the version number, that is increased if indeed everything needs to be rebuilt, e.g. changes in the .hi file format, in the calling convention etc. (← identifying this list is probably non-trivial). And if a new compiler version is released that does not do any of these, the number stays the same and previously compiled packages can be re-used.

This would also require the file names of installed packages to not include the ghc version but the ghc abi interface number; but configuring this can be left to the distributions.

(Hmm, this is not _really_ this bug, but closely related.)

comment:18 Changed 3 years ago by igloo

  • Milestone changed from 7.6.1 to 7.6.2

comment:19 Changed 3 years ago by nomeata

Again this is biting me. We are trying to backport a fix in warp to Debian testing, without having to rebuild everything based on it. And indeed the patch does not really change any of the exported information (e.g. no function eligible for inlining). But nevertheless the ABI hash changes. Judging from the diff of ghc --show-iface (attached), this is only due to renaming of internal variables.

Maybe the ABI hashing should first alpha-normalize the expression?

Changed 3 years ago by nomeata

Changed ABI only due to alpha renaming

comment:20 Changed 3 years ago by simonmar

We do alpha-normalise the whole code of the module before we hash it - that's why you're seeing a60 rater than the internal names that look like a_s3fj. It's a mystery to me why there are differences here, so we ought to look into it. Ian has agreed to fix the differences due to ghc_wrapper Ids that were mentioned above, and to look into the differences in "SC:..." rule names, which should fix two of the known issues here.

Fixing the CSE ordering issue is a tricky one, because it means keeping the bindings in a deterministic order throughout the compiler.

comment:21 follow-up: Changed 3 years ago by simonpj

So to summarise Simon M's comments

  • A full and total fix is beyond us at the moment
  • But it seems likely that some modest fixes that are relatively easy (notably ccall wrappers and the "SC.." rule names) may go a long way. Ian is going to have a go at those.

As he says, the a59/a60 diff in your example is a bit mysterious. Perhaps show us both interface files? And preferably a way to reproduce.

Simon

comment:22 Changed 3 years ago by igloo

  • Owner set to igloo

comment:23 in reply to: ↑ 21 Changed 3 years ago by nomeata

Replying to simonpj:

As he says, the a59/a60 diff in your example is a bit mysterious. Perhaps show us both interface files? And preferably a way to reproduce.

To reproduce, fetch haskell-warp-1.2.1.1 and compile with 7.4.1; once unmodified and once with the (to be attached) patch applied. Interface files also attached.

Changed 3 years ago by nomeata

This change changes the ABI of warp-1.2.1.1 unexpectedy

Changed 3 years ago by nomeata

Interface file before the patch

Changed 3 years ago by nomeata

Interface file after the patch

Changed 3 years ago by nomeata

Attempt to minimize the problem (still needs conduit) (before patch)

Changed 3 years ago by nomeata

Attempt to minimize the problem (still needs conduit) (after patch)

comment:24 Changed 3 years ago by ian@…

commit 095b9bf4ed418c43216cfca2ae271c143e555f1d

Author: Ian Lynagh <[email protected]>
Date:   Fri Nov 2 03:45:15 2012 +0000

    Don't put uniqs in ghc wrapper function names; part of #4012
    
    The wrapper functions can end up in interface files, and thus are
    part of the ABI hash. But uniqs easily change for no good reason
    when recompiling, which can lead to an ABI hash needlessly changing.

 compiler/deSugar/DsForeign.lhs |   20 +++++++++++++++-----
 compiler/main/DynFlags.hs      |   11 ++++++++---
 2 files changed, 23 insertions(+), 8 deletions(-)

comment:25 Changed 3 years ago by ian@…

commit ba38f995d6312aa0cfe15873c8e5e9475e03f19c

Author: Ian Lynagh <[email protected]>
Date:   Fri Nov 2 22:54:12 2012 +0000

    Avoid putting uniqs in specconstr rules; part of #4012
    
    There's no need to have the uniq in the rule, but its presence can
    cause spurious ABI changes.

 compiler/specialise/SpecConstr.lhs |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

comment:26 follow-up: Changed 3 years ago by simonpj

Do Ian's changes help?

comment:27 in reply to: ↑ 26 Changed 3 years ago by nomeata

Replying to simonpj:

Do Ian's changes help?

Please excuse me if I don’t set up HEAD in a way to build all the packages required to test the code above. But I did find a small test case that could potentially be added to the test suite. With the attached Test.hs and GHC-7.4.1, I get this result:

$ ghc --make -O Test.hs && cp Test.hi Test1.hi && ghc --make -DVARIANT -O Test.hs && diff -u <(ghc --show-iface Test1.hi) <(ghc --show-iface Test.hi)
[1 of 1] Compiling Test             ( Test.hs, Test.o )
[1 of 1] Compiling Test             ( Test.hs, Test.o )
--- /dev/fd/63	2012-11-07 20:45:57.719385850 +0100
+++ /dev/fd/62	2012-11-07 20:45:57.719385850 +0100
@@ -5,11 +5,11 @@
 Way: Wanted [],
      got    []
 interface main:Test 7041
-  interface hash: f70ea3ebb89cd1fb5fd38e711dc3a86b
-  ABI hash: 3d46e40ea31fe2b5209a9cfa0e8c55a2
+  interface hash: 0c3a556b0b6f4bd91ecf6b9097dd0914
+  ABI hash: fbf17c77263ac40a0a46a1da9debd563
   export-list hash: 10500eb025dddf2ab63ae606468fa772
   orphan hash: 693e9af84d3dfcc71e640e005bdc5e2e
-  flag hash: fc61acebd75ed0e7e110c288131c5ae4
+  flag hash: eda85656077ab2758c42e66fbdb9f82a
   used TH splices: False
   where
 exports:
@@ -33,15 +33,15 @@
                      ds :: ((), a) ->
                    case ds of wild { (,) ds1 z ->
                    case ds1 of wild1 { () -> Test.x @ a $dEq $dNum z } }) -}
-9ca884031f6b6682163a35c8b033ebe1
+3170054a493c32af50e7d3df196daeae
   b :: forall a. [a] -> [a] -> [a]
     {- Arity: 1, HasNoCafRefs, Strictness: L,
        Unfolding: (\ @ a x1 :: [a] ->
                    let {
-                     lvl2 :: [a]
+                     lvl3 :: [a]
                      = let { y :: [a] = GHC.Base.++ @ a x1 x1 } in GHC.Base.++ @ a y y
                    } in
-                   \ z :: [a] -> GHC.Base.++ @ a z lvl2) -}
+                   \ z :: [a] -> GHC.Base.++ @ a z lvl3) -}
 2d4ae0d356c36e2c96109ac55a0ff610
   x :: forall a.
        GHC.Classes.Eq a -> GHC.Num.Num a -> a -> GHC.Types.Bool

Changed 3 years ago by nomeata

Small testcase

comment:28 Changed 3 years ago by simonmar

Note that compiling different code and hoping to get the same ABI hash is a much more difficult proposition than getting the same ABI hash by repeatedly compiling the same code (which is what this ticket was originally about). I realise it's a reasonable thing to want to do, but I just wanted to point out the distinction. We need to solve the easier problem first before we can tackle the harder one.

comment:29 Changed 3 years ago by nomeata

Sure. But it is desireable, and low hanging fruit that fixes the ABI of different code (such as normalizing the interface wrt alpha-renaming) will probably make it more robust against odd changes when compiling the same code twice.

comment:30 Changed 3 years ago by simonmar

Actually you're right. I said that we do alpha-renaming, but in fact the current algorithm keeps the original names when they don't clash, and that gives rise to the lvl2/lvl3 difference in your example.

We should not be using the original OccName when tidying a local binder, we should just always use "x", or use a different OccName for different kinds of binders (e.g. "x" for let-binders, "y" for lambdas, and "z" for case binders). Relevant code is in coreSyn/CoreTidy.lhs:tidyIdBndr.

comment:31 Changed 3 years ago by simonmar

... or, if we like to keep the original names for readability, we could do local renaming only when computing the hash, and throw away the renamed version afterwards (basically hash the DeBruijn representation).

comment:32 Changed 3 years ago by igloo

I think this is only really a problem when ghc generates the same name at both the top level and in an expression. Is it feasible to generate e.g. 'tlvl' for top-level names, and 'lvl' for let/lambda-bound names?

comment:33 Changed 3 years ago by igloo

  • Owner igloo deleted

comment:34 Changed 3 years ago by simonmar

@igloo: I'm not sure if that would fix it. A name that was generated at the top-level might be floated in, and conversely a non-top-level name might be floated out. So wouldn't the same problem still occur?

comment:35 Changed 3 years ago by igloo

Hmm, OK. I was going to suggest a final renaming pass (in which only generated names get renamed) and not throwing the result away, so that it's easier to understand when the hash is or isn't changing.

But I think that would be worse, because then the -ddump-simpl etc names wouldn't match the final names.

So your DeBruijn idea sounds best to me.

comment:36 Changed 3 years ago by refold

  • Cc the.dead.shall.rise@… added

comment:37 Changed 3 years ago by isaacdupree

  • Cc id@… added

comment:38 Changed 2 years ago by shacka

  • Cc shacka@… added

comment:39 follow-up: Changed 20 months ago by nh2

Is this related to https://ghc.haskell.org/trac/ghc/ticket/8144, and does its fix maybe improve the situation?

comment:40 Changed 20 months ago by nh2

  • Cc mail@… added

comment:41 in reply to: ↑ 39 ; follow-up: Changed 20 months ago by nomeata

Replying to nh2:

Is this related to https://ghc.haskell.org/trac/ghc/ticket/8144, and does its fix maybe improve the situation?

Not very much, I believe, as the cases discuss here do not necessarily involve header files. But the fix for #8144 is of course a general improvement.

comment:42 in reply to: ↑ 41 Changed 20 months ago by nh2

Replying to nomeata:

Not very much, I believe, as the cases discuss here do not necessarily involve header files. But the fix for #8144 is of course a general improvement.

It's not so much about header files only - I believe UsageFile can be introduced using multiple ways, not only #include, and maybe some other time staps makes it into those hashes like in that bug.

comment:43 Changed 20 months ago by simonmar

Things got worse for a while when we were including timestamps in the interface, but then #8144 fixed that. There are still a set of underlying problems that cause non-determinism.

comment:44 Changed 19 months ago by nomeata

Debian is (slowly, as usual) working towards reproducible builds, and while some Haskell packages satisfy that requirement, others do not – see this message: https://lists.debian.org/debian-haskell/2014/02/msg00011.html

What I find interesting is that --show-iface shows now change at all, besides a different interface hash. It would make debugging these issues easier if everything that takes part in the hash calculation was visible in --show-iface.

Off your head, any idea what else could contribute to a interface change?

Built paths and time stamps contribute not towards the hash, right?

comment:45 Changed 19 months ago by nomeata

Ah, I just see that #8144 was actually fixed after 7.6. I’ll shut up and report back when 7.8 has hit the Debian archives...

comment:46 Changed 14 months ago by thoughtpolice

  • Priority changed from high to normal

Lowering priority (these tickets are assigned to older versions, so they're getting bumped as they've been around for a while).

comment:47 Changed 14 months ago by thoughtpolice

  • Milestone changed from 7.6.2 to 7.10.1

Moving to 7.10.1.

comment:48 Changed 13 months ago by pumpkin

  • Cc pumpkingod@… added

comment:49 Changed 13 months ago by Fuuzetsu

I just had a problem due to this on NixOS, what would be the way to push this ticket forward bar fixing it myself? I'm afraid that it will just keep getting the milestone bumped forever.

comment:50 Changed 13 months ago by Fuuzetsu

  • Cc fuuzetsu@… added

comment:51 Changed 13 months ago by nh2

I submitted a patch yesterday with which I managed to create byte-identical binaries with ghc --make (and cabal build if you use cabal 1.20):

https://github.com/nh2/ghc/compare/deterministic-binaries

https://phabricator.haskell.org/D175

comment:52 Changed 13 months ago by simonpj

Thanks for producing a patch. THAT is the best way to move the ticket forward!

I've added a saying: The above patches look commendably small. But the resulting code contains absolutely no clue about the importance of the way those particular lines of code are written. Please, I beg,

  • write a Note [Deterministic builds] somewhere,
  • describe the issues in the note,
  • mention the ticket #4012 for more background
  • refer to the Note from the places in the code that are carefully written to support determinism

See our Coding Style guidelines.

Thank you!

Does that close the ticket? Or are there further determinism issues?

Simon

comment:53 Changed 12 months ago by nh2

Simon,

I shall do that if you give me a suggestion where I shall put the note. Deterministic builds seem to depend on many things, so I'm unsure what the best location is.

comment:54 Changed 12 months ago by nh2

Another setback for this ticket is that parallel builds, e.g. ghc -j4, make builds non-deterministic.

The generated closure names (r4WO_closure..., r5BF_closure...) vary with -j but don't without.

Does anybody have an idea on how we could make sure that they are not "as shared", so that the names generated do not depend on the concurrency of the build?

comment:55 Changed 12 months ago by simonmar

We can separate this problem into two:

  1. non-determinism in the ABI, which leads to different ABI hashes being generated, and different package identities in the package database. This annoys OS packagers, who like to be able to generate deterministic package names.
  1. non-determinism in the generated object file. (who cares about this and why?)

(1) is a prerequisite for (2), and (2) is more difficult. The ticket description contains a list of issues that need to be fixed for (1).

comment:56 Changed 12 months ago by nh2

@simonmar: Then I shall leave this ticket to issues concerning (1), and create a separate one for byte-identical compilation.

Why would you want byte-identical compilation (or, how Debian calls it, "reproducible builds")? There are a few reasons (3 is the key point for me right now):

  1. Distributions want it (see https://wiki.debian.org/ReproducibleBuilds#Why_do_we_want_reproducible_builds.3F)
  2. It allows users of open source projects (e.g. Tor, TrueCrypt had famous efforts in this direction) to verify that the binary they downloaded really is built from the unmodified source
  3. It gives recompilation avoidance in build systems outside of GHC. If the binary you just produced is byte-identical to what was already there, you can stop (e.g. don't need to run tests in CI etc.).

And finally, a more-belief-than-evidence-supported point:

  1. It will improve binary distributions of cabal packages and similar things. For example, I would like to build a Haskell equivalent to ccache, and having byte-identical output for the same input would make that much easier.

comment:57 Changed 12 months ago by simonmar

I agree that byte-identical build results might be useful (though difficult), I question whether you need it for ccache. ccache doesn't need to compare build results, only build *inputs*; when the inputs are the same we can re-use the previous output. It doesn't matter whether repeating the build would produce the exact same output, only that it is valid to re-use the previous output.

comment:58 Changed 11 months ago by Fuuzetsu

I'd just like to ask where this ticket stands after the patch. Does it solve simonmar's (1), namely produced package IDs? If we build with this patch *without* parallel builds, does this mean we should now get deterministic package ID and anything else is a bug, or are there still known cases that produce different IDs?

comment:59 Changed 11 months ago by simonmar

No, there are still several problems that remain before (1) is solved. See the description of the ticket for a (possibly incomplete) list.

comment:60 Changed 11 months ago by rwbarton

Fuuzetsu, which patch are you talking about, Phab:D175? I don't think it should affect the ABI hash at all, so it only pertains to (2), not (1).

Changed 11 months ago by akio

Test script

comment:61 Changed 11 months ago by akio

Here is one example where the interface changes while there is no code change at all. On my computer (with GHC 7.8.3 on Linux), the attached shell script compiles Main.hs 4 times rather than just once. B.hi seems to contain some rules whose name is not stable.

comment:62 Changed 11 months ago by akio

  • Cc tkn.akio@… added

comment:63 Changed 11 months ago by nomeata

Nice example. Here is the diff between such interface files.

-"SPEC $cshowsPrec_a2QX @ [GHC.Types.Char]" [ALWAYS] forall $dShow :: GHC.Show.Show
+"SPEC $cshowsPrec_a2Q6 @ [GHC.Types.Char]" [ALWAYS] forall $dShow :: GHC.Show.Show
                                                                        [GHC.Types.Char]

It looks similar in 7.6.3 and in HEAD.

comment:64 Changed 11 months ago by nomeata

I think I can root out this instance of non-determinism.

comment:65 Changed 11 months ago by simonpj

Whoa! The name of an auto-generated rule is itself auto-generated, so that a (savvy) user can track back to which function is being specialised. But nothing will go wrong if two rules accidentally have the same name, which seems entirely possible if you suppress the unique.

I can see that this is a rather gratuitous source of non-determinism. Joachim, do go ahead and suppress the unique (with an accompanying Note!). Thanks.

Simon

comment:66 Changed 11 months ago by nomeata

Simple patch, pushed at Phab:D346 for validation. Will push as soon as that goes through.

comment:67 Changed 11 months ago by Joachim Breitner <mail@…>

In a477e8118137b7483d0a7680c1fd337a007a023b/ghc:

Avoid printing uniques in specialization rules

Akio found an avoidable cause of non-determinisim: The names of RULES
generated by Specialise had uniques in them:
    "SPEC $cshowsPrec_a2QX @ [GHC.Types.Char]" [ALWAYS] forall ...
By using showSDocForUser instead of showSDocDump when building the rule
name, this is avoided:
    "SPEC $cshowsPrec @ [Char]" [ALWAYS] forall ...
See #4012, comments 61ff.

comment:68 Changed 9 months ago by Fuuzetsu

  • Owner set to Fuuzetsu

Oops, guess I forgot to put anything here. I've been tasked with fixing this some weeks ago and hope to make progress in this area within this week. Maybe even I'll have a small test case by later today.

comment:69 Changed 9 months ago by Fuuzetsu

Oops, guess I forgot to put anything here. I've been tasked with fixing this some weeks ago and hope to make progress in this area within this week. Maybe even I'll have a small test case by later today.

comment:70 Changed 9 months ago by Fuuzetsu

I got a test case: I took http-client, a library that was notorious for non-deterministic ABIs, found a small module without any internal imports and stripped it down.

It seems that this can be exhibited by nearly any compilation as long as you are compiling more than one module at once: ghc -O -j2 Foo will, AFAICT, for the simple case, result in the same ABI pretty much always. ghc -O -j2 Foo Bar will often not, but how often depends on the content of each. Foo and Bar can be completely independent, not importing each other. Probably going to be looking at uniqs and that unsafeInterleaveIO (thanks to Duncan for pointing it out) but needed a test case first.

I attach two modules, T and S: they are not ‘trivial’ modules because with those the ABI hash changes much less frequently. These modules only import the libraries that ship with GHC so it is easy to hack on. Follows a sample session with 7.8.3:

[shana@lenalee:~/programming/ghc-nondeterminism]$ nix-shell -p haskellPackages.ghc --pure

[nix-shell:~/programming/ghc-nondeterminism]$ ghc -fforce-recomp -O -j2 -outputdir tmpdir -odir tmpdir -hidir tmpdir -stubdir tmpdir T S && ghc --show-iface tmpdir/T.hi | grep ABI                                                 
[1 of 2] Compiling S                ( S.hs, tmpdir/S.o )
[2 of 2] Compiling T                ( T.hs, tmpdir/T.o )
  ABI hash: 801be37df642d815fae5f74aa0f56580

[nix-shell:~/programming/ghc-nondeterminism]$ ghc -fforce-recomp -O -j2 -outputdir tmpdir -odir tmpdir -hidir tmpdir -stubdir tmpdir T S && ghc --show-iface tmpdir/T.hi | grep ABI                                                 
[1 of 2] Compiling S                ( S.hs, tmpdir/S.o )
[2 of 2] Compiling T                ( T.hs, tmpdir/T.o )
  ABI hash: 59cd140ecf48d626231ac3f8ecb5291e

[nix-shell:~/programming/ghc-nondeterminism]$ ghc -fforce-recomp -O -j2 -outputdir tmpdir -odir tmpdir -hidir tmpdir -stubdir tmpdir T S && ghc --show-iface tmpdir/T.hi | grep ABI                                                 
[1 of 2] Compiling S                ( S.hs, tmpdir/S.o )
[2 of 2] Compiling T                ( T.hs, tmpdir/T.o )
  ABI hash: 563ecab71f525a8aea21d4982ec67388

[nix-shell:~/programming/ghc-nondeterminism]$ l                                                                                                                                                                                                                                 
total 44K
drwxr-xr-x   3 shana nogroup 4.0K Dec 14 19:59 .
drwxr-xr-x 221 shana shana    12K Dec 14 19:58 ..
-rw-r--r--   1 shana nogroup  684 Dec 14 19:59 S.hs
-rw-r--r--   1 shana nogroup  18K Dec 14 19:59 T.hs
drwxr-xr-x   2 shana nogroup 4.0K Dec 14 20:03 tmpdir

Changed 9 months ago by Fuuzetsu

Changed 9 months ago by Fuuzetsu

comment:71 Changed 9 months ago by Fuuzetsu

Trying with GHC HEAD and not experiencing the changing hash with the example I gave above. Maybe not printing uniques helped here but I'll be spending time trying to break it some more.

comment:72 Changed 9 months ago by simonmar

@Fuuzetsu: there are several known causes of non-deterministic ABIs. See the description of the ticket for more details. It's usually possible to reproduce these by compiling a package once, touch the source files, and then recompile it and watch the hashes change.

comment:73 Changed 9 months ago by Fuuzetsu

I certainly plan to investigate each of the cases mentioned in the description but if you have any specific source files you know are likely to change between compilation, I ask that you post them here: I spent quite a lot of time trying to come up with some while I'd rather be fixing the problem instead.

comment:74 Changed 9 months ago by thoughtpolice

  • Milestone changed from 7.10.1 to 7.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:75 Changed 7 months ago by juhpetersen

  • Cc juhpetersen added

comment:76 Changed 5 months ago by Fuuzetsu

Got the below in mail few days ago but I don't have time to investigate myself for next few weeks. Just putting it out there. Seems like a pretty big issue.

--

Hi,

trac seems to be down currently so I'm sending this directly to you. Using GHC 7.10, I reduced the code to this:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Test where
import Control.Monad.Trans.Reader
newtype EventM a = EventM { unEventM :: ReaderT () IO a } deriving
(Functor, Applicative, Monad)

Compiling with:

ghc -i../.. test.hs -O
ghc --show-iface test.hi > 1.hi
ghc -i../.. test.hs -O -fforce-recomp
ghc --show-iface test.hi > 2.hi

Now 1.hi and 2.hi differ, including in their ABI hash. I'm not sure if there are any other examples of nondeterminism in this module, since I continued reducing as long as GHC gave different interface hashes, I did not check whether the cause was actually the same.

In this case, this seems to be the most significant difference:

512a513,517
"SPEC/Test $fApplicativeReaderT_$cpure @ () @ IO" [ALWAYS] forall $dFunctor
:: Functor

(ReaderT () IO)

$dApplicative :: Applicative IO
  $fApplicativeReaderT_$cpure @ () @ IO $dFunctor $dApplicative
   = $fMonadEventM_$s$fApplicativeReaderT_$cpure
521,525d525
 "SPEC/Test $fMonadReaderT_$creturn @ () @ IO" [ALWAYS] forall
$dApplicative :: Applicative

(ReaderT () IO)
                                                               $dMonad ::
Monad IO
   $fMonadReaderT_$creturn @ () @ IO $dApplicative $dMonad
   = $fMonadEventM_$s$fMonadReaderT_$creturn

If I read this correctly, it means that in one case, a specialization rule is generated for return, will in the other case, GHC generates one for pure.

Looking further, there is also the following difference:

  $fMonadEventM_$s$fApplicativeReaderT :: Applicative (ReaderT () IO)
  {- HasNoCafRefs, Strictness: m, Inline: [ALWAYS] CONLIKE,
     Unfolding: DFun:.
                  @ (ReaderT () IO)
                  $fMonadEventM3
                  $fMonadEventM_$s$fMonadReaderT_$creturn
                  ($fApplicativeReaderT_$c<*>
                     @ ()
                     @ IO
                     $fMonadEventM3
                     $fApplicativeIO)
                  $fMonadEventM_$s$fApplicativeReaderT_$c*>
                  $fMonadEventM_$s$fApplicativeReaderT_$c<* -}
vs

  $fMonadEventM_$s$fApplicativeReaderT :: Applicative (ReaderT () IO)
  {- HasNoCafRefs, Strictness: m, Inline: [ALWAYS] CONLIKE,
     Unfolding: DFun:.
                  @ (ReaderT () IO)
                  $fMonadEventM3
                  $fMonadEventM_$s$fApplicativeReaderT_$cpure
                  ($fApplicativeReaderT_$c<*>
                     @ ()
                     @ IO
                     $fMonadEventM3
                     $fApplicativeIO)
                  $fMonadEventM_$s$fApplicativeReaderT_$c*>
                  $fMonadEventM_$s$fApplicativeReaderT_$c<* -}

(Notice the pure vs return again)

Regards, Benno

comment:77 Changed 4 months ago by puffnfresh

@Fuuzetsu: I can reproduce that problem on GHC 7.10.1 but I just built a version of master and can not with that. Does that particular instance of non-determinism look fixed in master to you?

comment:78 Changed 4 months ago by simonpj

Mateusz says: I'd like to bring some attention to ticket #4012 about non-determinism.

As many of you may know, the nix package manager distributes binaries throughout its binary caches. The binaries are shared as long as the hash of some of their inputs matches: this means that we can end up with two of the same hashes of inputs but thanks to #4012 means that the actual contents can differ. You end up with machines with some packages built locally and some elsewhere and due to non-determinism, the GHC package IDs don't line up and everything is broken.

The situation was pretty bad in 7.8.4 in presence of parallel builds so we switched those off. Joachim's a477e8118137b7483d0a7680c1fd337a007a023b helped a great deal there and we were hopeful for 7.10. Now that 7.10.1 is out and people have been using and testing it, the situation turns out to be really bad: daily we get multiple reports from people about their packages ending up broken and our only advice is to do what we did back in 7.8 days which is to purge GHC and rebuild everything locally or fetch everything from a machine that already built it all, as long as the two don't mix. This is not really acceptable.

See comment:76 on #4012 for an example of a rather simple file you can compile right now with nothing extra but -O and get different interface hash.

This e-mail is just to raise awareness that there is a serious problem. If people are thinking about cutting 7.10.2 or whatever, I would consider part of this ticket to be a blocker as it makes using GHC reliably while benefitting from binary caches pretty much impossible.

Of course there's the ‘why don't you fix it yourself’ question. I certainly plan to but do not have time for a couple more weeks to do so. For all I know right now, the fix to comment:76 might be easy and someone looking for something to hack on might have the time to get to it before I do.

Last edited 4 months ago by simonpj (previous) (diff)

comment:79 Changed 4 months ago by simonpj

Thanks for bringing this up Mateusz.

The ticket description lists four sources of non-deterministic outputs. I think you are saying that 7.10.1 has a fifth (as yet uncharacterised) source, which somehow bites more often than the preceding four. Is that right? In what way is 7.10.2 worse?

Reproducing it: Is comment:76 the single exemplar of the difficulty, or are there others? Mateusz says that we should not release 7.10 until comment:76 is fixed. Do others agree?

Does anyone feel able to characterise what is going on with comment:76?

Thanks

Simon

Last edited 4 months ago by simonpj (previous) (diff)

comment:80 Changed 4 months ago by simonpj

  • Description modified (diff)

comment:81 Changed 4 months ago by simonpj

The return/pure thing presumably arises because we have return = pure, or vice versa, and one is "shorted out" in favour of the other. See SimplCore.shortOutIndirections. Would someone like to see if this is true?

If someone cares to confirm this, then perhaps shortOutIndirections could choose whether to elminate return or eliminate pure based on a lexicographic comparison of their names?

Would someone like to see if this would work?

Simon

comment:82 Changed 4 months ago by literon

Just +1-ing to express my support for full binary determinism. Reproducible builds are important for efficiency and verification purposes in the distributed and containerized world.

comment:83 Changed 4 months ago by simonpj

  • Priority changed from normal to high

George Colpitts asks: the milestone is 7.12.1, should it be 7.10.2?

It would be unusual to delay a patch-level release for a ticket that has been open for five years.

Doing so would mean indefinitely delaying 7.10.2. By “indefinite” I don't mean "forever", I just mean “we don’t know how long”:

  • We don’t have anyone owning the ticket,
  • we don’t know how difficult it is to fix; and
  • a full bulletproof fix is probably quite challenging (e.g. forcing the unique-supply generator to give reproducible results even though some (ultimately irrelevant) bits of the environment have changes, would be difficult)

At the same time we have some folk urging us to get 7.10.2 out sooner rather than later.

So everyone:

  • If you think we should delay 7.10.2 pending at least some progress on this, please say so on the ticket.
  • Better still, roll up your sleeves and take a crack at understanding more precisely what is going on, and proposing a fix.

Simon

comment:84 Changed 4 months ago by nomeata

I agree that this should not hold up a patch level release that others are waiting for. Better fix this properly, and then, maybe, see if it can be part of a 7.10.3.

comment:85 Changed 4 months ago by Fuuzetsu

It would be unusual to delay a patch-level release for a ticket that has been open for five years.

Well, if anything we'd only be holding it up for the specialisation issue posted recently. To me at least it seems like a serious regression so it should probably be fixed in some patch release, whether it's 7.10.2 or 7.10.3 is up to whenever a fix is found.

But I also don't want to hold up any other fixes. I wish only to not end up in a situation where the final 7.10.x is released and to get much else in we have to wait for 7.12. As long as it's fixed in 7.10 somewhere, I'm happy.

comment:86 Changed 3 months ago by nomeata

I have rooted out another cause for indeterminism and the Debian reproducibilty project managed to compile GHC with a bit-wise identical output: https://reproducible.debian.net/rb-pkg/experimental/amd64/ghc.html

See Phab:D910 for a patch.

I doubt that this solves the problem completely, but it is a step towards it.

comment:87 Changed 3 months ago by thoughtpolice

  • Milestone changed from 7.12.1 to 7.10.2
  • Status changed from new to patch

comment:88 Changed 3 months ago by thoughtpolice

To be clear: I'm marking this as patch to make sure Phab:D910 is not lost and properly merged to 7.10. (It'll be remilestoned afterwords).

comment:89 Changed 3 months ago by Austin Seipp <austin@…>

In 7a82b77691fb90c4af6863673f10e454a449739e/ghc:

newTempName: Do not include pid in basename

The filename of temporary files, especially the basename of C files, can
end up in the output in some form, e.g. as part of linker debug
information. In the interest of bit-wise exactly reproducible
compilation (#4012), the basename of the temporary file no longer
contains random information (it used to ontain the process id).

This is ok, as the temporary directory used contains the pid (see
getTempDir).

This patch has been applied to the Debian package (version 7.10.1-5) and
allowed a fully bit-wise reproducible build:
https://reproducible.debian.net/rb-pkg/experimental/amd64/ghc.html

Reviewed By: austin, rwbarton

Differential Revision: https://phabricator.haskell.org/D910

GHC Trac Issues: #4012

comment:90 Changed 3 months ago by thoughtpolice

  • Milestone changed from 7.10.2 to 7.12.1
  • Owner Fuuzetsu deleted
  • Status changed from patch to new

Okay, moving back to 7.12.1 for future work. 7a82b77691fb90 is in ghc-7.10 now (as 1ff03e466c).

comment:91 Changed 3 months ago by puffnfresh

I'm going to try and start submitting data when I hit this bug. Here is a diff of the Data.ByteString.Interal interface which I had built locally vs the one in Nix's cached GHC:

https://gist.github.com/puffnfresh/fe3c86baf416114ef644#file-local-vs-cached-diff

There seems to be some slightly different strictness and unfolding. The full files are included in that gist, in case they help.

comment:92 Changed 3 months ago by Fuuzetsu

Austin, is there a reason why we're punting to 7.12? Joachim's patch doesn't affect the main complaint I had with 7.10 which was the pure/return specialisation stuff. I even said above

But I also don't want to hold up any other fixes. I wish only to not end up in a situation where the final 7.10.x is released and to get much else in we have to wait for 7.12. As long as it's fixed in 7.10 somewhere, I'm happy.

To us (nix), it's really not acceptable that we have to daily tell users to delete their packages and just try again praying that this time around something more consistent comes up.

puffnfresh, yes, great, very useful, big part of this ticket as a whole is to even find out what breaks.

comment:93 Changed 3 months ago by simonpj

Fuuzetsu: see comment:83. It's just lack of a volunteer to actually dig into this and fix it.

We can just hold up 7.10.2 until a volunteer appears, or until I get free enough to do it. Or we can push 7.10.2 out regardless (current plan).

Open source projects are like this: they rely heavily on volunteers.

Simon

comment:94 Changed 3 months ago by thoughtpolice

It wasn't really clear to me that we were still aiming to fix that for 7.10.2. If you think it should be fixed here, then sure, we can just remilestone it.

comment:95 Changed 3 months ago by thoughtpolice

Also, just a random tidbit: the (excellent!) test in comment:76 can in fact be solved by simply compiling with -fno-cse - the nondeterministic ordering of bindings is the culprit in that case, and CSE chooses return or pure based on what shirt you're wearing.

If determinism here is hurting NixOS, it may be preferrable to simply switch off CSE by default or always default to -fno-cse for every compilation. That sucks, but in the mean time it may help while we work towards a real fix (having a deterministic ordering for Uniques).

comment:96 Changed 3 months ago by simonmar

@thoughtpolice: did you check that -fno-cse works around it? I thought it might be due to shortcutting rather than actual CSE.

@Fuuzetsu we don't think there's a new regression in 7.10, but we think it is possibly more likely to happen due to the Applicative/Monad changes, because we often see pure = return. This triggers the condition in the second bullet point in the description.

I'm actively trying to convince people to work on this, because it's important. (I might end up working on it myself if I can find time). But beyond temporary workarounds, it's very unlikely that anything will happen on the 7.10 branch.

comment:97 Changed 3 months ago by puffnfresh

My gist also seems to suggest CSE is involved, right?

I can reproduce comment:76 with 7.10 but with master I can not. I'd like to look into this more but I can't find anything easily reproducible with master.

comment:98 Changed 3 months ago by thoughtpolice

Yes. I removed shortOutIndirections in SimplCore:

diff --git a/compiler/simplCore/SimplCore.hs b/compiler/simplCore/SimplCore.hs
index 0a2f8e4..95c5cce 100644
--- a/compiler/simplCore/SimplCore.hs
+++ b/compiler/simplCore/SimplCore.hs
@@ -667,7 +667,7 @@ simplifyPgmIO pass@(CoreDoSimplify max_iterations mode)
                 --
                 -- ToDo: alas, this means that indirection-shorting does not happen at all
                 --       if the simplifier does nothing (not common, I know, but unsavoury)
-           let { binds2 = {-# SCC "ZapInd" #-} shortOutIndirections binds1 } ;
+           let { binds2 = {-# SCC "ZapInd" #-} if True then binds1 else shortOutIndirections binds1 } ;
 
                 -- Dump the result of this iteration
            dump_end_iteration dflags print_unqual iteration_no counts1 binds2 rules1 ;

Then I compiled the test in comment:76 with -fno-cse and observed the ABI difference went away. So killing CSE may get us a long way for NixOS

I admit I tested this on GHC 7.10, and not the HEAD branch. Brian, you couldn't reproduce this on HEAD at all? I can double check this, but that somewhat surprises (and worries!) me.

comment:99 Changed 3 months ago by puffnfresh

Correct, I tried reproducing with HEAD about 3 weeks ago, then again yesterday but it's always consistent. I can still reproduce with 7.10.1, so yeah, it's a little worrying and would be awesome if you could check.

My results are published here:

https://gist.github.com/puffnfresh/5a841633f0dde81fe7d1

comment:100 Changed 3 months ago by simonmar

@thoughtpolice, so you disabled shortOutIndirections *and* CSE? In that case we don't know which is the cause of the difference, right? If you disable CSE by itself, do you still see the difference?

comment:101 Changed 3 months ago by thoughtpolice

I actually tested both, sorry. Just checked: yes, if I disable short-outs and CSE, it goes away. If I only disable CSE, it goes away. This is still all on GHC 7.10.

I'll test on HEAD and STABLE again when I get a chance and post the 4-way results.

comment:102 Changed 3 months ago by thoughtpolice

I actually tested both, sorry. Just checked: yes, if I disable short-outs and CSE, it goes away. If I only disable CSE, it goes away. This is still all on GHC 7.10.

I'll test on HEAD and STABLE again when I get a chance and post the 4-way results.

comment:103 Changed 3 months ago by Fuuzetsu

Another sample: parallel building aeson-0.8.1.0, we get sections like

< 1e7b52dc510d1cf5d1c7dba94ff5093f
<   $cr4ZN :: Constr
< 0d767d9de53af9e99867e765de72ede2
<   $cr4ZO :: Constr
< dc67d4c3fc9f3ce6c26b7594d8e6eae3
<   $cr4ZP :: Constr
< a2def0dbb719c6f881fc717fcb8298a4
<   $cr4ZQ :: Constr
< c6db659c87f4f30d7cd7837ade17c21a
<   $cr4ZR :: Constr
< 0d876ce7c0dcbac035d3da6455b1dd5f
<   $cr4ZS :: Constr
---
> bdb4c40eb8e399497201a7ebb5b21b8e
>   $cr5fY :: Constr
> a1b0220e85ae58edc13e93774d8e283f
>   $cr5fZ :: Constr
> 7d4faa76b1cc5a7319b59dca13194f80
>   $cr5g0 :: Constr
> 7fdf605be2abfeb55bab02c7c6b2124d
>   $cr5g1 :: Constr
> 05b9af850c2c1ed35be735adcbdb7f6d
>   $cr5g2 :: Constr
> bb94af6dd870df350419f5d32d4917cf
>   $cr5g3 :: Constr

which consequently screws up everything else. These are presumably generated by Data deriving. Attaching two sample interface files with the above, it was built with 7.10.1. In non-concurrent case, no issues.

Changed 3 months ago by Fuuzetsu

Differing Constr names 1

Changed 3 months ago by Fuuzetsu

Differing Constr names 2

comment:104 Changed 3 months ago by Fuuzetsu

Oh, by the way, there was a small experiment ran by nix users to judge the impact of this issue on parallel builds. You can find the results at https://github.com/peti/ghc-library-id-bug

comment:105 Changed 3 months ago by simonmar

  • Owner set to niteria

@niteria is going to look at this, with help from me.

comment:106 Changed 3 months ago by lelf

  • Cc lelf added

comment:107 Changed 2 months ago by cheecheeo

  • Cc cheecheeo@… added

comment:108 Changed 2 months ago by jb55

  • Cc bill@… added

comment:109 Changed 7 weeks ago by niteria

  • Differential Revisions set to D1073
  • Status changed from new to patch

comment:110 Changed 7 weeks ago by slyfox

  • Differential Revisions changed from D1073 to Phab:D1073

comment:111 Changed 7 weeks ago by Simon Marlow <marlowsd@…>

In 3448f9827d2364b91bc60134aa38b74849b9d54e/ghc:

Reduce non-determinism in ABI hashes with RULES and instance decls

Summary:
Before this change the `RULES` would be attached to one for the names from
the module that appear on the left hand side. The choice depended on the
`uniq` that was generated, which are known to be non-deterministic (a
separate, bigger problem). Now we use `OccName`s which should be stable.

Analogously for instance declarations, but they are attached to one of
the types involved.

Test Plan:
contbuild
it made `Data.Text.Internal.Fusion.Common` interface stable, previously
stream fusion rule would be attached either to `streamList` or
`unstreamList` depending on if the module was compiled after `cabal
clean` or after `find | grep '\.o$' | xargs rm`.

Reviewers: simonpj, austin, bgamari, simonmar

Subscribers: puffnfresh, thomie

Differential Revision: https://phabricator.haskell.org/D1073

GHC Trac Issues: #4012

comment:112 Changed 6 weeks ago by Simon Peyton Jones <simonpj@…>

In 7ec07e40/ghc:

Slight refactoring to the fix for #4012

Add CoreSyn.chooseOrphanAnchor, and use it

comment:113 Changed 6 days ago by thoughtpolice

  • Differential Revisions changed from Phab:D1073 to Phab:D910, Phab:D1073

comment:114 Changed 5 days ago by simonmar

  • Differential Revisions changed from Phab:D910, Phab:D1073 to Phab:D910, Phab:D1073, Phab:D1133

comment:115 Changed 4 days ago by niteria

  • Differential Revisions changed from Phab:D910, Phab:D1073, Phab:D1133 to Phab:D910, Phab:D1073, Phab:D1133, Phab:D1192

comment:116 Changed 4 days ago by Simon Marlow <marlowsd@…>

In 10a0775/ghc:

Anchor type family instances deterministically

Summary:
This is very similar to D1073. It makes type family instances to be
attached to a binding with a least `OccName`, therefore not depending on `Unique` ordering.

Test Plan:
* this makes `Language.Haskell.Exts.SrcLoc` deterministic
* ./validate

Reviewers: simonmar, austin, bgamari, simonpj

Reviewed By: simonpj

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D1192

GHC Trac Issues: #4012
Note: See TracTickets for help on using tickets.