Consider a tiny package static-value, consisting of one Haskell file:

{-# LANGUAGE ForeignFunctionInterface #-}
module StaticValue where
import Foreign.C.Types (CInt)

foreign import ccall unsafe "returnStaticValue" c_returnStaticValue :: IO CInt

printStaticValue :: IO ()
printStaticValue = print =<< c_returnStaticValue

and one corresponding C file:

static int theStaticValue = 0;

int returnStaticValue() {
  // Modify the static so the C compiler doesn't optimize it away
  return theStaticValue++;
}

(test case is attached). If we call printStaticValue using the GHC API:

runGhc (Just libdir) $ do
    flags0 <- getSessionDynFlags
    void $ setSessionDynFlags flags0 {
        hscTarget = HscInterpreted
      , ghcLink   = LinkInMemory
      , ghcMode   = CompManager
      }

    setContext $ [ IIDecl $ simpleImportDecl $ mkModuleName "StaticValue" ]
    void $ runStmt "StaticValue.printStaticValue" RunToCompletion

then we see "0", as expected. However, if we compile this code using the threaded runtime, and we wrap the above code in a call to either forkIO or forkOS, then we see a different value printed (-907777, whatever that value is).

Some notes:

  • I have been unable to reproduce this bug without using the GHC API; in particular, calling printStaticValue directly, whether or not it is wrapped in forkIO or forkOS, always works as expected.
  • If I change the initialization value of theStaticValue from 0 to anything else (say, 1234), we always get the right answer, never the uninitialized value. Presumably this is because non-zero values require some explicit code to be run (and it does get run), while a zero value gets initialized differently (and apparently, that's where the bug is).
  • I have reproduced this bug in both ghc 7.4 and 7.7.20131227.
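
The distinction drawn in the second note can be sketched in C. This is only an illustration, assuming a typical ELF toolchain where zero-initialized statics go into .bss (which stores no bytes in the object file and must be zeroed by whoever loads it), while non-zero statics go into .data (whose bytes are copied from the object file). The names zeroStatic, nonZeroStatic, nextZeroStatic and nextNonZeroStatic are hypothetical and not part of the attached test case.

```c
/* Typically placed in .bss: no bytes are stored in the object file,
   so the loader is responsible for zeroing the memory. */
static int zeroStatic = 0;

/* Typically placed in .data: the value 1234 is stored in the object
   file and copied into memory when the object is loaded. */
static int nonZeroStatic = 1234;

/* Each function returns the current value and then increments it. */
int nextZeroStatic(void)    { return zeroStatic++; }
int nextNonZeroStatic(void) { return nonZeroStatic++; }
```

With a correct loader, the first call to nextZeroStatic returns 0 and the first call to nextNonZeroStatic returns 1234. Under the assumption above, a loader that fails to zero .bss would be expected to affect only the former, which matches what the notes describe.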

This ticket is the result of tracking down a problem with calling createProcess from within the GHC API, which would cause the parent process to stall. As it turns out, runProcess.c (from the process library) declares a static long max_fd = 0, and runInteractiveProcess checks whether this value is 0 and, if it is, does a syscall to figure out what the maximum FD is. But since this static does not get initialized properly (the bug reported in this ticket), it is left at its (random? but always the same) value (281474975802879), so the child process proceeds to close rather too many file descriptors (if close_fds was set to True) and the parent stalls. Indeed, changing the initialization to static long max_fd = -1 (and adjusting the later check for zero accordingly) fixes this, so this is a viable workaround in process if we cannot track down the bug in GHC.
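
The workaround amounts to using a non-zero sentinel for "not yet computed". A minimal sketch follows; it is not the actual code in runProcess.c. The function name getMaxFd, the use of sysconf(_SC_OPEN_MAX) as the "syscall", and the fallback value 256 are all assumptions made for illustration.

```c
#include <unistd.h>

/* -1 (rather than 0) marks "not yet computed". A non-zero initializer
   is typically stored in .data, so it does not rely on .bss zeroing. */
static long max_fd = -1;

/* Hypothetical helper: compute the maximum FD once and cache it. */
long getMaxFd(void)
{
    if (max_fd == -1) {
        long n = sysconf(_SC_OPEN_MAX);
        /* sysconf returns -1 on error or if the limit is indeterminate;
           fall back to an assumed conservative default in that case. */
        max_fd = (n > 0) ? n : 256;
    }
    return max_fd;
}
```

The first call performs the lookup; later calls return the cached value, because a valid result is always positive and so never collides with the sentinel.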

comment:2 Changed 5 years ago by duncan

Also note that it works when the static-value package is built as a dynamic lib and the top level exe uses dynamic libs.

So what it looks like is that the ghci linker is not zeroing the memory allocated for the zero-init (.bss) section from the object files.

Of course, we know the linker does have code to zero the .bss (it uses calloc), and it works when not using forkIO.
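
To make concrete what zeroing the .bss means on the loader side, here is a minimal sketch. It is not the GHCi linker's code; allocBss and isAllZero are hypothetical names, and the only point is that calloc, unlike malloc, returns zero-filled memory.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: reserve memory for a zero-initialized section.
   calloc guarantees that every byte of the block starts out as 0. */
void *allocBss(size_t size)
{
    return calloc(1, size);
}

/* Hypothetical check: returns 1 if all `size` bytes at p are zero. */
int isAllZero(const void *p, size_t size)
{
    const unsigned char *bytes = p;
    for (size_t i = 0; i < size; i++) {
        if (bytes[i] != 0) return 0;
    }
    return 1;
}
```

Any static that the object file places in such a section would then read as 0 on first access, provided the symbol is actually resolved to the zeroed block.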

Cc: snoyberg added

comment:4 Changed 4 years ago by simonmar

I *think* this bug is fixed by Phab:D975. There were bugs in the way we were allocating the BSS segment for dynamically linked code.

This now just needs a test.


