Opened 5 years ago

Last modified 3 years ago

#8648 new bug

Initialization of C statics broken in threaded runtime

Reported by: edsko
Owned by: simonmar
Priority: normal
Milestone:
Component: Runtime System
Version: 7.7
Keywords:
Cc: simonmar, snoyberg
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:
Differential Rev(s):
Wiki Page:

Description

Consider a tiny package static-value, consisting of one Haskell file

import Foreign.C.Types (CInt)

foreign import ccall unsafe "returnStaticValue" c_returnStaticValue :: IO CInt

printStaticValue :: IO ()
printStaticValue = print =<< c_returnStaticValue

and one corresponding C file

static int theStaticValue = 0;

int returnStaticValue(void) {
  // Modify the static so the C compiler doesn't optimize it away
  return theStaticValue++;
}

(test case is attached). If we call printStaticValue using the GHC API:

runGhc (Just libdir) $ do
    flags0 <- getSessionDynFlags
    void $ setSessionDynFlags flags0 {
        hscTarget = HscInterpreted
      , ghcLink   = LinkInMemory
      , ghcMode   = CompManager
      }

    setContext $ [ IIDecl $ simpleImportDecl $ mkModuleName "StaticValue" ]
    _ <- runStmt "StaticValue.printStaticValue" RunToCompletion

then we see "0", as expected. However, if we compile this code using the threaded runtime, and we wrap the above code in a call to either forkIO or forkOS, then we see a different value printed (-907777, whatever that value is).

Some notes:

  • I have been unable to reproduce this bug without using the GHC API; in particular, calling printStaticValue directly, whether or not it is wrapped in forkIO or forkOS, always works as expected.
  • If I change the initialization value of theStaticValue from 0 to anything else (say, 1234), we always get the right answer, never the uninitialized value. Presumably this is because non-zero values require some explicit code to be run (and it does get run), while a zero value gets initialized differently (and apparently, that's where the bug is).
  • I have reproduced this bug in both ghc 7.4 and 7.7.20131227.
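
The difference between the zero and non-zero cases in the second note can be illustrated with a small C sketch. It assumes a typical ELF toolchain (for example GCC on Linux), where a zero-initialized static is usually placed in .bss and a non-zero one in .data. The names zero_init, nonzero_init, next_zero_init and next_nonzero_init are hypothetical and not part of the attached test case.

```c
/* Zero initializer: on a typical ELF toolchain this lands in .bss.
   .bss occupies no bytes in the object file, so whoever loads the
   object must supply zero-filled memory for it. */
static int zero_init = 0;

/* Non-zero initializer: this typically lands in .data, whose bytes are
   stored in the object file and copied into memory when it is loaded. */
static int nonzero_init = 1234;

/* Return the current value and then increment it, mirroring
   returnStaticValue from the test case. */
int next_zero_init(void) {
    return zero_init++;
}

int next_nonzero_init(void) {
    return nonzero_init++;
}
```

Compiling this to an object file and running nm on it should, on such a toolchain, list zero_init with type b (.bss) and nonzero_init with type d (.data). If the loader fails to zero the .bss memory, only the first function would return garbage, which matches the behaviour described above.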

This ticket is the result of tracking down a problem with calling createProcess from within the GHC API, which would cause the parent process to stall. As it turns out, runProcess.c (from the process library) declares static long max_fd = 0, and runInteractiveProcess checks whether this value is 0 and, if it is, makes a syscall to find out what the maximum FD is. But because this static does not get initialized properly (the bug reported in this ticket), it is left at its (random? but always the same) value of 281474975802879. The check for zero therefore fails, the child process proceeds to close far too many file descriptors (if close_fds was set to True), and the parent stalls. Indeed, changing the initialization to static long max_fd = -1 (and adjusting the later check for zero accordingly) fixes this, so this is a viable workaround in process if we cannot track down the bug in GHC.
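
The workaround can be sketched roughly as follows. This is not the actual code from runProcess.c: the function name get_max_fd, the use of sysconf(_SC_OPEN_MAX) as the query, and the fallback value of 256 are all assumptions made for illustration.

```c
#include <unistd.h>

/* Sentinel is -1 rather than 0. Because the initializer is non-zero,
   the variable does not depend on zero-initialized memory being set up
   correctly by whoever loads the object file. */
static long max_fd = -1;

/* Hypothetical helper: compute the maximum FD once and cache it. */
long get_max_fd(void) {
    if (max_fd == -1) {
        /* Assumed query; sysconf returns -1 if the limit is
           indeterminate or the call fails. */
        max_fd = sysconf(_SC_OPEN_MAX);
        if (max_fd == -1) {
            max_fd = 256; /* arbitrary fallback chosen for this sketch */
        }
    }
    return max_fd;
}
```

With a sentinel of 0, a garbage initial value skips the lookup entirely and is then used as the upper bound when closing descriptors, which is the stall described above.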

Attachments (2)

T8648.hs (907 bytes) - added by edsko 5 years ago.
static-value-0.1.0.0.tar.gz (621 bytes) - added by edsko 5 years ago.


Change History (7)

Changed 5 years ago by edsko

Attachment: T8648.hs added

comment:1 Changed 5 years ago by edsko

I should have mentioned, I can only reproduce this on Linux; on OSX I always get the right answer.

Changed 5 years ago by edsko

Attachment: static-value-0.1.0.0.tar.gz added

comment:2 Changed 5 years ago by duncan

Also note that it works when the static-value package is built as a dynamic lib and the top level exe uses dynamic libs.

So what it looks like is that the ghci linker is not zeroing the memory allocated for the zero-init (.bss) section from the object files.

Of course, we know the linker does have code to zero the .bss (it uses calloc), and it works when not using forkIO.
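
As a minimal sketch of the invariant at stake (not the GHCi linker's actual code; alloc_zeroed_section and section_is_zeroed are hypothetical names), memory handed out for a .bss section must be zero-filled, which calloc guarantees and malloc does not:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate zero-filled memory for a section of the
   given size. Returns NULL if the allocation fails. */
void *alloc_zeroed_section(size_t size) {
    return calloc(1, size);
}

/* Hypothetical check: returns 1 if every byte in the region is zero,
   0 otherwise. */
int section_is_zeroed(const void *mem, size_t size) {
    const unsigned char *p = mem;
    for (size_t i = 0; i < size; i++) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

A check like section_is_zeroed, run at the point where the static is first read, would show whether the memory actually backing the .bss symbol is the zeroed allocation or some other region.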


comment:3 Changed 4 years ago by snoyberg

Cc: snoyberg added

comment:4 Changed 4 years ago by simonmar

I *think* this bug is fixed by Phab:D975. There were bugs in the way we were allocating the BSS segment for dynamically linked code.

comment:5 Changed 3 years ago by simonmar

This now just needs a test.
